🚀 Free & Open · No Login Required

Learn MLOps.
Ship Models to Production.

Practical, hands-on guides for DevOps engineers stepping into the world of machine learning operations. From ML basics to GPU workloads on Kubernetes.

Start with ML Basics · Browse All Guides

13 Guides · 4 Categories · ~6h Reading · Free Forever

Browse by Category

From ML fundamentals to production Kubernetes — structured learning paths for every stage.

Foundation

ML concepts, terminology, and the mental model shift from DevOps to MLOps.

MLOps Pipeline

Dataset pipelines, data prep, model training, and production serving with KServe.

Kubernetes for ML

K8s native features, GPU scheduling, and running distributed training workloads.

Operations & Monitoring

Data drift detection, model decay, dataset versioning, and automated retraining.

Featured Guides

The best places to start your MLOps journey.

ML Basics For DevOps Engineers

Understand the ML workflow, core terminology, and the Python ecosystem — then train your first model.

Beginner · 30 min

MLOps Step 1: Building a Dataset Pipeline

Ingestion patterns, data validation, S3 structure, Airflow DAGs, and DVC versioning from scratch.

Intermediate · 45 min

How Kubernetes Actually Runs GPU Workloads

NVIDIA device plugin, GPU Operator, CUDA runtime, MIG partitioning, and DCGM monitoring.

Advanced · 50 min

Recommended Learning Path

Follow this sequence to go from zero to production MLOps in ~6 hours.

1. Get Your Bearings (Foundation)
2. Build the MLOps Pipeline
3. Scale on Kubernetes
4. Operate in Production

All Guides

13 practical guides covering the complete MLOps lifecycle.

ML Basics For DevOps Engineers

ML workflow vs CI/CD, Python ecosystem, and training your first model.

Beginner · 30 min

DevOps to MLOps: What You Actually Need to Learn

Mental model shift, maturity model, tool mapping, and a 90-day plan.

Beginner · 35 min

MLOps Step 1: Building a Dataset Pipeline

Ingestion, validation, S3 structure, Airflow DAGs, Argo Workflows.

Intermediate · 45 min

MLOps Step 2: Data Preparation

Cleaning, feature engineering, train/val/test splits, Feast Feature Store.

Intermediate · 40 min

Feature Store Explained: Feast on Kubernetes

Offline vs online stores, materialization, feature registry, and deploying Feast with Redis and PostgreSQL on Kubernetes.

Intermediate · 35 min

ML Model Training

Training loop, MLflow tracking, hyperparameter tuning, K8s Jobs.

Intermediate · 45 min

MLOps Step 4: Deploying with KServe

InferenceService YAML, canary deployments, autoscaling, monitoring.

Advanced · 50 min

Kubernetes Built-in Features for AI/ML

Node affinity, taints and tolerations, PersistentVolumes, Jobs, HPA, and resource quotas.

Intermediate · 40 min

How Kubernetes Actually Runs GPU Workloads

NVIDIA device plugin, GPU Operator, CUDA, MIG, DCGM monitoring.

Advanced · 50 min

Data Drift, Model Decay, and Dataset Versioning

Drift detection, Evidently AI, DVC, MLflow aliases, automated retraining.

Intermediate · 45 min

Versioning Data With DVC

DVC pointer files, S3 remote storage, and step-by-step dataset versioning so every model links to the exact data it was trained on.

Intermediate · 30 min

Airflow + DVC Pipeline on Kubernetes

Automate dataset versioning with Airflow on Kubernetes: DAGs, KubernetesExecutor, Pod Identity, and DVC push to S3.

Advanced · 45 min

ML Docker Image Optimization: From 3 GB to Under 400 MB

Multi-stage builds, base image selection, dependency pruning, and a real Kubeflow case study with 89% size reduction.

Intermediate · 35 min