# ML Basics For DevOps Engineers
You've spent years automating infrastructure, writing pipelines, and making software delivery reliable. Now your team is talking about models, training runs, and feature stores. This guide bridges that gap — not by dumbing things down, but by mapping ML concepts onto things you already understand.
## Why ML Is Different From Regular Software
In traditional software development, the behavior of a system is fully determined by the code you write. You can read the code and understand exactly what will happen for any input. In machine learning, the behavior is determined by data + algorithm + training process. The "code" that determines behavior is a trained model artifact — a file of numbers.
This changes everything about how you operate it:
- Bugs can be invisible. A model can silently degrade without throwing errors. It just starts returning worse predictions.
- Reproducibility is hard. Two training runs on the same code and data can still produce different models if the random seed or the data ordering differs.
- The artifact is huge. A model might be 500MB or 50GB. You can't store it in git like source code.
- Deployment isn't enough. You must also monitor for data drift and model decay over time.
## Core ML Concepts You Need to Know

### Supervised Learning
The most common type of ML. You have labeled examples (input → known output), and you train a model to predict the output for new inputs. Examples: predicting whether a deployment will fail (classification), forecasting server load (regression).
### Features and Labels
A feature is an input variable — one piece of information about a data point. A label is the thing you're trying to predict. If you're predicting customer churn, features might be "days since last login", "number of support tickets", and "contract type". The label is "churned: yes/no".
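In code, features and labels usually live side by side in a table and get separated just before training. A minimal sketch of the churn example above, using pandas (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical churn data: each row is one customer (one data point).
data = pd.DataFrame({
    # Features: the inputs the model is allowed to look at
    "days_since_last_login": [2, 45, 7, 90],
    "num_support_tickets":   [0, 5, 1, 8],
    "contract_type":         ["annual", "monthly", "annual", "monthly"],
    # Label: the thing we want to predict
    "churned":               [0, 1, 0, 1],
})

X = data.drop(columns=["churned"])  # feature matrix (inputs)
y = data["churned"]                 # label vector (known outputs)
```

By convention, the feature matrix is called `X` and the label vector `y`; you'll see those names in almost every ML codebase.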
### Training, Validation, and Test Sets
You split your data into three buckets:
- Training set (~70%): The model learns from this data.
- Validation set (~15%): Used during training to tune hyperparameters and detect overfitting.
- Test set (~15%): Held out completely, used only for final evaluation. Never touch this during training.
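The three-way split above is commonly done with two calls to scikit-learn's `train_test_split`: peel off the training set first, then split the remainder in half. A sketch with dummy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)       # dummy feature matrix
y = np.random.randint(0, 2, size=1000)   # dummy labels

# Split off 70% for training, leaving 30% to divide up.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42)

# Split the remaining 30% evenly into validation and test (15% each).
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```

Fixing `random_state` makes the split reproducible, which matters when you want two pipeline runs to evaluate against the same held-out data.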
### Overfitting vs Underfitting
Overfitting means the model memorized the training data but can't generalize to new data. Your training accuracy is great but validation accuracy is bad. Underfitting means the model is too simple to capture the patterns in the data — both training and validation accuracy are bad.
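You can see overfitting directly by comparing training and validation accuracy. A sketch using a decision tree on deliberately noisy synthetic data (the dataset and model choice here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 corrupts 20% of labels, so no model should score 100% on new data.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Unconstrained tree: deep enough to memorize every training point.
overfit = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth-limited tree: forced to learn only the broad patterns.
simple = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("overfit tree  train:", overfit.score(X_train, y_train),
      "val:", overfit.score(X_val, y_val))
print("simple tree   train:", simple.score(X_train, y_train),
      "val:", simple.score(X_val, y_val))
```

The unconstrained tree hits perfect training accuracy but a noticeably lower validation score: it has memorized the label noise. That gap between training and validation performance is the overfitting signature to watch for in training logs.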
### Hyperparameters vs Parameters
Parameters are what the model learns (the weights). Hyperparameters are the knobs you set before training: learning rate, number of trees, layer sizes. Tuning hyperparameters is a major part of the MLOps workflow.
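The distinction shows up cleanly in code: hyperparameters are constructor arguments, parameters are attributes that only exist after `fit()`. A sketch with logistic regression on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hyperparameters: chosen by you, before training starts.
model = LogisticRegression(C=0.5, max_iter=200)

# Parameters: learned from the data, during training.
model.fit(X, y)
print(model.coef_)       # learned weights, one per feature
print(model.intercept_)  # learned bias term
```

This is why hyperparameter tuning looks like a config-sweep problem to a DevOps engineer: you're re-running the same job over a grid of constructor arguments and comparing the resulting metrics.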
## The Python ML Ecosystem
Python is the dominant language in ML. Here's what you need to know about the key libraries:
| Library | Purpose | DevOps Analogy |
|---|---|---|
| numpy | N-dimensional array math | The bash of ML — always present, everything uses it |
| pandas | Tabular data manipulation | Like jq but for tables, plus SQL-like operations |
| scikit-learn | Classical ML algorithms | The standard library of ML |
| PyTorch | Deep learning framework | The "engine" — low level, highly flexible |
| MLflow | Experiment tracking & model registry | Like git + Artifactory for models |
| DVC | Data version control | Git LFS but designed for ML datasets |
## Training Your First Model
Let's train a real model. We'll use the classic iris dataset to classify flower species — a "hello world" for ML. Don't worry about what iris flowers are; focus on the pattern of how training works.
Run this and you'll see output like `Accuracy: 0.9667`. The model is stored in MLflow's artifact store. You can view the experiment in the MLflow UI by running `mlflow ui` and visiting http://localhost:5000.
Think of `mlflow.start_run()` as a git commit. It captures a snapshot: the hyperparameters (like your commit message), the metrics (like test results), and the model artifact (like the compiled binary). You can always go back and reproduce any run.
## ML Workflow vs CI/CD: A Mapping
Your existing intuitions about software delivery map surprisingly well to MLOps:
| CI/CD Concept | ML Equivalent | Notes |
|---|---|---|
| Source code | Training code + dataset | Both must be versioned together |
| Build artifact | Trained model file | Stored in model registry, not git |
| Unit tests | Data validation tests | Check for nulls, schema drift, range violations |
| Integration tests | Model evaluation metrics | Accuracy, F1, AUC must meet thresholds |
| Canary deploy | A/B model testing | Route 5% of traffic to new model |
| Rollback | Champion/challenger swap | Revert to previous model version in registry |
| Monitoring | Prediction monitoring + drift | Watch for statistical drift, not just errors |
The fundamental difference: in CI/CD you ship code. In MLOps you ship code AND data AND the artifact they produce together. A change to the training data can change the model behavior just as much as a code change.
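The "prediction monitoring + drift" row in the table above is the piece with no direct CI/CD analog, so here's a sketch of the simplest version: compare the distribution of a feature in live traffic against its training-time baseline with a two-sample Kolmogorov-Smirnov test. This assumes `scipy` is available; the feature name and data are made up, and real drift monitoring usually covers many features and model outputs.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical feature "request latency": baseline captured at training
# time vs. a window of live production traffic that has shifted upward.
training_latency = rng.normal(loc=100, scale=10, size=5000)
live_latency = rng.normal(loc=120, scale=10, size=5000)

# KS test: a small p-value means the two samples likely come from
# different distributions, i.e. the feature has drifted.
statistic, p_value = ks_2samp(training_latency, live_latency)

if p_value < 0.01:
    print(f"drift detected (KS statistic={statistic:.3f})")
```

In practice a check like this runs on a schedule, and a drift alert triggers investigation or retraining rather than a pager for a hard failure. That's the key operational shift: the model never "errors", it just quietly stops matching the world.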
You're now ready to understand the full MLOps lifecycle. In the next guide, we'll go deeper into exactly what skills you need to bring from DevOps and what you'll need to learn fresh.