Feature Store Explained: Feast on Kubernetes
You've built the dataset pipeline, cleaned the data, and engineered your features into a tidy CSV. Now the model goes to production — and within weeks you discover that inference predictions are silently wrong. Not because the model is bad, but because the feature values at serving time don't match what the model saw during training.
This is the feature consistency problem, and it's one of the most common sources of production ML failures. A feature store solves it by acting as a single, versioned source of truth for features — shared between training and inference, so both pipelines always consume features the same way.
We'll use Feast, an open-source, Kubernetes-native feature store used in production by companies like Nvidia, Shopify, and Expedia.
The Problem Feature Stores Solve
In a naive MLOps setup, feature engineering happens twice:
- Training time: a Python script reads the raw CSV, transforms it, and writes a featured dataset
- Inference time: an API handler reads raw employee data, applies "the same" transformations, and feeds the model
The word "same" is doing a lot of work there. In practice, two separate implementations of the same logic drift apart. A column gets renamed. A normalization constant changes. A derived feature gets computed in a different order. The result is training-serving skew: the model predicts on different data than it was trained on, and it doesn't throw an error — it just silently degrades.
A feature store enforces a single definition of every feature. That definition is used both when materializing data for training and when serving features at inference time. With no duplicated transformation code left to drift, the main source of skew is removed.
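To make the drift concrete, here is a minimal sketch in plain Python. The function names, the log scaling, and the 365 versus 360 divisor are all hypothetical and chosen only for illustration; they are not taken from the attrition project.

```python
import math

# Hypothetical shared definition: the one place where "tenure" is derived.
def tenure_feature(days_employed: int) -> float:
    """Tenure in years, log-scaled (illustrative transformation only)."""
    return math.log1p(days_employed / 365.0)

# Drifted copy, as might live in a separately maintained API handler.
# Someone changed the divisor to 360, so serving no longer matches training.
def tenure_feature_drifted(days_employed: int) -> float:
    return math.log1p(days_employed / 360.0)

def skew(days_employed: int) -> float:
    """Absolute gap between the training-time and serving-time value."""
    return abs(tenure_feature(days_employed) - tenure_feature_drifted(days_employed))

if __name__ == "__main__":
    # Nothing raises here; the two values simply disagree.
    print(f"skew for 730 days employed: {skew(730):.4f}")
```

Neither function fails, which is the point: the only way to notice the gap is to compare the two code paths, and a single shared definition removes the need for that comparison.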
Why Feature Order Is Critical
Before diving into the feature store architecture, you need to understand a subtle but important constraint: the model does not understand column names.
During training, your model sees a matrix of numbers. It learns patterns based on the position of each value: "if index 0 is high and index 2 is 1, predict attrition." The column header tenure is never stored in the model. Only the position matters.
| featured.csv column order | What the model actually sees |
|---|---|
| tenure → 0.45 | index 0 → 0.45 |
| salary → 0.78 | index 1 → 0.78 |
| overtime → 1 | index 2 → 1 |
If at inference time you pass salary as the first column, the model treats it as tenure. It will still run. It will still return a probability. But that probability is based on completely wrong inputs, with no warning.
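The effect is easy to reproduce with a toy model. The sketch below uses a made-up linear scorer; the weights and the `to_vector` helper are hypothetical, and only the three column names and values come from the example above.

```python
# Toy linear scorer: weights are positional, like the inputs of a trained model.
WEIGHTS = [2.0, -1.0, 0.5]                        # learned for [tenure, salary, overtime]
FEATURE_ORDER = ["tenure", "salary", "overtime"]  # column order used at training time

def score(vector):
    """Weighted sum by position; column names play no part."""
    return sum(w * x for w, x in zip(WEIGHTS, vector))

def to_vector(features: dict) -> list:
    """Build the input row by name, in training order, and fail loudly on gaps."""
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [features[name] for name in FEATURE_ORDER]

# The serving code happens to produce the values in a different order.
row = {"salary": 0.78, "overtime": 1, "tenure": 0.45}

correct = score(to_vector(row))      # values placed by name
swapped = score(list(row.values()))  # values placed by dict order: salary lands at index 0
```

Both calls return a number without complaint, but `swapped` scores salary with the tenure weight. Building the row by name from an explicit order list is the cheap guard against this.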
What Is a Feature Store
A feature store is an infrastructure layer with four responsibilities:
| Responsibility | What It Means |
|---|---|
| Define | Store feature definitions (schema, entities, metadata) in a central registry |
| Store | Persist feature data in both offline (historical) and online (real-time) stores |
| Serve | Retrieve features at low latency during inference, in the correct order |
| Share | Expose the same features across teams — model A and model B can reuse the same employee_tenure feature without recomputing it |
In the employee attrition project, a feature is any derived signal: tenure, overtime, promotion_stagnation, career_velocity. These values are computed once, stored in the feature store, and served consistently to every model that needs them.
Offline vs Online Store
Training and inference have very different data access patterns, so a feature store uses two separate backends.
Offline Store — For Training
Training requires large volumes of historical feature data: hundreds of thousands of employee records spanning years. Speed is secondary; completeness and reproducibility matter. The offline store is backed by slow but scalable storage: S3, Parquet files, BigQuery, or Redshift.
When you kick off a training job, the Feast SDK reads the feature definitions from the registry, queries the offline store directly, and returns a time-travel-correct dataset: for each training example, it retrieves the feature values that existed at the exact timestamp of that label. This prevents future leakage — a subtle bug where you accidentally use feature values that weren't available at prediction time.
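The core of that point-in-time lookup can be sketched in a few lines of standard-library Python. This is an illustration of the idea, not Feast's implementation: the `HISTORY` data is invented, and real point-in-time joins also apply the feature view's TTL, which is omitted here.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history: employee_id -> [(event_timestamp, tenure)], sorted by time.
HISTORY = {
    101: [
        (datetime(2023, 1, 1), 1.0),
        (datetime(2023, 7, 1), 1.5),
        (datetime(2024, 1, 1), 2.0),
    ],
}

def point_in_time_value(entity_id, label_time):
    """Return the latest value with event_timestamp <= label_time, or None."""
    rows = HISTORY.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, label_time)  # number of rows at or before label_time
    return rows[idx - 1][1] if idx else None
```

A label dated September 2023 gets the July 2023 value, never the January 2024 one. That is exactly what keeps future information out of the training set.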
Online Store — For Inference
During inference, the model API receives an employee ID and needs to return a prediction in milliseconds. The offline store is far too slow for this: a single read from S3 typically takes 100 ms or more. The online store uses Redis or DynamoDB, which serve feature lookups in 1–5 ms.
| Store | Backend | Typical Latency | Used For |
|---|---|---|---|
| Offline | S3 / Parquet / BigQuery | 100–500 ms | Training, batch scoring |
| Online | Redis / DynamoDB | 1–5 ms | Real-time inference |
The online store only holds the latest feature values for active entities (e.g., current employees). It is not a historical archive. It is a cache of the most recent precomputed features, optimized for fast point lookups by entity key (e.g., employee_id).
Materialization: How Data Gets Into Redis
You now have two stores. The offline store holds years of historical features in S3. The online store in Redis holds the latest values for current employees. But how does data move from S3 into Redis?
That process is called materialization. It reads the latest relevant feature values from the offline store and loads them into Redis. If you are the DevOps engineer on the team, this is typically yours to own: materialization runs as a scheduled Kubernetes CronJob or Airflow DAG.
# Materialize features from the end of the previous run (or the TTL window on the first run) up to the current UTC time
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
# Kubernetes CronJob: nightly materialization
apiVersion: batch/v1
kind: CronJob
metadata:
  name: feast-materialize
  namespace: mlops
spec:
  schedule: "0 2 * * *"  # 02:00 UTC daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: materialize
              image: ops4life/feast-worker:latest
              command:
                - bash
                - -c
                - feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
              env:
                - name: FEAST_REPO_PATH
                  value: /feast
          restartPolicy: OnFailure
Materialization is incremental: each run covers only feature records with timestamps after the end of the previous run. In the employee attrition project, only active employee records are materialized; the filtering is handled in the Feast feature view definition before the CronJob runs.
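Conceptually, an incremental run is a windowed "latest value per entity" copy. The sketch below shows that idea with a plain dict standing in for Redis; the function name, row layout, and high-water-mark handling are assumptions for illustration and do not mirror Feast's internals.

```python
from datetime import datetime

def materialize_incremental(offline_rows, online_store, last_run, end):
    """Copy the newest value per entity with last_run < event_timestamp <= end.

    offline_rows: iterable of dicts with employee_id, event_timestamp, and features.
    online_store: dict standing in for Redis, keyed by employee_id.
    Returns `end`, which the caller records as the new high-water mark.
    """
    for row in sorted(offline_rows, key=lambda r: r["event_timestamp"]):
        if last_run < row["event_timestamp"] <= end:
            online_store[row["employee_id"]] = row  # later rows overwrite earlier ones
    return end

# Hypothetical offline data for two employees.
rows = [
    {"employee_id": 101, "event_timestamp": datetime(2024, 1, 3), "tenure": 1.1},
    {"employee_id": 101, "event_timestamp": datetime(2024, 1, 1), "tenure": 1.0},
    {"employee_id": 202, "event_timestamp": datetime(2024, 1, 5), "tenure": 3.0},
]
online = {}
mark = materialize_incremental(rows, online, datetime(2023, 12, 31), datetime(2024, 1, 4))
```

After this first run, `online` holds only employee 101 with the January 3 value. A second run from `mark` onward would pick up employee 202 without touching 101 again.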
Feature Registry
The feature registry is the central catalog of all feature definitions: their schema, the entities they're keyed on, and where the underlying data lives. In this setup, the registry is backed by PostgreSQL (Feast also supports a file-based registry).
Instead of manually constructing the input vector for every inference call, the inference API asks Feast: "give me the feature set defined for the attrition model, for employee 101." The SDK resolves that feature set through the registry and returns the values from Redis under the same feature names the training job used.
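from feast import FeatureStore

store = FeatureStore(repo_path="/feast")
# Look up the feature service registered for the attrition model
feature_service = store.get_feature_service("attrition_model_v1")
# Fetch online features by entity key
features = store.get_online_features(
    features=feature_service,  # feature set defined at training time
    entity_rows=[{"employee_id": 101}],
).to_dict()
# to_dict() also returns the entity key and wraps every value in a list
# (one item per entity row), so drop employee_id and unwrap before predicting
row = [values[0] for name, values in features.items() if name != "employee_id"]
# model is the trained estimator, loaded elsewhere
prediction = model.predict([row])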
The training job uses the same registry to read feature definitions and construct the historical training dataset from S3. Because both paths read the same definitions, feature names and types match, and a model input built from those names lines up with the columns the model was trained on.
What the Registry Stores
| Object | What It Defines | Example |
|---|---|---|
| Entity | The primary key that identifies a record | employee_id |
| Feature View | A group of related features sourced from one dataset | employee_features |
| Feature Service | A named set of feature views consumed by a specific model | attrition_model_v1 |
| Data Source | Where raw feature data lives (S3 path, table name) | s3://features/employee_attrition.parquet |
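One way to picture how these objects relate is a toy registry in plain Python. The classes below are deliberately not Feast's own (`ViewDef`, `ServiceDef`, and `ToyRegistry` are hypothetical names); they only show how a named service resolves to a stable, ordered list of feature references.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ViewDef:
    name: str
    entity: str
    features: tuple  # ordered feature names

@dataclass(frozen=True)
class ServiceDef:
    name: str
    views: tuple  # ordered ViewDef objects

    def feature_refs(self):
        """Fully qualified names in a stable order: view order, then field order."""
        return [f"{view.name}:{feat}" for view in self.views for feat in view.features]

@dataclass
class ToyRegistry:
    services: dict = field(default_factory=dict)

    def apply(self, service: ServiceDef) -> None:
        self.services[service.name] = service

    def get(self, name: str) -> ServiceDef:
        return self.services[name]

employee_features = ViewDef(
    "employee_features", "employee_id", ("tenure", "salary_band", "overtime")
)
registry = ToyRegistry()
registry.apply(ServiceDef("attrition_model_v1", (employee_features,)))
refs = registry.get("attrition_model_v1").feature_refs()
```

Training and inference code that both ask for `attrition_model_v1` receive the same list of references, which is the property the real registry provides.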
Feast Architecture on Kubernetes
On Kubernetes, Feast runs as a set of deployments. Here's the complete picture of how the components interact:
| Component | Implementation | Role |
|---|---|---|
| Feature Registry | PostgreSQL | Stores feature definitions, schema, and metadata |
| Online Store | Redis | Serves latest feature values at low latency (<5 ms) |
| Offline Store | S3 / Parquet | Historical features for training and batch scoring |
| Feature Server | Feast Feature Server pod | HTTP/gRPC API that serves online features to inference pods |
| Materialization | Kubernetes CronJob | Moves updated features from S3 → Redis on schedule |
Training Data Flow
- Training job starts with a list of entity keys and timestamps
- Feast SDK reads feature view definitions from PostgreSQL registry
- SDK performs a point-in-time join against the offline store (S3)
- Returns a training dataset with time-correct features — no future leakage
Inference Data Flow
- HR user submits an employee ID to the inference API
- Inference API calls the Feast Feature Server: POST /get-online-features
- Feature Server reads definitions from PostgreSQL, fetches values from Redis
- Returns feature vector in the canonical order from the registry
- Inference API passes the vector to the model, returns prediction
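On the client side, the inference flow comes down to building a request body and turning the response into model input rows. The sketch below uses only the standard library. The request body follows the feature_service and entities format used in this article; the response shape handled by `to_rows` is an assumption and should be checked against the Feature Server version you run.

```python
import json

def build_request(feature_service: str, employee_ids: list) -> str:
    """JSON body naming a feature service and the entity keys to look up."""
    return json.dumps({
        "feature_service": feature_service,
        "entities": {"employee_id": employee_ids},
    })

def to_rows(response: dict, feature_order: list) -> list:
    """Turn a columnar response into one input row per entity, in training order.

    ASSUMED response shape (verify against your Feast version):
    {"metadata": {"feature_names": [...]}, "results": [{"values": [...]}, ...]}
    with one `results` entry per feature name and one value per entity.
    """
    names = response["metadata"]["feature_names"]
    columns = {name: result["values"] for name, result in zip(names, response["results"])}
    n_entities = len(columns[feature_order[0]])
    # Selecting by name drops the entity key column and fixes the column order.
    return [[columns[f][i] for f in feature_order] for i in range(n_entities)]

# Hand-written response in the assumed shape, for two employees.
fake_response = {
    "metadata": {"feature_names": ["employee_id", "tenure", "overtime"]},
    "results": [{"values": [101, 202]}, {"values": [0.45, 0.9]}, {"values": [1, 0]}],
}
rows = to_rows(fake_response, ["tenure", "overtime"])
```

Selecting columns by name from an explicit `feature_order` list keeps the inference API independent of whatever order the server happens to return.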
Watch redis_memory_used_bytes and the eviction rate: evictions mean Redis is running out of memory and dropping feature data, which breaks inference.
Deploying the Stack
# Deploy PostgreSQL for the feature registry
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install feast-postgres bitnami/postgresql \
  --namespace mlops --create-namespace \
  --set auth.database=feast \
  --set auth.username=feast \
  --set auth.password=feastpassword
# Deploy Redis for the online store
helm install feast-redis bitnami/redis \
  --namespace mlops \
  --set auth.enabled=false \
  --set architecture=standalone
# feast/feature_store.yaml: Feast configuration
project: employee_attrition
provider: local
registry:
  registry_type: sql
  path: postgresql://feast:feastpassword@feast-postgres:5432/feast
online_store:
  type: redis
  connection_string: "feast-redis-master:6379"
offline_store:
  type: file  # or use spark/bigquery/redshift for production scale
entity_key_serialization_version: 2
# feast/features.py: feature definitions registered in PostgreSQL
from datetime import timedelta

from feast import Entity, FeatureService, FeatureView, Field, FileSource
from feast.types import Float32, Int64

employee = Entity(name="employee_id", description="Employee identifier")

employee_source = FileSource(
    path="s3://mlops-features/employee_attrition.parquet",
    timestamp_field="event_timestamp",
)

employee_features = FeatureView(
    name="employee_features",
    entities=[employee],
    ttl=timedelta(days=7),
    schema=[
        Field(name="tenure", dtype=Float32),
        Field(name="salary_band", dtype=Int64),
        Field(name="overtime", dtype=Int64),
        Field(name="promotion_stagnation", dtype=Float32),
        Field(name="career_velocity", dtype=Float32),
        Field(name="overall_satisfaction", dtype=Float32),
    ],
    source=employee_source,
)

# Named feature set consumed by the attrition model (referenced by the inference API)
attrition_model_v1 = FeatureService(
    name="attrition_model_v1",
    features=[employee_features],
)
# Apply feature definitions to the registry
cd feast && feast apply
# Run initial materialization (loads Redis from S3)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
# Verify: retrieve features for employee 101
# (CLI flags differ between Feast versions; confirm with: feast get-online-features --help)
feast get-online-features \
  --features employee_features:tenure,employee_features:salary_band \
  --entity-rows '[{"employee_id": 101}]'
Smoke Testing the Feature Server
# Port-forward the Feast Feature Server
kubectl -n mlops port-forward svc/feast-feature-server 6566:6566 &
# Request features over HTTP (curl sends a POST because of -d)
curl -s http://localhost:6566/get-online-features \
  -H "Content-Type: application/json" \
  -d '{
    "feature_service": "attrition_model_v1",
    "entities": {"employee_id": [101, 202, 303]}
  }' | jq .
# Inspect the Feature Server's raw metrics endpoint (for p99 latency, query Prometheus itself)
kubectl -n mlops exec deploy/feast-feature-server -- \
  curl -s localhost:8080/metrics | grep feast_feature_server
Your Role as a DevOps Engineer
ML engineers define features. DevOps engineers keep the feature serving infrastructure running reliably in production.
| Responsibility | What You Own |
|---|---|
| Deploy & manage | Feast Feature Server, Redis, and PostgreSQL on Kubernetes. Helm charts, resource limits, PodDisruptionBudgets. |
| Availability & scaling | HPA on the Feature Server based on request rate. Redis in HA mode (Sentinel or Cluster) for production. PostgreSQL with read replicas. |
| Materialization pipeline | CronJob or Airflow DAG that runs feast materialize-incremental on schedule. Alert on failures — a failed materialization means stale features in Redis. |
| CI/CD integration | Run feast apply in CI after feature definition changes. Schema validation before merge — a breaking schema change breaks inference. |
| Observability | Monitor Redis memory, eviction rate, and Feature Server p99 latency. The Feature Server can expose Prometheus metrics when metrics are enabled. |
Key Metrics to Watch
# Redis: memory usage and evictions
redis-cli info memory | grep used_memory_human
redis-cli info stats | grep evicted_keys
# Feast Feature Server: latency histogram (Prometheus; exact metric name depends on your Feast version and instrumentation)
feast_feature_server_request_duration_seconds_bucket
# Materialization: check the TTL and the recorded materialization intervals
feast feature-views describe employee_features
Stale features fail silently. Alert on the materialization CronJob's last run (kube_cronjob_status_last_schedule_time) and on Redis key TTL expiry to catch this before it silently degrades model accuracy.
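The staleness rule behind such an alert is a simple age comparison. A minimal sketch follows, with a hypothetical 26-hour threshold (a nightly schedule plus two hours of slack). How you obtain the last successful run time, whether from Prometheus, the Kubernetes API, or the registry, is left out.

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_materialized: datetime, now: datetime,
             max_age: timedelta = timedelta(hours=26)) -> bool:
    """True when the newest successful materialization is older than max_age.

    The 26-hour default is a hypothetical threshold, not a Feast setting.
    """
    return now - last_materialized > max_age

if __name__ == "__main__":
    last_run = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
    now = datetime(2024, 1, 2, 5, 0, tzinfo=timezone.utc)
    print("stale" if is_stale(last_run, now) else "fresh")
```

Keeping the threshold slightly above the schedule interval avoids paging on a run that is merely a few minutes late, while still catching a missed night.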
CI/CD: Safe Feature Deployments
Feature definitions are code. They must go through review and validation before they hit production. A schema change that removes a column will break any inference pipeline that references that column.
# .github/workflows/feast-deploy.yml
name: Deploy Feature Definitions
on:
  push:
    paths: ['feast/**']
jobs:
  validate-and-apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Feast
        run: pip install "feast[postgres,redis]"
      - name: Validate feature definitions
        run: |
          cd feast
          python -c "from features import employee_features; print('schema OK')"
      - name: Apply to registry (production)
        if: github.ref == 'refs/heads/main'
        run: |
          cd feast
          feast apply
        env:
          # Only takes effect if feature_store.yaml reads it, e.g. path: ${FEAST_REGISTRY_DB_URL}
          FEAST_REGISTRY_DB_URL: ${{ secrets.FEAST_REGISTRY_DB_URL }}
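The import check in the workflow only proves that the definitions load. A stricter pre-merge gate could compare the proposed schema against the one currently in production. The sketch below shows that comparison over plain name-to-dtype maps; `breaking_changes` is a hypothetical helper, and extracting those maps from the registry and from the pull request is not shown.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Compare {feature_name: dtype_name} maps and list changes that break consumers.

    Removed features and changed dtypes are reported; added features are not,
    since existing models do not request them.
    """
    problems = []
    for name, dtype in old.items():
        if name not in new:
            problems.append(f"removed: {name}")
        elif new[name] != dtype:
            problems.append(f"type changed: {name} {dtype} -> {new[name]}")
    return problems

if __name__ == "__main__":
    current = {"tenure": "Float32", "overtime": "Int64"}
    proposed = {"tenure": "Float64", "career_velocity": "Float32"}
    for problem in breaking_changes(current, proposed):
        print(problem)
```

A CI step could fail the build whenever the returned list is non-empty, which forces a deliberate versioned change (for example, a new feature service) instead of a silent break.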
With the feature store in place, the full data flow for production inference looks like this: HR records flow through the ETL pipeline into S3, nightly materialization pushes current employee features into Redis, and every inference call fetches a consistent, registry-governed feature vector — the same schema the model was trained on.