How Kubernetes Actually Runs GPU Workloads
Running GPU workloads on Kubernetes involves more than adding nvidia.com/gpu: 1 to your resource spec. You need to understand the full stack: how the NVIDIA kernel driver is installed on nodes, how the device plugin exposes GPUs to the scheduler, how containers get access to CUDA libraries, and how you monitor GPU utilization in production.
How Kubernetes Sees GPUs
Kubernetes treats GPUs as "extended resources" — custom resource types that can be requested by pods. The GPU count is tracked in the node's allocatable resources. Unlike CPU and memory, GPUs are not overcommittable: if a node has 4 GPUs and a pod requests 4, no other pod can get a GPU on that node until the first pod releases them.
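You can see this in the node object itself. An illustrative excerpt of `kubectl get node <name> -o yaml` for a hypothetical 4-GPU node (node sizes and counts are examples, not a real node):

```yaml
# Illustrative node status excerpt; values are hypothetical
status:
  capacity:
    cpu: "64"
    memory: 512Gi
    nvidia.com/gpu: "4"
  allocatable:
    cpu: "63"
    memory: 510Gi
    nvidia.com/gpu: "4"   # extended resource advertised by the device plugin
```

Note that `nvidia.com/gpu` appears as a whole-number count: there is no fractional or overcommitted allocation the way there is for CPU millicores.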
NVIDIA Device Plugin
The NVIDIA device plugin is a DaemonSet that runs on every GPU node. It's responsible for three things: discovering GPUs on the node, reporting them to the kubelet as allocatable resources, and telling the kubelet (via its Allocate response) which device files to mount into containers that request them.
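A typical manual installation, sketched below, applies the project's static manifest and then checks that the kubelet advertises GPUs (the version tag is illustrative; check the k8s-device-plugin releases for the current one):

```shell
# Deploy the device plugin DaemonSet (v0.14.5 is an example version)
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/deployments/static/nvidia-device-plugin.yml

# Confirm each GPU node now reports allocatable GPUs
kubectl get nodes -o custom-columns=NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu
```

If `GPUS` shows `<none>` on a GPU node, the usual culprits are a missing kernel driver or a container runtime not configured for the NVIDIA runtime.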
Requesting GPUs in Pod Specs
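A minimal pod spec that requests one GPU might look like this (pod name and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test                # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1      # GPUs are requested via limits
```

Extended resources like `nvidia.com/gpu` are specified under `limits`; if you set `requests` as well, it must equal the limit, and if you omit it, it defaults to the limit.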
NVIDIA GPU Operator
The GPU Operator automates installation and lifecycle management of the entire GPU software stack on Kubernetes nodes. Instead of manually installing the NVIDIA driver, container toolkit, and device plugin on each node, you install the GPU Operator once and it manages all of these components (plus monitoring) cluster-wide.
The GPU Operator installs a "driver container" on each node that loads the NVIDIA kernel driver without needing to modify the host OS — critical for immutable OS distributions like CoreOS/Flatcar.
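Installation is typically done with Helm. A minimal sketch, using NVIDIA's published chart (flags and namespace are conventional defaults; see the operator docs for options such as skipping the driver container on nodes with pre-installed drivers):

```shell
# Add NVIDIA's Helm repository and install the GPU Operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace
```

Once the operator's pods are running, GPU nodes should advertise `nvidia.com/gpu` without any further per-node setup.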
MIG: Multi-Instance GPU Partitioning
NVIDIA A100 and H100 GPUs support MIG (Multi-Instance GPU), which partitions a single physical GPU into multiple isolated "GPU instances" that can be independently assigned to pods. This is useful for inference workloads that don't need a full GPU.
| A100 80GB MIG Profile | Memory | Compute | Max Per GPU | Use Case |
|---|---|---|---|---|
| 1g.10gb | 10 GB | 1/7 | 7 | Small inference |
| 2g.20gb | 20 GB | 2/7 | 3 | Medium inference |
| 3g.40gb | 40 GB | 3/7 | 2 | Large inference / fine-tuning |
| 7g.80gb | 80 GB | 7/7 | 1 | Full GPU training |
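When the device plugin runs with the "mixed" MIG strategy, each profile is exposed as its own extended resource, and a pod requests a slice instead of a whole GPU. A sketch (pod name and image tag are illustrative):

```yaml
# Assumes the device plugin is configured with the "mixed" MIG strategy,
# which advertises each profile as a distinct extended resource.
apiVersion: v1
kind: Pod
metadata:
  name: mig-inference            # hypothetical name
spec:
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:24.01-py3   # illustrative tag
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1   # one 10 GB slice, not a full GPU
```

With the alternative "single" strategy, all MIG instances on a node must use the same profile and are exposed under plain `nvidia.com/gpu`.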
DCGM Monitoring
NVIDIA DCGM (Data Center GPU Manager) provides deep GPU health and performance metrics. The GPU Operator deploys DCGM Exporter as a DaemonSet, which exposes metrics in Prometheus format.
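Those metrics plug directly into Prometheus alerting. An illustrative PrometheusRule using DCGM Exporter's default metric names (`DCGM_FI_DEV_GPU_UTIL` is real; the threshold and durations are examples only):

```yaml
# Illustrative alert: GPUs allocated but sitting idle for 30 minutes
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-alerts               # hypothetical name
spec:
  groups:
    - name: gpu
      rules:
        - alert: GPUUnderutilized
          expr: avg by (Hostname, gpu) (DCGM_FI_DEV_GPU_UTIL) < 10
          for: 30m
```

Utilization alerts like this are how teams typically catch pods that hold a full GPU but would fit on a MIG slice.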