requests to roughly the per-GPU share of the instance, leaving headroom for DaemonSets and system overhead. For example, an 8-GPU H100 Node has 128 vCPUs and 2 TB of RAM, and the official whole-Node example in Target specific GPUs or CPUs requests cpu: 120 and memory: 1800Gi. Dividing those values by eight gives a per-GPU split of about 15 vCPUs and 225 GiB of memory. Set CPU and memory limits equal to requests if you want Guaranteed-QoS Pods, or set limits higher than requests if your workload occasionally bursts. Specify GPUs under limits as nvidia.com/gpu: 1 (or more); Kubernetes copies the value to requests automatically.
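The per-GPU arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not an official sizing tool: the helper name `per_gpu_resources` is hypothetical, the whole-Node figures (cpu: 120, memory: 1800Gi) are the ones quoted above, and the sketch assumes you want limits equal to requests.

```python
import json


def per_gpu_resources(node_cpu: int, node_memory_gi: int,
                      gpus_per_node: int, gpus: int = 1) -> dict:
    """Build a container `resources` stanza sized to `gpus` GPU shares of a Node.

    Hypothetical helper: the whole-Node values passed in should already
    leave headroom for DaemonSets and system overhead.
    """
    if not 1 <= gpus <= gpus_per_node:
        raise ValueError("gpus must be between 1 and gpus_per_node")

    # Integer division keeps the request at or below the exact share.
    cpu = node_cpu * gpus // gpus_per_node
    memory_gi = node_memory_gi * gpus // gpus_per_node

    requests = {"cpu": str(cpu), "memory": f"{memory_gi}Gi"}
    # Assumption: limits mirror requests. The GPU count is listed only
    # under limits; Kubernetes copies it to requests.
    limits = {**requests, "nvidia.com/gpu": str(gpus)}
    return {"requests": requests, "limits": limits}


# Whole-Node example from the text: cpu: 120, memory: 1800Gi, 8 GPUs.
stanza = per_gpu_resources(node_cpu=120, node_memory_gi=1800, gpus_per_node=8)
print(json.dumps(stanza, indent=2))
```

The printed structure maps directly onto the `resources` field of a container spec. To allow bursting instead, raise the `cpu` and `memory` entries under `limits` above the requested values.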
For background on how requests and limits behave on GPU Pods, see How do CPU and memory requests work with GPU Pods?.
Workload Scheduling