This page covers how to identify Pods that trigger static CPU allocation, why static CPU allocation causes SUNK Nodes to drain, and how to configure Pod resources to prevent it.
Kubernetes can pin specific CPU cores to a Pod so that no other process on the Node shares those cores. This feature is called static CPU allocation, and the CPU Manager static policy in the kubelet controls it. CoreWeave CKS Nodes enable this policy by default. Static CPU allocation is useful for latency-sensitive workloads that benefit from dedicated cores. However, it’s incompatible with the SUNK Pod Scheduler because Slurm can’t account for CPU cores that the kubelet has pinned to other Pods.
Do not schedule Pods that have Guaranteed QoS and integer CPU requests on Kubernetes Nodes used for SUNK. These Pods trigger static CPU allocation, which causes resource contention in Slurm. Slurm then drains the affected Nodes with a misleading reason, for example, batch job complete failure.

Static CPU allocation

A Pod triggers static CPU allocation when both of these conditions are true:
  • The Pod has Guaranteed QoS class, meaning every container, including init containers, sets CPU and memory requests equal to its limits.
  • The CPU request is a whole integer (for example, cpu: 4), not a fractional value (for example, cpu: 3.5 or cpu: 750m).
For example, this resource specification triggers static CPU allocation because requests and limits are equal and the CPU value is an integer:
resources:
  requests:
    cpu: "4"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 8Gi
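The two conditions above can be checked mechanically. The following is a minimal Python sketch, not part of any CoreWeave or Kubernetes tooling: the function names (`parse_cpu`, `parse_memory`, `is_guaranteed`, `triggers_static_cpu`) are hypothetical, and the memory parser handles only common suffixes rather than the full Kubernetes quantity grammar. Pass it the `resources` of every container in the Pod, including init containers.

```python
from decimal import Decimal

# Multipliers for common Kubernetes memory suffixes (assumption: not the full grammar).
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity) -> Decimal:
    """Convert a CPU quantity such as "4", "3.5", or "750m" to cores."""
    text = str(quantity).strip()
    if text.endswith("m"):
        return Decimal(text[:-1]) / Decimal(1000)
    return Decimal(text)


def parse_memory(quantity) -> Decimal:
    """Convert a memory quantity such as "8Gi" or "512M" to bytes."""
    text = str(quantity).strip()
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _MEMORY_UNITS[suffix]
    return Decimal(text)


_PARSERS = {"cpu": parse_cpu, "memory": parse_memory}


def is_guaranteed(containers) -> bool:
    """True when every container's CPU and memory requests equal its limits."""
    if not containers:
        return False
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name, parse in _PARSERS.items():
            if name not in limits:
                return False
            # Kubernetes defaults a missing request to the limit.
            request = requests.get(name, limits[name])
            if parse(request) != parse(limits[name]):
                return False
    return True


def triggers_static_cpu(containers) -> bool:
    """Apply both conditions: Guaranteed QoS and a whole-number CPU request."""
    if not is_guaranteed(containers):
        return False
    for container in containers:
        cpu = parse_cpu(container["resources"]["limits"]["cpu"])
        if cpu % 1 == 0:
            return True
    return False


spec = {"requests": {"cpu": "4", "memory": "8Gi"},
        "limits": {"cpu": "4", "memory": "8Gi"}}
print(triggers_static_cpu([{"resources": spec}]))  # True
```

Using `Decimal` avoids floating-point surprises when deciding whether a value such as `4000m` is a whole number of cores.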

CPU allocation with SUNK

Slurm is configured with a fixed CPU count for each Node and is unaware of CPUs that the kubelet has pinned to Guaranteed QoS Pods. When static CPU allocation removes cores from the shared pool, the CPUs available to Slurm shrink to the configured total minus whatever the kubelet has pinned. Slurm continues to schedule jobs against the original count, which leads to resource contention. The following sequence shows how the issue occurs:
  1. A Pod with Guaranteed QoS and integer CPU requests is scheduled onto a SUNK Node.
  2. The CPU Manager in the kubelet pins specific CPU cores to the Pod using cpuset cgroups.
  3. Slurm, unaware of the pinned cores, schedules a job onto the Node against its full configured CPU count.
  4. The job contends for the CPUs that remain in the shared pool and fails.
  5. Slurm drains the Node due to the type of job failure.
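The accounting mismatch behind this sequence is simple arithmetic. The sketch below is illustrative only: `cpu_shortfall` is a hypothetical helper, and the CPU counts are made-up numbers, not values from a real SUNK Node.

```python
def cpu_shortfall(configured_cpus: int, pinned_cpus: int, requested_by_slurm: int) -> int:
    """Return how many CPUs Slurm has scheduled beyond what the shared pool holds."""
    # The kubelet removes pinned cores from the pool that everything else shares.
    shared_pool = configured_cpus - pinned_cpus
    return max(0, requested_by_slurm - shared_pool)


# Hypothetical Node: Slurm is configured with 128 CPUs, the kubelet has pinned 4.
print(cpu_shortfall(configured_cpus=128, pinned_cpus=4, requested_by_slurm=128))  # 4
# With nothing pinned, the same job fits.
print(cpu_shortfall(configured_cpus=128, pinned_cpus=0, requested_by_slurm=128))  # 0
```

Any nonzero result means Slurm has scheduled more CPU than the Node can actually provide to the job.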

Common drain reasons

When static CPU allocation causes a job to fail from resource contention, Slurm drains the Node with the reason batch job complete failure. This message can be misleading because it doesn’t point to the underlying CPU pinning by the kubelet.

Prevent CPU allocation draining Nodes

To prevent static CPU allocation from triggering Node drains, ensure that Pods on SUNK Nodes use Burstable QoS instead of Guaranteed QoS, which prevents the kubelet from pinning CPU cores.

Switch Pods to Burstable QoS

Set CPU requests lower than CPU limits to change the Pod’s QoS class from Guaranteed to Burstable:
resources:
  requests:
    cpu: "3"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 8Gi
Omitting the CPU limits field altogether also produces Burstable QoS.

Verify a Pod’s QoS class

To check whether a Pod has Guaranteed or Burstable QoS, query its status. Replace [POD-NAME] and [NAMESPACE] with your Pod’s name and namespace:
kubectl get pod [POD-NAME] -n [NAMESPACE] -o jsonpath='{.status.qosClass}'
The output should be Burstable, not Guaranteed. To check all Pods on a specific Node, replace [NODE-NAME] with the Node’s name:
kubectl get pods -A --field-selector spec.nodeName=[NODE-NAME] -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.qosClass}{"\n"}{end}'
Look for any Pods with Guaranteed QoS class. If you find any, update their resource specifications as described in Prevent CPU allocation draining Nodes.
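If you need to run this check from a script, the same query can be made with JSON output and filtered in Python. This is a sketch under stated assumptions: `guaranteed_pods` and `guaranteed_pods_on_node` are hypothetical helper names, and `kubectl` is assumed to be on the `PATH` with access to the cluster.

```python
import json
import subprocess


def guaranteed_pods(pod_list: dict) -> list[tuple[str, str]]:
    """Return (namespace, name) for every Pod whose QoS class is Guaranteed."""
    found = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("qosClass") == "Guaranteed":
            meta = pod["metadata"]
            found.append((meta["namespace"], meta["name"]))
    return found


def guaranteed_pods_on_node(node_name: str) -> list[tuple[str, str]]:
    """Run the kubectl query from this page with JSON output and filter the result."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-A",
         "--field-selector", f"spec.nodeName={node_name}", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return guaranteed_pods(json.loads(result.stdout))
```

Calling `guaranteed_pods_on_node("[NODE-NAME]")` with a real Node name returns the Pods to review. Guaranteed QoS alone does not trigger static CPU allocation, so also check whether each Pod's CPU request is an integer.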

How to recover drained Nodes

If Nodes have already been drained by this issue, update the Pod resource specifications so that CPU requests don’t equal CPU limits. The Pod reschedules with Burstable QoS, and the Node should recover. For instructions on undraining Nodes, see Drain and undrain Slurm Nodes. To identify Nodes drained by this issue, look for the drain reason listed in Common drain reasons:
sinfo -t drain -NO "NodeList:45,Reason:130" | grep "batch job complete failure"
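To feed the affected Nodes into other tooling, the `sinfo` output can be parsed in a few lines of Python. This sketch assumes the two-column layout produced by the format string above (Node name, then reason). `nodes_with_reason` is a hypothetical helper, and the Node names in the sample are invented.

```python
def nodes_with_reason(sinfo_output: str,
                      reason: str = "batch job complete failure") -> list[str]:
    """Return the unique Node names whose drain reason contains `reason`."""
    nodes = []
    for line in sinfo_output.splitlines():
        # First whitespace-delimited token is the Node name, the rest is the reason.
        parts = line.split(None, 1)
        if len(parts) == 2 and reason in parts[1] and parts[0] not in nodes:
            nodes.append(parts[0])
    return nodes


# Invented sample output for illustration only.
sample = """NODELIST                                     REASON
node-001                                     batch job complete failure
node-002                                     Not responding
node-001                                     batch job complete failure
"""
print(nodes_with_reason(sample))  # ['node-001']
```

The duplicate check matters because `sinfo -N` can list the same Node once per partition.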
Last modified on May 1, 2026