CKS namespaces
CKS has two types of namespaces:
- User namespaces carry your Org ID label. You have full control over user namespaces. You can create, change, and delete them.
- Control Plane namespaces host critical services that run within the cluster. CoreWeave creates these namespaces. Do not alter or delete them. CoreWeave automates workloads within these managed namespaces and doesn’t bill customers for jobs that run in Control Plane namespaces.
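CoreWeave applies the label ns.coreweave.cloud/org=control-plane to all Control Plane namespaces. To view these namespaces in a CKS cluster, filter namespaces by this label, for example with kubectl get namespaces -l ns.coreweave.cloud/org=control-plane.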
Node type selection labels
All CoreWeave GPU and CPU Nodes feature Instance IDs. To ensure consistency, every Node within a Node Pool carries an Instance ID through the instance-type label.
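As a minimal sketch, a Pod could target one Instance ID with a nodeSelector. The full label key node.kubernetes.io/instance-type is the standard Kubernetes well-known label and is assumed here to be the instance-type label described above. The Pod name, image, and the value example-instance-id are hypothetical placeholders.

```yaml
# Hypothetical Pod pinned to a single instance type.
apiVersion: v1
kind: Pod
metadata:
  name: instance-type-example                 # hypothetical name
spec:
  nodeSelector:
    # Assumed full key for the instance-type label.
    node.kubernetes.io/instance-type: example-instance-id   # placeholder Instance ID
  containers:
    - name: app
      image: registry.example.com/app:latest  # hypothetical image
```

Replace the placeholder with the Instance ID shown on the Nodes in your Node Pool.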
Some customers may use the node.kubernetes.io/type label. CoreWeave updated this label to reference the new Instance ID.
Node lifecycle labels
CKS uses labels to identify a Node’s state in the Node lifecycle. These labels ensure that CKS always schedules customer workloads on healthy production Nodes.
- node.coreweave.cloud/state: Identifies the Node’s lifecycle state, such as production, zap, or triage.
- node.coreweave.cloud/reserved: Identifies the workload type running on the Node:
  - If /reserved is a customer Org ID and /state=production, it’s for user workloads.
  - If /state is not production, then /reserved matches /state.
- node.coreweave.cloud/reserved-desired: Overrides /reserved. If it doesn’t match /reserved, the marked Node is pending and transitions Reservations automatically.
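To illustrate the two documented cases, the fragments below sketch how these labels might look in a Node's metadata. The Org ID example-org-id is a hypothetical placeholder, and real Nodes carry many other labels that are omitted here.

```yaml
# Case 1: production Node reserved for a customer, so it accepts user workloads.
metadata:
  labels:
    node.coreweave.cloud/state: production
    node.coreweave.cloud/reserved: example-org-id   # hypothetical Org ID
---
# Case 2: Node outside production, where /reserved mirrors /state.
metadata:
  labels:
    node.coreweave.cloud/state: triage
    node.coreweave.cloud/reserved: triage
```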
User-provided labels
Customers may create custom Node labels for scheduling or organization, but never in the *.coreweave.cloud or *.coreweave.com namespaces. CKS rejects attempts to do so.
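A short sketch of a custom label and a Pod that schedules against it follows. The key example.com/team, its value, and the Pod name and image are all hypothetical. The only point is that the key sits under a domain you own instead of a CoreWeave namespace.

```yaml
# Node metadata fragment with a custom label under your own domain.
metadata:
  labels:
    example.com/team: research                # hypothetical key and value
---
# Pod that selects Nodes carrying that label.
apiVersion: v1
kind: Pod
metadata:
  name: custom-label-example                  # hypothetical name
spec:
  nodeSelector:
    example.com/team: research
  containers:
    - name: app
      image: registry.example.com/app:latest  # hypothetical image
```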
Pod interruption and eviction policies
CKS supports three eviction strategies for Pods: non-interruptible, interruptible, and gracefully interruptible. These strategies determine how CKS handles Pods during Node maintenance, reboots, or scale-down events.
Summary of eviction strategies
| Strategy | Pod label | Description | Note |
|---|---|---|---|
| Non-interruptible | None (default) | CKS doesn’t proceed with maintenance or scale-down while these Pods are running. Use for critical training jobs and single-instance stateful apps. | Default behavior for Pods, ensuring stability and reliability. |
| Interruptible | qos.coreweave.cloud/interruptable | CKS terminates Pods and proceeds with the Node action without waiting for their full terminationGracePeriodSeconds. Use for stateless workloads that you can restart without data loss. | Misspelled as interruptable for historical reasons. In the qos.coreweave.cloud namespace. |
| Gracefully interruptible | qos.coreweave.com/graceful-interruptible | CKS blocks the Node action, including reboots, until the Pod terminates or its terminationGracePeriodSeconds expires. Use only for stateful applications that block rebooting until the application terminates, like databases and local storage. For all other workloads, use interruptible. | Spelled correctly. In the qos.coreweave.com namespace. |
Non-interruptible
Workloads that you don’t want interrupted, such as critical training jobs and single-instance stateful apps, shouldn’t set either of the two labels in the preceding table. If neither label is set, the Pod is considered non-interruptible. CKS treats Nodes with non-interruptible Pods as active. This means CKS doesn’t proceed with Node maintenance, reboots, or scale-down while these Pods are running. CKS never evicts non-interruptible Pods unless an extreme event occurs, such as complete Node failure or DC power loss. This is the default behavior for Pods.
Interruptible
Workloads such as inference Pods or stateless applications that you can restart without data loss should use the interruptible strategy. CKS terminates these Pods and proceeds with the Node action (such as reboot or scale-down) without waiting for the Pod’s full terminationGracePeriodSeconds.
To choose this strategy, apply the label qos.coreweave.cloud/interruptable: "true" to your Pods. This label is in the qos.coreweave.cloud namespace.
For historical reasons, the label is misspelled as interruptable.
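As a minimal sketch, the label goes on the Pod template so that every replica carries it. The Deployment name, app label, and image below are hypothetical. Only the qos.coreweave.cloud/interruptable label comes from this page.

```yaml
# Hypothetical stateless Deployment marked as interruptible.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-example                     # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference-example
  template:
    metadata:
      labels:
        app: inference-example
        # Note the historical spelling: interruptable, not interruptible.
        qos.coreweave.cloud/interruptable: "true"
    spec:
      containers:
        - name: server
          image: registry.example.com/inference:latest   # hypothetical image
```

The value is quoted because Kubernetes label values must be strings.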
Gracefully interruptible
Use the gracefully interruptible strategy only for stateful applications that block a reboot until the application terminates, like databases and local storage. These workloads need to finish writing data or hand off state before a Node reboots or drains. For all other applications, including stateless workloads that can be restarted without data loss, use the interruptible strategy instead. Unlike interruptible Pods, which CKS deletes immediately without waiting, gracefully interruptible Pods block the Node action (including force reboots) until the Pod terminates or its terminationGracePeriodSeconds expires. CKS sends a DELETE call to the Pod and waits for the full grace period before proceeding.
To choose this strategy, apply the label qos.coreweave.com/graceful-interruptible: "true" to your Pods. This label is in the qos.coreweave.com namespace.
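The following sketch pairs the label with an explicit grace period, since the grace period bounds how long the Pod can hold up the Node action. The Pod name, image, and the 120-second value are illustrative assumptions, not recommended settings.

```yaml
# Hypothetical stateful Pod marked as gracefully interruptible.
apiVersion: v1
kind: Pod
metadata:
  name: database-example                      # hypothetical name
  labels:
    qos.coreweave.com/graceful-interruptible: "true"
spec:
  # Illustrative value: time the application needs to flush data and exit.
  terminationGracePeriodSeconds: 120
  containers:
    - name: db
      image: registry.example.com/database:latest   # hypothetical image
```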
Key behaviors and limitations
The following sections describe behaviors and limitations to consider when using graceful-interruptible Pods.
NodePool scale-down
CKS skips Nodes hosting graceful-interruptible Pods when it determines whether the Node is a candidate for removal. This means scale-down may stall if every Pod on candidate Nodes carries this label.
Tolerations that prevent graceful eviction
Pods that tolerate either node.coreweave.cloud/evict=true:NoExecute or node.coreweave.cloud/reserved:NoExecute don’t go through the graceful-interruptible logic and may be evicted immediately. See Eviction taints.
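For reference, this is roughly what such tolerations look like in a Pod spec, written in standard Kubernetes toleration syntax from the two taints named above. If a Pod relies on graceful-interruptible handling, check that neither entry appears in its spec.

```yaml
# Pod spec fragment. Either toleration bypasses graceful-interruptible handling.
tolerations:
  - key: node.coreweave.cloud/evict
    operator: Equal
    value: "true"
    effect: NoExecute
  - key: node.coreweave.cloud/reserved
    operator: Exists          # the taint has no value, so match on key only
    effect: NoExecute
```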
Drain time differences
- Reboots and maintenance use a default drain timeout of 3 minutes, and honor terminationGracePeriodSeconds for graceful-interruptible Pods.
- CKS-initiated scale-down doesn’t wait for DaemonSets unless they carry qos.coreweave.com/graceful-interruptible: "true" and don’t tolerate eviction taints.
- For services that require long termination phases, explicitly set terminationGracePeriodSeconds accordingly.
Potential for scale-down stalls
By design, CKS never removes a Node that contains only graceful-interruptible Pods. If every Pod on a candidate Node carries this label, CKS has nowhere to reclaim capacity and stalls while it waits for Nodes that can be safely drained. In practice, this can block automated scale-down workflows.
Risk of stuck Nodes
If you deploy workloads without accounting for graceful-interruptible semantics, Nodes can remain in a quasi-drained state indefinitely. For example, you may cordon a Node for maintenance, then find it never transitions to “Ready” again because every Pod refuses immediate eviction. Left unchecked, these Nodes consume capacity and can complicate rolling updates.
To mitigate these risks, follow these recommendations:
- Plan deployment strategies to ensure some interruptable Pods exist as safe eviction candidates for CKS.
- Monitor NodePool capacity and scheduling health. Set up alerts on stalled scale-down events or sustained high utilization to detect when graceful-interruptible Pods hold Nodes.
- Establish maintenance procedures that include manual intervention steps (for example, draining and deleting problematic Nodes) as a fallback when automated processes can’t reclaim resources.
Following these recommendations lets you use graceful-interruptible without compromising cluster resilience or cost efficiency.
Taints and tolerations
CKS uses taints to guard control-plane Nodes and enforce GPU and CPU scheduling. The following sections describe the eviction taints CKS applies to Nodes and the user-facing taints that route Pods to the correct hardware.
Eviction taints
CKS applies the following eviction taints to Nodes:
- node.coreweave.cloud/evict=true:NoExecute
- node.coreweave.cloud/reserved:NoExecute
User taints
Pods without GPU requests automatically tolerate the CPU taint (is_cpu_compute:NoSchedule).
CPU taint
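As a sketch built only from the key and effect is_cpu_compute:NoSchedule, the taint would appear in a CPU Node's spec roughly as follows. Other fields on the Node are omitted.

```yaml
# Node spec fragment: CPU taint with no value.
spec:
  taints:
    - key: is_cpu_compute
      effect: NoSchedule
```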
The GPU taint (is_gpu=true:PreferNoSchedule) prevents CPU-only Pods from scheduling on GPU Nodes unless necessary. A CPU-only Pod can still schedule on a GPU Node if no CPU Nodes are available.
GPU taint
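As a sketch built only from is_gpu=true:PreferNoSchedule, the taint would appear in a GPU Node's spec roughly as follows. PreferNoSchedule is a soft preference in Kubernetes, so the scheduler can still place a Pod on the Node when no better option exists.

```yaml
# Node spec fragment: GPU taint with a soft scheduling effect.
spec:
  taints:
    - key: is_gpu
      value: "true"
      effect: PreferNoSchedule
```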
Use caution before adding tolerations to your Pods, so workloads continue to run on healthy Nodes.
SUNK-specific scheduling
The following sections describe scheduling behaviors specific to SUNK workloads on CKS.
The SUNK /lock taint
To prevent contention with other Pods that request GPU access while long-running slurmd Pods are active, SUNK adds a new GPU resource to Kubernetes, sunk.coreweave.com/accelerator, in addition to the nvidia.com/gpu resource provided by NVIDIA’s plugin.
Because the GPU has two different resource names, Kubernetes tracks the consumption separately, which lets Slurm Pods request the same underlying GPU as other Kubernetes Pods. However, this requires SUNK to manage GPU contention instead of the Kubernetes scheduler.
SUNK manages the contention with a taint called sunk.coreweave.com/lock. SUNK applies this taint to Nodes through a call to slurm-syncer during the Prolog phase.
SUNK's lock taint
DaemonSets on SUNK Nodes
Kubernetes DaemonSets that run on SUNK Nodes must tolerate the sunk.coreweave.com/lock taint, as well as is_cpu_compute, is_gpu, and node.coreweave.cloud/reserved:
Example toleration
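A minimal sketch of such tolerations follows, placed under spec.template.spec of the DaemonSet. This page doesn't state the effect of the sunk.coreweave.com/lock taint, so each entry uses operator Exists with no effect, which in Kubernetes matches a taint with that key regardless of its value or effect. Treat this as an assumption-based starting point and narrow it if you know the exact effects.

```yaml
# DaemonSet fragment: goes under spec.template.spec.
tolerations:
  - key: sunk.coreweave.com/lock
    operator: Exists          # effect omitted: matches any effect for this key
  - key: is_cpu_compute
    operator: Exists
  - key: is_gpu
    operator: Exists
  - key: node.coreweave.cloud/reserved
    operator: Exists
```

Tolerating node.coreweave.cloud/reserved also means the Pod doesn't go through the graceful-interruptible logic described earlier.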