Workload Scheduling on CKS
Control where Pods are deployed through namespaces, labels, and taints
In CoreWeave Kubernetes Service (CKS), Nodes are organized using namespaces, labels, and taints. These features control where CoreWeave's core managed services are scheduled and ensure that customer workloads always run on healthy production Nodes.
CKS namespaces
CKS has two types of namespaces:
- User namespaces are created by customers and labeled with the customer's Org ID.
  - Customers have full control over user namespaces: they can create, change, and delete them.
- Control plane namespaces are created by CoreWeave to host critical services that run within the cluster.
  - Customers should not alter or delete these namespaces. CoreWeave workloads in these namespaces are managed via automation. Jobs that run in control plane namespaces are not billed to the customer.
CKS applies the label `ns.coreweave.cloud/org: control-plane` to all control plane namespaces. To view these namespaces in a CKS cluster, select them using `kubectl` with the `--selector` option:
```shell
kubectl get namespaces --selector=ns.coreweave.cloud/org=control-plane
```
Output:
```
NAME                    STATUS   AGE
hpc-verification        Active   19d
kube-system             Active   22d
node-problem-detector   Active   19d
```
Node type selection labels
All CoreWeave Nodes feature Instance IDs. To ensure consistency, all Nodes within a Node Pool are tagged with an Instance ID using the `instance-type` label. For example, given a Node Pool comprised of Node type `example`, all Nodes in that Node Pool feature the label `node.kubernetes.io/instance-type=instance-type-example`. For a list of all Instance IDs, see: Instances.
Some customers may be using the `node.kubernetes.io/type` label. This label has also been updated to reference the new Node Instance ID. For example, `node.kubernetes.io/type=H100_NVLINK_80GB.8.xeon.128` now returns the value `node.kubernetes.io/type=gd-8xh100ib-i128`.
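As a sketch, a workload can pin itself to a specific Node type with a `nodeSelector` on the `instance-type` label. The `gd-8xh100ib-i128` Instance ID below is taken from the example above; substitute the Instance ID of your own Node Pool:

```yaml
spec:
  nodeSelector:
    # Schedule only on Nodes of this Instance ID
    node.kubernetes.io/instance-type: gd-8xh100ib-i128
```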
Node lifecycle labels
CKS uses labels to identify a Node's state in the Node lifecycle. These labels ensure that customer workloads are always scheduled on healthy production Nodes, and that CoreWeave's critical core services are always scheduled on control plane Nodes.
- `node.coreweave.cloud/state`: Identifies the Node's state in the lifecycle, such as `production`, `zap`, or `triage`.
- `node.coreweave.cloud/reserved`: Identifies the type of workloads that can run on the Node:
  - If the Node is reserved for user workloads and `/state` is `production`, the value is the customer's Org ID.
  - If the Node is reserved for control plane workloads and `/state` is `production`, the value is `control-plane`.
  - If `/state` is not `production`, the value is the same as `/state`.
- `node.coreweave.cloud/reserved-desired`: Overrides a Node's `reserved` value. If `/reserved-desired` does not match the `/reserved` label, the Node is pending. This mechanism moves Nodes from the organization's reservation to the `control-plane` reservation to prevent workloads from being scheduled on pending Nodes.
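For example, following the same `--selector` pattern used for namespaces above, you can list the Nodes currently in the `production` state (assuming your credentials permit listing Nodes in the cluster):

```shell
kubectl get nodes --selector=node.coreweave.cloud/state=production
```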
User-provided labels
Customers can create custom Node labels for their own purposes, such as scheduling or organization.
However, new labels cannot be created in the `*.coreweave.cloud` or `*.coreweave.com` namespaces. Attempts to create labels in these namespaces are rejected by CKS. For example, the following labels are not allowed:
```yaml
metadata:
  labels:
    foo.coreweave.cloud/bar: "true"
    foo.coreweave.com/bar: "true"
```

Namespaces `*.coreweave.cloud` or `*.coreweave.com` are not allowed.
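Labels outside the reserved namespaces are accepted. For instance, a hypothetical team-owned prefix (the key and value below are illustrative, not a CKS convention) works fine:

```yaml
metadata:
  labels:
    # Allowed: not under *.coreweave.cloud or *.coreweave.com
    scheduling.example.com/pool: "batch"
```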
Pod interruption
CoreWeave supports two strategies for handling Pod interruptions: Interruptible and Gracefully Interruptible. Choose the appropriate strategy based on whether the workload can tolerate interruptions, such as during Node maintenance or resource pressure.
Interruptible Pods
Use the `qos.coreweave.cloud/interruptable` label to identify Pods that can be safely evicted. When applied, this label instructs CKS to preempt these Pods if necessary to prioritize critical workloads, such as large-scale training jobs, or to ensure Node stability.
Apply the label to the Pod's spec as follows:
```yaml
metadata:
  labels:
    qos.coreweave.cloud/interruptable: "true"
```
This label is ideal for inference Pods and other stateless workloads that can safely restart without data loss or significant user impact.
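For context, a minimal sketch of an inference Deployment whose Pod template carries the label; the Deployment name, app label, and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
        qos.coreweave.cloud/interruptable: "true"   # safe to evict and restart
    spec:
      containers:
        - name: server
          image: registry.example.com/inference:latest   # placeholder image
```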
Never apply the `interruptable` label to distributed training jobs, single-instance databases, stateful services, or any workload that cannot tolerate interruptions. Evicting these workloads may cause data loss or require costly restarts across multiple Nodes.
Gracefully Interruptible Pods
Use the `qos.coreweave.com/graceful-interruptible` label to allow CKS to gracefully terminate Pods during planned events, such as Node maintenance, reboots, or infrastructure upgrades.
Apply the label to the Pod's spec as follows:
```yaml
metadata:
  labels:
    qos.coreweave.com/graceful-interruptible: "true"
```
With this label, CKS initiates termination by sending a `DELETE` request to the Pod. Kubernetes honors the Pod's `terminationGracePeriodSeconds`, allowing the Pod to exit cleanly before Node maintenance begins.
During the grace period, Pods can finish processing requests, close connections, and persist necessary data. This minimizes service disruption and helps prevent data loss during infrastructure updates.
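A sketch of how the pieces fit together in a Pod spec; the Pod name, image, grace period, and `preStop` hook are illustrative assumptions, not CKS requirements:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-worker                  # hypothetical name
  labels:
    qos.coreweave.com/graceful-interruptible: "true"
spec:
  terminationGracePeriodSeconds: 120   # time allowed to drain before SIGKILL
  containers:
    - name: worker
      image: registry.example.com/worker:latest   # placeholder image
      lifecycle:
        preStop:
          exec:
            # Pause briefly so load balancers stop routing new requests
            command: ["sh", "-c", "sleep 10"]
```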
Label reference
| Label Key | Value | Description |
|---|---|---|
| `qos.coreweave.cloud/interruptable` | `"true"` | Marks Pods as safe for immediate eviction by CKS when prioritizing critical workloads or managing Node stability. |
| `qos.coreweave.com/graceful-interruptible` | `"true"` | Allows CKS to gracefully terminate Pods during planned maintenance, respecting the Pod's configured `terminationGracePeriodSeconds`. |
Taints
Control plane taints
CKS uses taints in control plane namespaces to control where Pods are scheduled. The taint value is a hash of the `/reserved` label, such as `control-plane` or `triage`.
Nodes inside the `control-plane` Node Pool are unavailable for customer workloads and are not billable to the customer.
User taints
CKS applies taints to all Nodes, including production Nodes, in user namespaces.
Customers should exercise caution before attempting to add tolerations to their Pods to ensure workloads always run on healthy Nodes.
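Before adding a toleration, it can help to inspect the taints CKS has applied to a given Node; `<node-name>` below is a placeholder for one of your Node names:

```shell
kubectl get node <node-name> -o jsonpath='{.spec.taints}'
```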
CPU taints
The CKS admission policy automatically adds the `is_cpu_compute` toleration to any Pod that does not request a GPU resource. This ensures that these Pods can be scheduled on CPU-only Nodes, which carry the `is_cpu_compute` taint.
```yaml
- effect: NoSchedule
  key: is_cpu_compute
```
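On the Pod side, the toleration injected by the admission policy would look roughly like this; a sketch assuming a standard toleration that matches the taint:

```yaml
tolerations:
  - key: is_cpu_compute
    operator: Exists
    effect: NoSchedule
```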
GPU taints
GPU Nodes carry the `is_gpu=true:PreferNoSchedule` taint to discourage Pods that request only CPU resources from scheduling onto GPU Nodes. Because the effect is `PreferNoSchedule` rather than `NoSchedule`, a CPU-only Pod can still schedule on a GPU Node if no CPU Nodes are available.
```yaml
- effect: PreferNoSchedule
  key: is_gpu
  value: "true"
```
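A Pod that requests a GPU resource is not subject to this CPU-only steering; a minimal sketch, where the Pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload                # hypothetical name
spec:
  containers:
    - name: main
      image: registry.example.com/train:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # requesting a GPU targets GPU Nodes directly
```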
SUNK-specific scheduling
SUNK's `/lock` taint
To prevent contention with other Pods that request GPU access while long-running `slurmd` Pods are active, SUNK adds a new GPU resource to Kubernetes, `sunk.coreweave.com/accelerator`, in addition to the `nvidia.com/gpu` resource provided by NVIDIA's plugin.
Because the GPU has two different resource names, Kubernetes tracks the consumption separately, which allows Slurm Pods to request the same underlying GPU as other Kubernetes Pods. However, this requires SUNK to manage GPU contention instead of the Kubernetes scheduler.
SUNK manages the contention with a taint called `sunk.coreweave.com/lock`. SUNK applies this taint to Nodes by making a call to `slurm-syncer` during the prolog phase.
```yaml
- effect: NoExecute
  key: sunk.coreweave.com/lock
  value: "true"
```
Prolog completion is blocked until all Pods that do not tolerate the taint have been evicted.
DaemonSets on SUNK Nodes
Kubernetes DaemonSets that run on SUNK Nodes must tolerate the `sunk.coreweave.com/lock` taint, as well as `is_cpu_compute`, `is_gpu`, and `node.coreweave.cloud/reserved`:
```yaml
spec:
  tolerations:
    - key: sunk.coreweave.com/lock
      value: "true"
      operator: Equal

    - key: is_cpu_compute
      operator: Exists

    - key: is_gpu
      operator: Exists

    - key: node.coreweave.cloud/reserved
      value: <ORG_ID_HASH_VALUE>
      operator: Equal
```
Scaling down workloads
Scaling Pods down in a specific order can be achieved by adjusting the cluster specification via the Cloud Console, or via the CKS cluster API. For more information on scaling down resources, see the official Kubernetes documentation.
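For a single workload, standard Kubernetes tooling also applies; for example, scaling a Deployment to zero replicas with `kubectl` (the Deployment name is a placeholder):

```shell
kubectl scale deployment example-app --replicas=0
```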