
Workload Scheduling on CKS

Control where Pods are deployed through namespaces, labels, and taints

In CoreWeave Kubernetes Service (CKS), Nodes are organized using namespaces, labels, and taints. These features control where CoreWeave's core managed services are scheduled and ensure that customer workloads always run on healthy production Nodes.

CKS namespaces

CKS has two types of namespaces:

  • User namespaces are created by customers and labeled with the customer's Org ID.
    • Customers have full control over user namespaces - they can create, change, and delete them.
  • Control plane namespaces are created by CoreWeave to host critical services that run within the cluster.
    • Customers should not alter or delete these namespaces. CoreWeave workloads in these namespaces are managed via automation. Jobs that run in control plane namespaces are not billed to the customer.

CKS applies the label ns.coreweave.cloud/org: control-plane to all control plane namespaces. To view these namespaces in a CKS cluster, select them using kubectl with the --selector option:

Example
$
kubectl get namespaces --selector=ns.coreweave.cloud/org=control-plane

Output:

Example
NAME                    STATUS   AGE
hpc-verification        Active   19d
kube-system             Active   22d
node-problem-detector   Active   19d
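
User namespaces can be listed the same way by selecting on the organization's Org ID instead of control-plane. A sketch, assuming the Org ID is available in the ORG_ID shell variable:

Example
$
kubectl get namespaces --selector=ns.coreweave.cloud/org=$ORG_ID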

Node type selection labels

Every CoreWeave Node has an Instance ID. To ensure consistency, all Nodes within a Node Pool are tagged with their Instance ID using the instance-type label. For example, given a Node Pool whose Instance ID is instance-type-example, every Node in that Node Pool carries the label node.kubernetes.io/instance-type=instance-type-example. For a list of all Instance IDs, see: Instances.

Note

Some customers may be using the node.kubernetes.io/type label. This label has also been updated to reference the new Node Instance ID. For example, a Node previously labeled node.kubernetes.io/type=H100_NVLINK_80GB.8.xeon.128 is now labeled node.kubernetes.io/type=gd-8xh100ib-i128.
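
To pin a workload to a specific Node type, a Pod can select on the instance-type label with a standard nodeSelector. A minimal sketch, using the gd-8xh100ib-i128 Instance ID mentioned in the note above:

Example
spec:
  # Schedule only on Nodes whose Instance ID matches this label value
  nodeSelector:
    node.kubernetes.io/instance-type: gd-8xh100ib-i128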

Node lifecycle labels

CKS uses labels to identify a Node's state in the Node lifecycle. These labels ensure that customer workloads are always scheduled on healthy production Nodes, and that CoreWeave's critical core services are always scheduled on control plane Nodes.

  • node.coreweave.cloud/state: Identifies the Node's state in the lifecycle, such as production, zap, or triage.
  • node.coreweave.cloud/reserved: Identifies the type of workloads that can run on the Node:
    • If the Node is reserved for user workloads and /state is production, the value is the customer's Org ID.
    • If the Node is reserved for control plane workloads and /state is production, the value is control-plane.
    • If /state is not production, the value is the same as /state.
  • node.coreweave.cloud/reserved-desired: Overrides a Node's reserved value. If /reserved-desired does not match the /reserved label, the Node is pending. This process moves the Node out of the organization's reservation and into the control-plane reservation, which prevents customer workloads from being scheduled on pending Nodes.
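
To see where each Node sits in the lifecycle, one option (a sketch using kubectl's standard --label-columns/-L flag) is to print these labels as extra columns:

Example
$
kubectl get nodes -L node.coreweave.cloud/state,node.coreweave.cloud/reserved,node.coreweave.cloud/reserved-desired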

User-provided labels

Customers can create custom Node labels for their own purposes, such as scheduling or organization.

However, new labels cannot be created in the *.coreweave.cloud or *.coreweave.com namespaces. Attempts to create labels in these namespaces are rejected by CKS. For example, the following labels are not allowed:

Example
metadata:
  labels:
    foo.coreweave.cloud/bar: "true"
    foo.coreweave.com/bar: "true"
Namespaces `*.coreweave.cloud` or `*.coreweave.com` are not allowed.
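
Labels outside these reserved prefixes are accepted. For instance, a team label using a hypothetical example.com/team key (illustrative only, not a CKS convention) could be applied with kubectl:

Example
$
kubectl label node <NODE_NAME> example.com/team=research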

Pod interruption

CoreWeave supports two strategies for handling Pod interruptions: Interruptible and Gracefully Interruptible. Choose the appropriate strategy based on whether the workload can tolerate interruptions, such as during Node maintenance or resource pressure.

Interruptible Pods

Use the qos.coreweave.cloud/interruptable label to identify Pods that can be safely evicted. When applied, this label instructs CKS to preempt Pods if necessary to prioritize critical workloads, such as large-scale training jobs, or to ensure Node stability.

Apply the label to the Pod's spec as follows:

Example
metadata:
  labels:
    qos.coreweave.cloud/interruptable: "true"

This label is ideal for inference Pods and other stateless workloads that can safely restart without data loss or significant user impact.

Danger

Never apply the interruptable label to distributed training jobs, single-instance databases, stateful services, or any workload that cannot tolerate interruptions. Evicting these workloads may cause data loss or require costly restarts across multiple Nodes.

Gracefully Interruptible Pods

Use the qos.coreweave.com/graceful-interruptible label to allow CKS to gracefully terminate Pods during planned events, such as Node maintenance, reboots, or infrastructure upgrades.

Apply the label to the Pod's spec as follows:

Example
metadata:
  labels:
    qos.coreweave.com/graceful-interruptible: "true"

With this label, CKS initiates termination by sending a DELETE request to the Pod. Kubernetes honors the Pod's terminationGracePeriodSeconds, allowing the Pod to exit cleanly before Node maintenance begins.

During the grace period, Pods can finish processing requests, close connections, and persist necessary data. This minimizes service disruption and helps prevent data loss during infrastructure updates.
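
Putting this together, a minimal sketch of a Pod spec that opts in to graceful interruption and gives itself two minutes to drain; the grace period, container name, image, and preStop command are illustrative values, not CKS requirements:

Example
metadata:
  labels:
    qos.coreweave.com/graceful-interruptible: "true"
spec:
  # Time Kubernetes waits after sending SIGTERM before force-killing the Pod
  terminationGracePeriodSeconds: 120
  containers:
    - name: server
      image: registry.example.com/server:latest
      lifecycle:
        preStop:
          exec:
            # Give in-flight requests time to finish before SIGTERM is sent
            command: ["/bin/sh", "-c", "sleep 10"]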

Label reference

Label Key                                  Value    Description
qos.coreweave.cloud/interruptable          "true"   Marks Pods as safe for immediate eviction by CKS when prioritizing critical workloads or managing Node stability.
qos.coreweave.com/graceful-interruptible   "true"   Allows CKS to gracefully terminate Pods during planned maintenance, respecting the Pod's configured terminationGracePeriodSeconds.

Taints

Control plane taints

CKS uses taints on control plane Nodes to control where Pods are scheduled. The taint value is a hash of the /reserved label, such as control-plane or triage.

Info

Nodes inside the control-plane Node Pool are unavailable for customer workloads, and are not billable to the customer.

User taints

CKS applies taints to all Nodes reserved for user workloads, including production Nodes.

Danger

Customers should exercise caution before adding tolerations to their Pods; these taints ensure that workloads always run on healthy Nodes.

CPU taints

The CKS admission policy automatically adds the is_cpu_compute toleration to any Pod that does not request a GPU resource, so that these Pods can be scheduled on CPU-only Nodes.

CPU taint
- effect: NoSchedule
  key: is_cpu_compute
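
The injected toleration looks roughly like the following sketch; because the admission policy adds it automatically, there is usually no need to declare it yourself (the Exists operator shown here is an assumption):

Injected CPU toleration
tolerations:
  - key: is_cpu_compute
    # Exists tolerates the taint regardless of its value (assumed operator)
    operator: Exists
    effect: NoSchedule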

GPU taints

GPU Nodes have the is_gpu=true:PreferNoSchedule taint to discourage Pods that request only CPU resources from scheduling onto GPU Nodes unless necessary. Because the effect is PreferNoSchedule, a CPU-only Pod can still schedule on a GPU Node if no CPU Nodes are available.

GPU taint
- effect: PreferNoSchedule
  key: is_gpu
  value: "true"

SUNK-specific scheduling

SUNK's /lock taint

To prevent contention with other Pods that request GPU access while long-running slurmd Pods are active, SUNK adds a new GPU resource to Kubernetes, sunk.coreweave.com/accelerator, in addition to the nvidia.com/gpu resource provided by NVIDIA's plugin.

Because the GPU has two different resource names, Kubernetes tracks the consumption separately, which allows Slurm Pods to request the same underlying GPU as other Kubernetes Pods. However, this requires SUNK to manage GPU contention instead of the Kubernetes scheduler.

SUNK manages the contention with a taint called sunk.coreweave.com/lock. SUNK applies this taint to Nodes by making a call to slurm-syncer during the prolog phase.

SUNK's lock taint
- effect: NoExecute
  key: sunk.coreweave.com/lock
  value: "true"
Important

Prolog completion is blocked until all Pods that do not tolerate the taint have been evicted.

DaemonSets on SUNK Nodes

Kubernetes DaemonSets that run on SUNK Nodes must tolerate the sunk.coreweave.com/lock taint, as well as is_cpu_compute, is_gpu, and node.coreweave.cloud/reserved:

Example toleration
spec:
  tolerations:
    - key: sunk.coreweave.com/lock
      value: "true"
      operator: Equal

    - key: is_cpu_compute
      operator: Exists

    - key: is_gpu
      operator: Exists

    - key: node.coreweave.cloud/reserved
      value: <ORG_ID_HASH_VALUE>
      operator: Equal

Scaling down workloads

Scaling Pods down in a specific order can be achieved by adjusting the cluster specification via the Cloud Console, or via the CKS cluster API. For more information on scaling down resources, see the official Kubernetes documentation.