The Node Controller manages the data synchronized between Kubernetes Nodes and the NodeSlices and NodeSet Pods. It is deployed cluster-wide as part of the sunk-controller-manager. Because the Node Controller performs all Node operations on behalf of the individual Slurm cluster instances, those instances do not need to be granted extra permissions to modify Kubernetes Nodes.

Information flow and operations

The Node Controller handles the flow of information from the NodeSlices to the Node and from the NodeSet Pods to the Node. Some possible information flows are described below.

Node lock

Because a NodeSet Pod is idle when the Slurm node is idle, other Kubernetes workloads may schedule on the Kubernetes Node. However, when the NodeSet Pod becomes active, these other workloads must be evicted and prevented from rescheduling. This is accomplished through node locking. The Node Controller is responsible for managing the lock status. The Pod Controller then propagates the lock annotation back to the Pod, where it is picked up by the Syncer.

Slurm drain annotation and cordon condition

When a Slurm node is put into a drain state, the Pod will have a condition and annotation set.
  • The node annotation containing the drain reason is sunk.coreweave.com/drain
  • The node condition for Cordon is SlurmCordon
  • When the Pod state is not known, the status of the SlurmCordon condition is Unknown
This allows Kubernetes to react to drain events in Slurm. The annotation reflects the Slurm reason for the drain. The SlurmCordon condition is only set if the drain originated in Slurm, rather than due to a Kubernetes state.
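
As a rough illustration of how another component might react to these drain events, the sketch below reads the annotation and the condition from a Kubernetes object that has already been loaded as a dictionary, for example from the JSON output of kubectl get -o json. Only the annotation key and the condition type come from the text above. The helper name slurm_drain_info, the shape of its return value, and the sample drain reason are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

DRAIN_ANNOTATION = "sunk.coreweave.com/drain"  # annotation holding the Slurm drain reason
CORDON_CONDITION = "SlurmCordon"               # condition type used for the cordon


def slurm_drain_info(obj: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Return the drain reason and SlurmCordon status of a Kubernetes object.

    `obj` is a plain dict in the standard Kubernetes layout
    (metadata.annotations, status.conditions). Hypothetical helper.
    """
    annotations = (obj.get("metadata") or {}).get("annotations") or {}
    conditions = (obj.get("status") or {}).get("conditions") or []
    # Find the SlurmCordon condition, if the object carries one.
    cordon = next((c for c in conditions if c.get("type") == CORDON_CONDITION), None)
    return {
        "drain_reason": annotations.get(DRAIN_ANNOTATION),
        "cordon_status": cordon.get("status") if cordon else None,
    }


# Made-up sample object; the drain reason text is illustrative only.
node = {
    "metadata": {"annotations": {DRAIN_ANNOTATION: "bad GPU"}},
    "status": {"conditions": [{"type": CORDON_CONDITION, "status": "True"}]},
}
print(slurm_drain_info(node))
```

A missing annotation or condition is reported as None rather than raising, so the same helper can be pointed at objects that have never been drained.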

Pod conditions

In addition to the special handling of the SlurmCordon condition, the Node Controller synchronizes several additional conditions from the Pod to the Node, including:
  • SlurmDrain
  • SlurmRunning
  • SlurmNotResponding
These conditions will be present on the Node when it is part of the NodeSet, and removed when the Node is no longer in the NodeSet. If the respective Pod is missing any of these conditions, that condition's status will be set to Unknown.
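
The synchronization rule above can be sketched as a small pure function. This is a minimal model under stated assumptions, not the controller's actual code: the function name node_conditions_from_pod and the in_nodeset flag are hypothetical, conditions are reduced to their type and status fields, and only the three condition types named above are handled even though the list is introduced with "including".

```python
from typing import Dict, List

# Condition types named in the list above (the real set may be larger).
SYNCED_CONDITIONS = ("SlurmDrain", "SlurmRunning", "SlurmNotResponding")


def node_conditions_from_pod(
    pod_conditions: List[Dict[str, str]], in_nodeset: bool = True
) -> List[Dict[str, str]]:
    """Compute the Slurm conditions a Node should carry (hypothetical helper)."""
    if not in_nodeset:
        # The Node left the NodeSet: the synchronized conditions are removed.
        return []
    status_by_type = {c["type"]: c["status"] for c in pod_conditions}
    # A condition missing from the Pod is reported with status Unknown.
    return [
        {"type": t, "status": status_by_type.get(t, "Unknown")}
        for t in SYNCED_CONDITIONS
    ]


pod_conditions = [{"type": "SlurmRunning", "status": "True"}]
for condition in node_conditions_from_pod(pod_conditions):
    print(condition["type"], condition["status"])
```

Here the Pod only reports SlurmRunning, so the other two conditions appear on the Node with status Unknown.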

NodeSet labels

To identify Nodes with a particular NodeSet, the Node Controller labels them with information obtained from the NodeSlice objects. When a Node is present in a NodeSlice, it will be labeled. If a Node is not present in any NodeSlice, then the labels will be removed. Relevant labels include:
  • sunk.coreweave.com/nodeset - The name of the NodeSet the Node is associated with.
  • sunk.coreweave.com/namespace - The namespace of the NodeSet.
  • sunk.coreweave.com/cluster - The name of the cluster the NodeSet is associated with.
  • sunk.coreweave.com/pod - The name of the associated Pod, if present.
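
The labeling rule can be modeled as follows. This is a sketch under assumptions: the NodeSlice schema is not described here, so the NodeSetMembership record and the reconcile_labels function are hypothetical stand-ins for whatever the controller reads from the NodeSlice objects. Only the four label keys come from the list above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LABEL_NODESET = "sunk.coreweave.com/nodeset"
LABEL_NAMESPACE = "sunk.coreweave.com/namespace"
LABEL_CLUSTER = "sunk.coreweave.com/cluster"
LABEL_POD = "sunk.coreweave.com/pod"
SUNK_LABELS = (LABEL_NODESET, LABEL_NAMESPACE, LABEL_CLUSTER, LABEL_POD)


@dataclass
class NodeSetMembership:
    """Hypothetical summary of what a NodeSlice says about one Node."""
    nodeset: str
    namespace: str
    cluster: str
    pod: Optional[str] = None  # name of the associated Pod, if present


def reconcile_labels(
    current: Dict[str, str], membership: Optional[NodeSetMembership]
) -> Dict[str, str]:
    """Return the labels a Node should have, leaving unrelated labels untouched."""
    # Start from the labels this sketch does not manage.
    labels = {k: v for k, v in current.items() if k not in SUNK_LABELS}
    if membership is None:
        # Not present in any NodeSlice: the NodeSet labels stay removed.
        return labels
    labels[LABEL_NODESET] = membership.nodeset
    labels[LABEL_NAMESPACE] = membership.namespace
    labels[LABEL_CLUSTER] = membership.cluster
    if membership.pod:
        labels[LABEL_POD] = membership.pod
    return labels


# Made-up names for illustration.
member = NodeSetMembership(nodeset="gpu", namespace="tenant-slurm", cluster="slurm")
print(reconcile_labels({"kubernetes.io/hostname": "node-a"}, member))
```

Once the labels are in place, a standard label selector such as sunk.coreweave.com/nodeset=gpu (with the NodeSet name substituted) can be passed to kubectl get nodes -l to list the Nodes of one NodeSet.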
Last modified on March 24, 2026