The Node Controller is responsible for managing data synchronized between Kubernetes Nodes and the NodeSlices and NodeSet Pods. The Controller is deployed cluster-wide as part of the sunk-controller-manager. The Node Controller performs all Node operations on behalf of the individual Slurm cluster instances, removing the requirement to grant extra permissions to modify Kubernetes Nodes.
Information flow and operations
The Node Controller handles the flow of information from the NodeSlices to the Node and from the NodeSet Pods to the Node. Some possible information flows are described below.
Node lock
Because a NodeSet Pod is idle when the Slurm node is idle, other Kubernetes workloads may schedule on the Kubernetes Node. However, when the NodeSet Pod becomes active, these other workloads must be evicted and prevented from rescheduling. This is accomplished through node locking. The Node Controller is responsible for managing the lock status. The Pod Controller then propagates the lock annotation back to the Pod, where it is picked up by the Syncer.
Slurm drain annotation and cordon condition
When a Slurm node is put into a drain state, the Pod will have a condition and annotation set.
- The node annotation containing the drain reason is sunk.coreweave.com/drain
- The node condition for Cordon is SlurmCordon
- When the pod state is not known, the condition is Unknown
The SlurmCordon condition is only set if the drain originated in Slurm, rather than from a Kubernetes state.
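One way to check these values on a Node is to read them from the Node object's JSON form. The following is a minimal sketch, not part of SUNK: the function name slurm_drain_info and the sample Node are hypothetical, while the annotation key and condition type are the ones named above. It relies only on the standard Kubernetes layout, where annotations live under metadata.annotations and conditions are a list of type/status entries under status.conditions.

```python
from typing import Optional

DRAIN_ANNOTATION = "sunk.coreweave.com/drain"
CORDON_CONDITION = "SlurmCordon"


def slurm_drain_info(node: dict) -> dict:
    """Summarize Slurm drain state from a Node object in Kubernetes JSON form.

    Hypothetical helper for illustration only.
    """
    annotations = node.get("metadata", {}).get("annotations") or {}
    conditions = node.get("status", {}).get("conditions") or []

    # Kubernetes conditions are a list of {"type": ..., "status": ...} entries,
    # where status is one of "True", "False", or "Unknown".
    cordon_status: Optional[str] = None
    for cond in conditions:
        if cond.get("type") == CORDON_CONDITION:
            cordon_status = cond.get("status")
            break

    return {
        "drain_reason": annotations.get(DRAIN_ANNOTATION),  # None if the annotation is absent
        "slurm_cordon": cordon_status,  # None if the condition is absent
    }


# Made-up sample Node, trimmed to the fields the helper reads.
sample_node = {
    "metadata": {"annotations": {DRAIN_ANNOTATION: "bad GPU"}},
    "status": {"conditions": [{"type": "SlurmCordon", "status": "True"}]},
}
info = slurm_drain_info(sample_node)
print(info)
```

The same dictionary shape is what a Kubernetes client returns when a Node is serialized to JSON, so the helper can be pointed at real Node data without changes.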
Pod conditions
In addition to the special handling of the SlurmCordon condition, the Node Controller synchronizes several additional conditions from the Pod to the Node, including:
- SlurmDrain
- SlurmRunning
- SlurmNotResponding
When the Pod state is not known, these conditions are set to Unknown.
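The Pod-to-Node synchronization can be pictured as copying a fixed set of condition types from one condition list to another. The sketch below is an illustration under stated assumptions, not the Node Controller's actual code: the function name mirror_conditions is hypothetical, an unknown Pod state is represented as None, and conditions the Pod does not report are simply left off the Node. It also assumes these conditions fall back to Unknown in the same way SlurmCordon does.

```python
MIRRORED_CONDITIONS = ("SlurmDrain", "SlurmRunning", "SlurmNotResponding")


def mirror_conditions(pod_conditions, node_conditions):
    """Return node conditions with the Slurm conditions copied from the Pod.

    Hypothetical helper. pod_conditions is None when the Pod state is not
    known (an assumption about how that case is represented).
    """
    # Index the Pod's conditions by type for direct lookup.
    by_type = {c["type"]: c["status"] for c in (pod_conditions or [])}

    # Keep every Node condition that this sketch does not manage.
    result = [c for c in node_conditions if c["type"] not in MIRRORED_CONDITIONS]

    for cond_type in MIRRORED_CONDITIONS:
        if pod_conditions is None:
            status = "Unknown"  # Pod state not known
        elif cond_type in by_type:
            status = by_type[cond_type]
        else:
            continue  # sketch choice: the Pod does not report it, so omit it
        result.append({"type": cond_type, "status": status})
    return result


node_conditions = [{"type": "Ready", "status": "True"}]
pod_conditions = [{"type": "SlurmRunning", "status": "True"}]

synced = mirror_conditions(pod_conditions, node_conditions)
unknown = mirror_conditions(None, node_conditions)
print(synced)
print(unknown)
```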
NodeSet labels
To identify Nodes with a particular NodeSet, the Node Controller labels them with information obtained from the NodeSlice objects. When a Node is present in a NodeSlice, it will be labeled. If a Node is not present in any NodeSlice, then the labels will be removed. Relevant labels include:
- sunk.coreweave.com/nodeset: The name of the NodeSet the node is associated with.
- sunk.coreweave.com/namespace: The namespace of the NodeSet.
- sunk.coreweave.com/cluster: The name of the cluster the NodeSet is associated with.
- sunk.coreweave.com/pod: The name of the associated pod, if present.
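The label-and-remove behavior described above can be sketched as a pure function over a Node's label map. This is an illustrative sketch only: the Membership record and the reconcile_labels function are hypothetical stand-ins, since the real NodeSlice schema is not shown here. Only the four label keys come from the list above.

```python
from typing import NamedTuple, Optional

LABEL_PREFIX = "sunk.coreweave.com/"
NODESET_LABELS = tuple(LABEL_PREFIX + name for name in ("nodeset", "namespace", "cluster", "pod"))


class Membership(NamedTuple):
    """Hypothetical summary of what a NodeSlice says about one Node."""
    nodeset: str
    namespace: str
    cluster: str
    pod: Optional[str] = None  # name of the associated pod, if present


def reconcile_labels(labels: dict, membership: Optional[Membership]) -> dict:
    """Return the Node's labels after applying NodeSet membership.

    membership is None when the Node is not present in any NodeSlice.
    """
    # Start from the labels this sketch does not manage.
    result = {k: v for k, v in labels.items() if k not in NODESET_LABELS}
    if membership is None:
        return result  # not in any NodeSlice: the NodeSet labels stay removed

    result[LABEL_PREFIX + "nodeset"] = membership.nodeset
    result[LABEL_PREFIX + "namespace"] = membership.namespace
    result[LABEL_PREFIX + "cluster"] = membership.cluster
    if membership.pod:
        result[LABEL_PREFIX + "pod"] = membership.pod
    return result


# Made-up names throughout; kubernetes.io/hostname is a standard Node label.
current = {"kubernetes.io/hostname": "node-a", LABEL_PREFIX + "nodeset": "old-set"}

labeled = reconcile_labels(current, Membership("gpu", "tenant-a", "slurm"))
cleared = reconcile_labels(current, None)
print(labeled)
print(cleared)
```

Once the labels are in place, a standard Kubernetes label selector such as sunk.coreweave.com/nodeset=gpu (with a NodeSet name of your own) selects the matching Nodes.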