Because a NodeSet Pod is idle when the Slurm node is idle, other Kubernetes workloads may schedule on the Kubernetes Node. However, when the NodeSet Pod becomes active, the Kubernetes workloads must be evicted to prevent resource allocation and deallocation race conditions between Slurm and Kubernetes. To verify eviction has completed before the Slurm job is started, the Node’s workload state is communicated via the Pod.
Operation overview
SUNK contains a built-in prolog script that verifies the Kubernetes Nodes are ready for Slurm jobs, and locks the Nodes from accepting additional Kubernetes workloads before the job starts. The prolog script calls the Syncer’s hook API, which then triggers a set of state updates through the Pod Controller and Node Controller to complete the process. Normal Slurm jobs and placeholder jobs created by the SUNK Pod Scheduler both use this prolog script.
Node locking taints and annotations
This process uses Kubernetes taints and annotations to coordinate locking. The taint key used on the Node is sunk.coreweave.com/lock.
The Pod annotation uses the same key as the Node taint, with the following values:
- false
- pending
- locked
- locked_strict
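As an illustration of how the shared key and the four annotation values might be handled in client code, here is a minimal Python sketch. It works on plain dictionaries shaped like Kubernetes API objects (annotations under metadata.annotations, taints under spec.taints). The names LockState, pod_lock_state, and node_has_lock_taint are hypothetical and are not part of SUNK; the sketch also makes no assumption about what each value means or which taint effect is used.

```python
from enum import Enum
from typing import Optional

# Key shared by the Node taint and the Pod annotation (from the text above).
LOCK_KEY = "sunk.coreweave.com/lock"


class LockState(Enum):
    """The four documented annotation values (hypothetical wrapper type)."""
    FALSE = "false"
    PENDING = "pending"
    LOCKED = "locked"
    LOCKED_STRICT = "locked_strict"


def pod_lock_state(pod: dict) -> Optional[LockState]:
    """Return the Pod's lock annotation as a LockState, or None if absent.

    Raises ValueError if the annotation holds an undocumented value.
    """
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    value = annotations.get(LOCK_KEY)
    if value is None:
        return None
    return LockState(value)


def node_has_lock_taint(node: dict) -> bool:
    """Return True if any taint on the Node uses the lock key."""
    taints = (node.get("spec") or {}).get("taints") or []
    return any(taint.get("key") == LOCK_KEY for taint in taints)
```

For example, passing a Pod dictionary whose annotation is set to pending yields LockState.PENDING, and a Node dictionary with no taints yields False from node_has_lock_taint.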
Locking
When locking, two different operations proceed at the same time:
- The Slurm node state is propagated to the Kubernetes Node and updates the lock, as described in the operation overview.
- The Syncer pre-hook performs a blocking check of the lock state.
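The blocking check in the second operation can be pictured as a polling loop that returns once the lock reaches an acceptable state or a deadline passes. The Python sketch below is only an illustration of that pattern, not the Syncer's implementation: wait_for_lock, read_state, the timeout, and the polling interval are all hypothetical, and treating locked and locked_strict as the states that end the wait is an assumption.

```python
import time
from typing import Callable, Iterable


def wait_for_lock(
    read_state: Callable[[], str],
    ready_states: Iterable[str] = ("locked", "locked_strict"),  # assumption
    timeout_s: float = 30.0,    # hypothetical default
    interval_s: float = 1.0,    # hypothetical default
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until read_state() returns a ready state or the timeout expires.

    read_state is any callable returning the current lock value, for
    example a function that reads the Pod annotation.
    Returns True if a ready state was observed, False on timeout.
    """
    ready = set(ready_states)
    deadline = clock() + timeout_s
    while True:
        if read_state() in ready:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Because the two operations run concurrently, the check cannot assume the lock is already set when it starts. Polling with a deadline lets the caller proceed as soon as the state update lands, while still failing cleanly if it never does.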