Because a NodeSet Pod is idle when the Slurm node is idle, other Kubernetes workloads may be scheduled on the Kubernetes Node. However, when the NodeSet Pod becomes active, those Kubernetes workloads must be evicted to prevent race conditions between Slurm and Kubernetes over resource allocation and deallocation. To verify that eviction has completed before the Slurm job starts, the Node’s workload state is communicated via the Pod.

Operation overview

SUNK contains a built-in prolog script that verifies that the Kubernetes Nodes are ready for Slurm jobs, and locks the Nodes against accepting additional Kubernetes workloads before the job starts. The prolog script calls the Syncer’s hook API, which then triggers a set of state updates through the Pod Controller and Node Controller to complete the process. Normal Slurm jobs and placeholder jobs created by the SUNK Pod Scheduler both use this prolog script.

Node locking taints and annotations

This process relies on a Kubernetes taint on the Node and an annotation on the Pod. The taint key used on the Node is sunk.coreweave.com/lock. The Pod annotation uses the same key as the Node taint, and takes one of the following values:
  • false
  • pending
  • locked
  • locked_strict
The check for other workloads looks only for Pods that do not tolerate the taint, because some workloads, such as DaemonSets that handle logging, are expected to remain running on the Node. There are two levels of checks: normal locking, used by Slurm jobs, and strict locking, used by the placeholder jobs that the SUNK Pod Scheduler creates for Kubernetes workloads scheduled through Slurm. Other workloads scheduled by the SUNK Pod Scheduler are not evicted during this process.
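As a rough illustration of the "does not tolerate the taint" check, the sketch below applies the standard Kubernetes toleration matching rules to Pods represented as plain dictionaries shaped like Pod manifests. It is not SUNK's implementation: the function names, the empty taint value, and the NoSchedule effect are assumptions made for the example, since this page states only the taint key.

```python
from typing import Any

LOCK_KEY = "sunk.coreweave.com/lock"  # taint key named in this page


def tolerates(pod: dict[str, Any], key: str = LOCK_KEY,
              value: str = "", effect: str = "NoSchedule") -> bool:
    """Return True if any toleration on the Pod matches the given taint.

    The taint value and effect defaults are assumptions for illustration.
    """
    for tol in pod.get("spec", {}).get("tolerations") or []:
        op = tol.get("operator", "Equal")  # Kubernetes defaults to Equal
        tol_key = tol.get("key", "")
        tol_effect = tol.get("effect", "")
        if tol_effect and tol_effect != effect:
            continue  # an empty effect matches every effect
        if not tol_key:
            # An empty key with operator Exists tolerates every taint.
            if op == "Exists":
                return True
            continue
        if tol_key != key:
            continue
        if op == "Exists" or (op == "Equal" and tol.get("value", "") == value):
            return True
    return False


def blocking_pods(pods: list[dict[str, Any]]) -> list[str]:
    """Names of Pods that do not tolerate the lock taint."""
    return [p["metadata"]["name"] for p in pods if not tolerates(p)]


pods = [
    {"metadata": {"name": "logger"}, "spec": {"tolerations": [{"operator": "Exists"}]}},
    {"metadata": {"name": "web"}, "spec": {}},
    {"metadata": {"name": "other"},
     "spec": {"tolerations": [{"key": "example.com/foo", "operator": "Exists"}]}},
]
print(blocking_pods(pods))
```

Here the hypothetical "logger" Pod tolerates every taint, as many DaemonSet Pods do, so only "web" and "other" would count as workloads that still need to leave the Node.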

Locking

When locking, two different operations proceed at the same time:
  • The Slurm node state is propagated to the Kubernetes Node and updates the lock, as described in the operation overview.
  • The Syncer pre-hook performs a blocking check of the lock state.
If the prolog script receives an error or times out, it fails, which causes the node to drain. This is the preferred outcome: if the script allowed the job to proceed at this point, the job could create resource contention or other issues that are complex to troubleshoot.
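The blocking check and its failure mode can be pictured as a poll-until-deadline loop. The following is a minimal sketch, not the Syncer's actual hook: `wait_for_lock`, `LockTimeout`, the default timeout, and the idea that a caller supplies a `read_state` function returning the current annotation value are all hypothetical. Which annotation values count as success for normal versus strict locking is also an assumption, so the target set is left as a parameter.

```python
import time
from typing import Callable, Iterable


class LockTimeout(Exception):
    """Raised when the lock does not reach a target state in time."""


def wait_for_lock(
    read_state: Callable[[], str],
    targets: Iterable[str] = ("locked",),
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll read_state() until it returns a target value or the deadline passes."""
    wanted = set(targets)
    deadline = clock() + timeout_s
    while True:
        state = read_state()
        if state in wanted:
            return state
        if clock() >= deadline:
            raise LockTimeout(
                f"lock state is {state!r}, wanted one of {sorted(wanted)}"
            )
        sleep(interval_s)


# Simulated sequence of annotation values observed across polls.
observed = iter(["false", "pending", "locked"])
print(wait_for_lock(lambda: next(observed), sleep=lambda _s: None))
```

A prolog wrapper built on a loop like this would catch `LockTimeout` (or any other error) and exit with a non-zero status, which is what causes Slurm to drain the node instead of starting the job.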

Unlocking

Unlocking removes the lock from the Node once the Slurm node is no longer running.
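In terms of the Node object, removing the lock amounts to dropping the taint with the lock key while leaving every other taint in place. The sketch below shows that operation on a plain list of taint dictionaries. The helper name is hypothetical, and how SUNK's controllers apply the change to the cluster is not described on this page.

```python
from typing import Any, Optional

LOCK_KEY = "sunk.coreweave.com/lock"  # taint key named in this page


def without_lock_taint(taints: Optional[list[dict[str, Any]]],
                       key: str = LOCK_KEY) -> list[dict[str, Any]]:
    """Return a copy of the Node's taints with the lock taint removed."""
    return [t for t in (taints or []) if t.get("key") != key]


node_taints = [
    {"key": LOCK_KEY, "effect": "NoSchedule"},  # effect is an assumption
    {"key": "example.com/maintenance", "effect": "NoSchedule"},
]
print(without_lock_taint(node_taints))
```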
Last modified on March 24, 2026