Node locking - CoreWeave Docs

This page explains how SUNK uses Node locking to prevent Kubernetes workloads from running on a Node while a Slurm job is active. It’s intended for cluster administrators and operators who need to understand how SUNK coordinates resource ownership between Slurm and Kubernetes.

A NodeSet Pod is idle whenever its Slurm node is idle, so other Kubernetes workloads may be scheduled on the same Kubernetes Node during that time. When the NodeSet Pod becomes active, Kubernetes must evict those workloads to prevent resource allocation and deallocation race conditions between Slurm and Kubernetes. To verify that eviction completes before the Slurm job starts, the Pod communicates the Node’s workload state.

Operation overview

SUNK contains a built-in prolog script that verifies that the Kubernetes Nodes are ready for Slurm jobs and locks the Nodes against additional Kubernetes workloads before the job starts. The prolog script calls the Syncer’s hook API, which then triggers a set of state updates through the Pod Controller and Node Controller to complete the process. Both normal Slurm jobs and placeholder jobs created by the SUNK Pod Scheduler use this prolog script.

Node locking taints and annotations

SUNK implements Node locking with Kubernetes taints and annotations. The taint key used on the Node is sunk.coreweave.com/lock. The Pod annotation uses the same key as the Node taint and takes one of the following values:

false
pending
locked
locked_strict
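
As a minimal sketch, the key and the four annotation values above can be modeled as constants. The names LOCK_KEY, LockState, and read_lock_state are hypothetical and not part of SUNK. Treating a missing annotation as false is also an assumption of this sketch.

```python
from enum import Enum

# Key stated above; the Node taint and the Pod annotation share it.
LOCK_KEY = "sunk.coreweave.com/lock"


class LockState(str, Enum):
    """The four annotation values listed above (member names are illustrative)."""

    FALSE = "false"
    PENDING = "pending"
    LOCKED = "locked"
    LOCKED_STRICT = "locked_strict"


def read_lock_state(annotations: dict) -> LockState:
    """Parse the lock annotation from a Pod's metadata.annotations mapping.

    Assumption: a Pod without the annotation is treated as "false".
    Any value outside the four listed ones raises ValueError.
    """
    return LockState(annotations.get(LOCK_KEY, "false"))
```

Parsing into an enum rather than comparing raw strings means an unexpected annotation value fails loudly instead of being silently treated as unlocked.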

The check for other workloads looks only for Pods that don’t tolerate the taint, because some workloads, such as DaemonSets that handle logging, may remain running on the Node. There are two levels of checks: one for locks held by Slurm jobs, and one for locks held by Kubernetes workloads scheduled through Slurm. Slurm jobs use normal locking, while SUNK Pod Scheduler placeholder jobs use strict locking. SUNK doesn’t evict other workloads scheduled by the SUNK Pod Scheduler during this process.
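
The check for Pods that don’t tolerate the taint can be sketched with the standard Kubernetes toleration matching rules. This is an illustration over plain dictionaries, not SUNK’s implementation. The function names are hypothetical, and the taint’s effect and value are not stated on this page, so callers must supply them.

```python
LOCK_KEY = "sunk.coreweave.com/lock"


def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a toleration matches a taint under standard Kubernetes rules."""
    # An empty or missing toleration effect matches every taint effect.
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False
    key = toleration.get("key", "")
    # The operator defaults to Equal when it is not set.
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with an empty key tolerates every taint.
        return key == "" or key == taint["key"]
    # Equal: both the key and the value must match.
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def pods_blocking_lock(pods: list, taint: dict) -> list:
    """Return the names of Pods that have no toleration for the given taint."""
    blocking = []
    for pod in pods:
        tolerations = pod.get("spec", {}).get("tolerations") or []
        if not any(tolerates(t, taint) for t in tolerations):
            blocking.append(pod["metadata"]["name"])
    return blocking
```

With this filter, a logging DaemonSet Pod whose toleration uses the Exists operator with no key is ignored by the check, while a Pod with no tolerations is reported as a workload that must leave before the lock can complete.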

Locking

During locking, two different operations proceed at the same time:

SUNK propagates the Slurm node state to the Kubernetes Node and updates the lock, as described in the operation overview.
The Syncer pre-hook performs a blocking check of the lock state.

If the prolog script receives an error or times out, it fails, which causes the node to drain. This behavior is intentional: if the script let the job proceed at this point, the job could create resource contention or other issues that are complex to troubleshoot.
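
The blocking check and its failure behavior can be sketched as a polling loop with a deadline. This sketch makes several assumptions: get_state is a hypothetical callable that returns the current lock value, locked and locked_strict are taken to mean the Node is ready, and the timeout and interval defaults are illustrative. The real Syncer hook may work differently.

```python
import time


def wait_for_lock(get_state, wanted=("locked", "locked_strict"),
                  timeout_s=60.0, interval_s=1.0,
                  clock=time.monotonic, sleep=time.sleep) -> str:
    """Poll get_state() until it returns a wanted value or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in wanted:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"lock state still {state!r} after {timeout_s}s")
        sleep(interval_s)


def prolog_exit_code(get_state, **kwargs) -> int:
    """Map the blocking check to an exit status: 0 on success, 1 on any failure."""
    try:
        wait_for_lock(get_state, **kwargs)
    except Exception:
        # An error or a timeout both fail the prolog, which drains the node.
        return 1
    return 0
```

Treating errors and timeouts the same way mirrors the fail-closed behavior described above: the job does not start unless the lock is confirmed.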

Unlocking

Once the Slurm job completes, SUNK must release the Node so it can again accept Kubernetes workloads. The unlocking flow removes the lock from the Node when the Slurm node is no longer running.
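
Removing the lock amounts to dropping the taint with the lock key from the Node’s taint list while leaving every other taint in place. The helper below is a hypothetical sketch over plain dictionaries and does not show how SUNK’s controllers apply the change.

```python
LOCK_KEY = "sunk.coreweave.com/lock"


def without_lock_taint(taints: list) -> list:
    """Return a copy of a Node's spec.taints list with the lock taint removed."""
    # Unrelated taints are preserved in their original order.
    return [t for t in taints if t.get("key") != LOCK_KEY]
```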

