Configuration changes
The default value of SelectTypeParameters is now CR_CPU_MEMORY.
Modifications to the DefMemPerCPU parameter for nodesets now allow users to submit jobs without specifying a memory value.
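As a rough illustration of what a per-CPU memory default means for a job submitted without a memory value, the sketch below multiplies the allocated CPU count by DefMemPerCPU, which Slurm expresses in megabytes. The function name and the example values are hypothetical and are not taken from the SUNK defaults; the sketch only models a positive DefMemPerCPU.

```python
def default_job_memory_mb(cpus: int, def_mem_per_cpu_mb: int) -> int:
    """Estimate the memory (in MB) a job receives when it omits a memory request.

    Assumption: memory is a consumable resource and the default is applied
    per allocated CPU. Only a positive DefMemPerCPU is modeled here.
    """
    if cpus < 1:
        raise ValueError("a job needs at least one CPU")
    if def_mem_per_cpu_mb < 1:
        raise ValueError("this sketch only models a positive DefMemPerCPU")
    return cpus * def_mem_per_cpu_mb


# Hypothetical values: a 4-CPU job with DefMemPerCPU set to 4096 MB.
example_memory_mb = default_job_memory_mb(cpus=4, def_mem_per_cpu_mb=4096)
```

The actual DefMemPerCPU value for a nodeset depends on the cluster configuration, so treat the numbers above as placeholders.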
New default SCIM configuration for Groups
The nsscache configuration contains a new parameter for defining SCIM groups, sunkPosixGroupName.
This parameter is used when the SCIM server provides a custom field for group names. If the SCIM server does not provide a custom field, nsscache will use the displayName, name, or id from the SCIM group resource.
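The fallback described above can be sketched as a simple lookup chain. This is not the nsscache implementation: the function name, the flat dictionary shape of the group resource, and the `posixName` field are hypothetical, and the precedence among displayName, name, and id is assumed to follow the order in which they are listed.

```python
from typing import Any, Mapping, Optional


def resolve_group_name(
    group: Mapping[str, Any],
    custom_field: Optional[str] = None,
) -> Optional[str]:
    """Pick a group name from a SCIM group resource.

    Assumptions: `custom_field` stands in for the field configured through
    sunkPosixGroupName, and the resource is a flat mapping. Real SCIM
    extension attributes may be nested, which this sketch ignores.
    """
    candidates = ["displayName", "name", "id"]
    if custom_field:
        # Prefer the custom field when one is configured.
        candidates.insert(0, custom_field)
    for key in candidates:
        value = group.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


# Hypothetical resource with a custom "posixName" field.
group = {"id": "a1b2", "displayName": "ML Team", "posixName": "ml-team"}
with_custom = resolve_group_name(group, custom_field="posixName")
without_custom = resolve_group_name(group)
```

With the custom field configured, the sketch returns the custom value; without it, the lookup falls through to displayName.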
Name Slurm nodes with Kubernetes labels
You may now customize the name of a Slurm Pod using the optional podNameLabel parameter in the compute-nodeset.yaml Helm chart. This parameter accepts a single label name.
The defined label must be unique to the Node. If multiple Nodes use the same label value, only one Pod will be created.
The resulting Pod name follows the convention <prefix>-<name>, where <name> is the value of the selected label.
If no label is provided, Slurm nodes are automatically assigned names based on IP address or node name.
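To check a set of Nodes before setting podNameLabel, a small helper along these lines can apply the <prefix>-<name> convention and flag label values shared by more than one Node. The function, the `example.com/slot` label, the `slurm` prefix, and the node data are all hypothetical; in practice the labels would come from the Kubernetes API. Because the notes do not say which Node keeps its Pod when values collide, the sketch reports collisions rather than picking a winner.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple


def plan_pod_names(
    nodes: Mapping[str, Mapping[str, str]],
    label: str,
    prefix: str,
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (pod name per Node, colliding pod names with their Nodes)."""
    nodes_by_value: Dict[str, List[str]] = defaultdict(list)
    for node_name, labels in nodes.items():
        value = labels.get(label)
        if value:
            # Nodes without the label are left out of this sketch.
            nodes_by_value[value].append(node_name)

    pod_names: Dict[str, str] = {}
    collisions: Dict[str, List[str]] = {}
    for value, owners in nodes_by_value.items():
        pod_name = f"{prefix}-{value}"
        if len(owners) == 1:
            pod_names[owners[0]] = pod_name
        else:
            # Several Nodes share the label value, so only one Pod would exist.
            collisions[pod_name] = sorted(owners)
    return pod_names, collisions


# Hypothetical Nodes and label values.
nodes = {
    "node-a": {"example.com/slot": "gpu-001"},
    "node-b": {"example.com/slot": "gpu-002"},
    "node-c": {"example.com/slot": "gpu-002"},
}
pod_names, collisions = plan_pod_names(nodes, "example.com/slot", "slurm")
```

Any entry in `collisions` points at a label value that needs to be made unique per Node before it is used for naming.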
Capture slurmd and slurmstepd logs
Logs from slurmd and slurmstepd can now be captured and output to a log file.
New automatic checks
An automatic check for an existing /dev/infiniband definition now occurs upon slurmd pod startup.
A liveness probe for munged containers now runs by default.
Automatic HPC verification for drained nodes with duplicate job ids
The sunk:verify-undrain drain reason will now be added to nodes in the drain state with the reason duplicate job ids. These nodes can be safely undrained and will be included in the automated HPC verification workflow for SUNK.
Bug fixes
This release also fixes several bugs, including issues related to:
- Slurm controller crashing upon creating a new SUNK cluster
- Requeueing placeholder Slurm jobs