Configuration changes

The default value of SelectTypeParameters is now CR_CPU_MEMORY. Modifications to the DefMemPerCPU parameter for nodesets now allow users to submit jobs without specifying a memory value.
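As a rough illustration of why a default matters: when memory is a consumable resource, as it is under CR_CPU_MEMORY, a job submitted without --mem or --mem-per-cpu receives DefMemPerCPU megabytes for each CPU it is allocated. The minimal sketch below works through that arithmetic and compares the result against a node's memory. The function names and all numbers are hypothetical. Slurm's real accounting, for example how hyperthreads count as CPUs, may differ.

```python
def default_job_memory_mb(allocated_cpus: int, def_mem_per_cpu_mb: int) -> int:
    """Memory a job gets when it omits --mem/--mem-per-cpu: CPUs x DefMemPerCPU."""
    if allocated_cpus <= 0 or def_mem_per_cpu_mb <= 0:
        raise ValueError("CPU count and DefMemPerCPU must be positive")
    return allocated_cpus * def_mem_per_cpu_mb


def fits_on_node(allocated_cpus: int, def_mem_per_cpu_mb: int, node_memory_mb: int) -> bool:
    """True if the implied default memory does not exceed the node's memory (hypothetical check)."""
    return default_job_memory_mb(allocated_cpus, def_mem_per_cpu_mb) <= node_memory_mb


# Hypothetical job: 16 CPUs with DefMemPerCPU=2048 implies 32768 MB.
print(default_job_memory_mb(16, 2048))  # 32768
# Hypothetical node with 256000 MB: 128 CPUs x 4096 MB implies 524288 MB, which does not fit.
print(fits_on_node(128, 4096, 256000))  # False
```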

Possible action required

This configuration change may result in errors when all of the following apply:
  • SUNK v7.3.0 or later is in use
  • DefMemPerCPU already has a value defined in the default partition
  • Users submit jobs without specifying the correct memory value
  • The --exclusive flag is in use
To avoid these errors, either set SelectTypeParameters back to its previous default, CR_Core, or update DefMemPerCPU accordingly.
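Because errors occur only when all four conditions hold together, a quick self-check can be written as one boolean conjunction. The sketch below is a hypothetical helper, not part of SUNK. The JobContext fields are inputs you would gather yourself, and version strings are assumed to be plain dotted numbers such as "v7.3.0" with no pre-release suffix.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain release string such as "v7.3.0" into (7, 3, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


@dataclass
class JobContext:
    # Hypothetical inputs gathered by hand; SUNK exposes no such type.
    sunk_version: str                                 # e.g. "v7.3.0"
    default_partition_def_mem_per_cpu: Optional[int]  # MB, None if not defined
    job_specifies_correct_memory: bool
    job_uses_exclusive: bool


def may_hit_memory_errors(ctx: JobContext) -> bool:
    """True only when all four conditions from the release note hold at once."""
    return (
        parse_version(ctx.sunk_version) >= (7, 3, 0)
        and ctx.default_partition_def_mem_per_cpu is not None
        and not ctx.job_specifies_correct_memory
        and ctx.job_uses_exclusive
    )


print(may_hit_memory_errors(JobContext("v7.3.0", 4096, False, True)))   # True
print(may_hit_memory_errors(JobContext("v7.2.1", 4096, False, True)))   # False
print(may_hit_memory_errors(JobContext("v7.4.0", 4096, False, False)))  # False
```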

New default SCIM configuration for Groups

The nsscache configuration contains a new parameter for defining SCIM groups:
```yaml
nsscacheConfig:
  group:
    scim_path_groupname: sunkPosixGroupName
```
The default path, sunkPosixGroupName, is used when the SCIM server provides a custom field for group names. If the SCIM server does not provide a custom field, nsscache will use the displayName, name, or id from the SCIM group resource.
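The fallback order described above can be sketched as follows. This is a minimal illustration, not nsscache's code. It assumes the SCIM group resource has already been decoded into a dict, treats scim_path_groupname as a top-level key (a real SCIM extension attribute may be nested under a schema URN), and assumes the fallbacks are tried in the order listed.

```python
from typing import Any, Mapping, Optional

DEFAULT_SCIM_PATH_GROUPNAME = "sunkPosixGroupName"


def resolve_group_name(
    group: Mapping[str, Any],
    scim_path_groupname: str = DEFAULT_SCIM_PATH_GROUPNAME,
) -> Optional[str]:
    """Choose a POSIX group name from a decoded SCIM group resource.

    The custom field named by scim_path_groupname wins; otherwise fall back
    to displayName, then name, then id. Missing or empty values are skipped.
    """
    for key in (scim_path_groupname, "displayName", "name", "id"):
        value = group.get(key)
        if value:
            return str(value)
    return None


print(resolve_group_name({"sunkPosixGroupName": "ml-team", "displayName": "ML Team", "id": "g-1"}))  # ml-team
print(resolve_group_name({"displayName": "research", "id": "g-2"}))  # research
print(resolve_group_name({"id": "g-3"}))  # g-3
```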

Name Slurm nodes with Kubernetes labels

You can now customize the name of a Slurm Pod with the optional podNameLabel parameter in the compute-nodeset.yaml Helm chart. This parameter accepts a single label name, and each Node's value for that label must be unique: if multiple Nodes share the same label value, only one Pod is created. The resulting Pod name follows the convention <prefix>-<name>, where <name> is the value of the selected label. If no label is provided, Slurm nodes are automatically named based on their IP address or node name.
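Because a duplicated label value silently costs a Pod, it can help to check label uniqueness before setting podNameLabel. The minimal sketch below models the documented <prefix>-<name> convention and the one-Pod-per-value rule. The label key example.com/slurm-name, the prefix, and the Node names are hypothetical. Which Node wins a duplicate is not documented, so the sketch arbitrarily keeps the first in sorted order, and the IP/node-name fallback is not modeled.

```python
from typing import Dict, List, Mapping, Set, Tuple


def pod_names_from_label(
    prefix: str,
    node_labels: Mapping[str, Mapping[str, str]],
    pod_name_label: str,
) -> Tuple[Dict[str, str], List[str]]:
    """Derive <prefix>-<label value> Pod names for each Kubernetes Node.

    Returns (pods, skipped): pods maps Node name -> Pod name; skipped lists
    Nodes that get no Pod because another Node already claimed the same
    label value. Nodes without the label are ignored here.
    """
    pods: Dict[str, str] = {}
    skipped: List[str] = []
    claimed: Set[str] = set()
    for node in sorted(node_labels):  # sorted order is an arbitrary tie-breaker
        value = node_labels[node].get(pod_name_label)
        if value is None:
            continue
        if value in claimed:
            skipped.append(node)
            continue
        claimed.add(value)
        pods[node] = f"{prefix}-{value}"
    return pods, skipped


labels = {
    "gpu-node-a": {"example.com/slurm-name": "g001"},
    "gpu-node-b": {"example.com/slurm-name": "g002"},
    "gpu-node-c": {"example.com/slurm-name": "g002"},  # duplicate value
}
pods, skipped = pod_names_from_label("slurm", labels, "example.com/slurm-name")
print(pods)     # {'gpu-node-a': 'slurm-g001', 'gpu-node-b': 'slurm-g002'}
print(skipped)  # ['gpu-node-c']
```

A non-empty skipped list is the signal that the chosen label is not unique across Nodes.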

Capture slurmd and slurmstepd logs

Logs from slurmd and slurmstepd can now be captured and written to a log file.

New automatic checks

slurmd Pods now automatically check for an existing /dev/infiniband definition at startup. A liveness probe for munged containers also now runs by default.

Automatic HPC verification for drained nodes with duplicate job ids

The sunk:verify-undrain drain reason will now be added to nodes in the drain state with the reason duplicate job ids. These nodes can be safely undrained and will be included in the automated HPC verification workflow for SUNK.
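The rule above, drained plus reason duplicate job ids yields sunk:verify-undrain, can be expressed as a small predicate. This is a hypothetical sketch, not SUNK's implementation. It assumes node states as sinfo reports them (for example "drain", "draining", "drained", possibly with a trailing "*") and assumes the reason text contains the phrase duplicate job ids; the exact wording Slurm writes may differ.

```python
from typing import Optional

VERIFY_UNDRAIN_REASON = "sunk:verify-undrain"
DUPLICATE_JOB_IDS = "duplicate job ids"  # assumed reason text, matched case-insensitively


def verify_undrain_reason(node_state: str, drain_reason: str) -> Optional[str]:
    """Return the drain reason to add for a node, or None if it does not qualify.

    A node qualifies when it is in a drain state and its current reason
    mentions duplicate job ids.
    """
    drained = node_state.lower().startswith("drain")
    if drained and DUPLICATE_JOB_IDS in drain_reason.lower():
        return VERIFY_UNDRAIN_REASON
    return None


print(verify_undrain_reason("drained", "duplicate job ids"))  # sunk:verify-undrain
print(verify_undrain_reason("idle", "duplicate job ids"))     # None
print(verify_undrain_reason("drained", "Kill task failed"))   # None
```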

Bug fixes

This release also fixes several bugs, including issues related to:
  • Slurm controller crashing upon creating a new SUNK cluster
  • Requeueing placeholder Slurm jobs
Last modified on March 24, 2026