November 13, 2025 - SUNK v7.0.0 release

Overview

SUNK v7.0.0 introduces several new features and improvements, including a major version upgrade to Slurm 25.05.3, more consistent node scaling behavior, new default Slurm chart configurations, improved memory management, and bug fixes.

Upgrade to Slurm 25.05.3

SUNK v7.0.0 upgrades Slurm to version 25.05.3. This upgrade includes upstream bug fixes and performance optimizations, which can benefit training job performance.

Priority-based node scaling

The NodeSet scaling-down behavior is now priority-based and consistent with rolling upgrade behavior. Scaling down now prefers idle nodes and respects pod priorities (not-ready > idle > draining > running), with timestamps as a deterministic tiebreaker. To further improve resource utilization and reduce churn, newer pods are deleted first, thus preserving pods with more accumulated state/history.

This change impacts all NodeSet scale-down operations when the ScaleDownPriorityOrdering feature flag is enabled. This feature flag is enabled by default.
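The ordering described above can be sketched as a sort key. This is a minimal illustration under stated assumptions, not SUNK's implementation: the `NodePod` record, its field names, and the `pick_victims` helper are hypothetical, and the pod name is used only as an extra tiebreaker so the example is fully deterministic.

```python
from dataclasses import dataclass
from datetime import datetime

# Lower rank = removed earlier, following the stated order:
# not-ready > idle > draining > running.
STATE_RANK = {"not-ready": 0, "idle": 1, "draining": 2, "running": 3}


@dataclass(frozen=True)
class NodePod:
    # Hypothetical record; field names are illustrative, not SUNK's types.
    name: str
    state: str
    created: datetime


def scale_down_order(pods):
    """Return pods sorted from first-to-delete to last-to-delete."""
    return sorted(
        pods,
        # Within a state, a later creation time sorts first (newer pods go first).
        # The name is an assumed final tiebreaker for pods with equal timestamps.
        key=lambda p: (STATE_RANK[p.state], -p.created.timestamp(), p.name),
    )


def pick_victims(pods, count):
    """Select the `count` pods that a scale-down would remove first."""
    return scale_down_order(pods)[:count]
```

With this key, a scale-down by two on a NodeSet that has one not-ready pod and several idle pods would pick the not-ready pod and then the most recently created idle pod, leaving running pods untouched.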

Compute Nodes allocate memory to slurmd on startup, reducing OOM errors

By default, nodes now have RealMemory set to the value specified in resource.limits.memory during slurmd startup. This value can be overridden with the <compute-def>.realMemory key.

Setting this value during slurmd startup ensures that slurmd uses only the memory allocated to it, rather than the entire memory of the node. This reserves memory for the necessary Kubernetes workloads, reducing the likelihood of Out of Memory (OOM) errors.
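To make the relationship between a container memory limit and RealMemory concrete, the sketch below converts a Kubernetes-style memory quantity into a whole number of MiB. It is an illustration only: the release notes do not describe how SUNK performs this conversion, the `real_memory_mib` helper is hypothetical, and the sketch assumes that RealMemory is expressed in units of 2^20 bytes. Milli-unit and exponent forms of Kubernetes quantities are not handled.

```python
import re
from decimal import Decimal

# Byte multipliers for Kubernetes quantity suffixes (decimal and binary).
_SUFFIX_BYTES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}

_QUANTITY = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z]*)$")


def real_memory_mib(memory_limit: str) -> int:
    """Convert a memory limit such as '960Gi' to whole MiB, rounding down."""
    match = _QUANTITY.match(memory_limit.strip())
    if match is None or match.group(2) not in _SUFFIX_BYTES:
        raise ValueError(f"unsupported memory quantity: {memory_limit!r}")
    number, suffix = match.groups()
    # Decimal avoids float rounding for fractional quantities like '1.5Gi'.
    total_bytes = Decimal(number) * _SUFFIX_BYTES[suffix]
    return int(total_bytes // 2**20)
```

For example, a limit of `960Gi` corresponds to 983040 MiB, which is less than the full memory of a node whose remaining capacity is left for Kubernetes workloads.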

MySQL Operator support

SUNK v7.0.0 introduces MySQL Operator support for Slurm accounting, which simplifies maintenance and management of the MySQL database. Any existing records that need to be retained must be migrated during the upgrade. For assistance with this process, please contact CoreWeave Support.

nsscache now the default directory service

nsscache is now the default directory service for SUNK. This replaces the previous directory service based on SSSD. For more information, see Manage users with nsscache.

  • This changes how users are made available in Slurm.
  • Slurm users are now created automatically, so they no longer need to be created manually.

login.sshdConfig now accepts individual values without overwriting defaults

Users can now set individual values for the login.sshdConfig parameter in the Slurm chart without overwriting default values. Default values will persist unless overridden.

  • YAML interprets an unquoted no as a boolean value by default. To pass it as a string, wrap the value in quotes in the configuration ("no").
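The quoting rule above can be checked mechanically. The sketch below flags plain scalars that YAML 1.1 style parsers commonly resolve to booleans and quotes them when rendering a key/value line. The `needs_quotes` and `render_entry` helpers are hypothetical, and the assumption that login.sshdConfig takes sshd option names as keys is made for illustration only; consult the Slurm chart values for the actual structure.

```python
# Words that YAML 1.1 style parsers commonly resolve to booleans when unquoted.
_BOOL_WORDS = ("yes", "no", "on", "off", "true", "false")

# Only the lowercase, Capitalized, and UPPERCASE spellings are treated as booleans.
_BOOL_SCALARS = {
    variant
    for word in _BOOL_WORDS
    for variant in (word, word.capitalize(), word.upper())
}


def needs_quotes(value: str) -> bool:
    """Return True if an unquoted value would be read as a boolean."""
    return value in _BOOL_SCALARS


def render_entry(key: str, value: str) -> str:
    """Render one 'key: value' line, quoting boolean-like strings."""
    rendered = f'"{value}"' if needs_quotes(value) else value
    return f"{key}: {rendered}"


# Assumed example keys: standard sshd_config option names.
print(render_entry("PermitRootLogin", "no"))
print(render_entry("LogLevel", "VERBOSE"))
```

Here the first entry is rendered with quotes so that sshd receives the string no rather than a boolean, while the second entry is left unquoted.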

task/cgroup enabled by default

task/cgroup is now enabled by default. This allows more flexibility when configuring partitions, whether they are created manually or automatically. Values can be set per partition, and default values are explicitly noted. Changes to the partition configuration parameters will not affect existing NodeSet configurations.

Additionally, for automatically created partitions, the defCPUPerGPU and defMemPerCPU values are now automatically defined.

/coredumps directory now exists in Slurm images

The core dump path is now set to an existing directory (/coredumps) that resides on an emptyDir volume. This ensures that core dumps are saved in the expected path and remain easy to access even if the slurmd pod is in CrashLoopBackOff.

Multiple SSH keys now usable via SUNK User Provisioning

This patch updates SCIM ingestion to allow writing multiple authorized keys per user to the nsscache SSH key cache. Users can now add more than one SSH key in User Settings without losing SSH access.

Fixed a bug preventing Pyxis from being used inside a Slurm job

This patch resolves issues with the --container-name=mycontainer:exec flag and the enroot list -f command when running Pyxis inside a Slurm job. With this fix, users can use Pyxis inside a Slurm job without encountering errors.

Support for new IB/RoCE labels for topology generation

This release adds support for the newly defined InfiniBand (IB) and RoCE labels used for topology generation.