March 14, 2025 - SUNK v6.0.0 release
SUNK v6.0.0 released with major version upgrade and new features
SUNK v6.0.0 has been released with significant new features and some breaking changes that require attention.
Breaking changes
⚠️ BREAKING CHANGES: The SUNK manager deployment must be recreated with updated labels. In addition, overrides for Slurm configuration must be updated to match the new structure.
New features
This release introduces enhancements across several components. The accounting system now uses sacctmgr ping as its readiness and liveness probes for improved health monitoring. Slurm-login pods now have their own separate resource defaults, and job processing can now skip specific job IDs when needed.
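To illustrate how a command such as sacctmgr ping can serve as an exec-style health check, here is a minimal sketch. It assumes sacctmgr is on the container's PATH and relies on the Kubernetes convention that exit code 0 means healthy. The function names slurmdbd_healthy and probe_exit_code are hypothetical; this is not the probe definition shipped in the SUNK charts.

```python
import shutil
import subprocess
from typing import Sequence

# Command named in the release notes; the wrapper around it is hypothetical.
DEFAULT_PROBE_CMD: Sequence[str] = ("sacctmgr", "ping")


def slurmdbd_healthy(cmd: Sequence[str] = DEFAULT_PROBE_CMD, timeout: float = 10.0) -> bool:
    """Return True if the probe command exits with status 0 within `timeout` seconds."""
    if shutil.which(cmd[0]) is None:
        # Binary not found: report unhealthy instead of raising.
        return False
    try:
        result = subprocess.run(list(cmd), capture_output=True, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        # A hung accounting daemon should fail the probe, not block it forever.
        return False
    return result.returncode == 0


def probe_exit_code(cmd: Sequence[str] = DEFAULT_PROBE_CMD, timeout: float = 10.0) -> int:
    """Map health to an exec-probe exit code: 0 for healthy, 1 otherwise."""
    return 0 if slurmdbd_healthy(cmd, timeout) else 1
```

A probe script would call sys.exit(probe_exit_code()). Treating a timeout as a failure keeps a stuck accounting daemon from being reported as live.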
The Helm charts include numerous improvements: a default sshd configuration for login services, a default UnkillableStepProgram in the Slurm chart, and NVIDIA_DRIVER_CAPABILITIES set to all for compute pods. The SUNK controller now supports priority class configuration, and node cleanup is enabled by default in the Slurm chart. Metadata can now be set separately for individual login services, and the MySQL memory request in values-cw in the Slurm chart has been updated to 64Gi.
The operator now supports allowgaps in topology.conf and exports additional SUNK metrics. The Syncer adds node state metrics and improves reboot handling using production-powerreset, and the HooksAPI endpoint has been refactored to accept safe reboot requests. Affinity merging is now supported, and Slurm has been upgraded to 24.11 for improved performance and stability.
Removed components
Several components have been removed from the codebase, including the Sentry integration in the operator, the Slurm-login codebase, and all related charts. Separately, the Slurm-login component has been updated with %h meta support for username hashing.