
March 14, 2025 - SUNK v6.0.0 release

SUNK v6.0.0 released with major version upgrade and new features

SUNK v6.0.0 has been released with significant new features and some breaking changes that require attention.

Breaking changes

⚠️ BREAKING CHANGES: The SUNK manager deployment must be recreated with updated labels. In addition, overrides for various Slurm configuration settings must be updated to match the new structure.

New features

This release introduces several enhancements across multiple components. The accounting system now uses sacctmgr ping as its readiness and liveness probe for improved health monitoring. Resource management has been enhanced with resource defaults specific to the slurm-login pod, and job processing can now skip specific job IDs when needed.
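To illustrate the probe change, the following Python sketch builds Kubernetes exec probe specs that run sacctmgr ping, on the assumption that its exit status reflects whether slurmdbd responds. The function names and the timing values (periodSeconds, timeoutSeconds, failureThreshold) are hypothetical and are not taken from the SUNK charts.

```python
"""Sketch: Kubernetes exec probes that run `sacctmgr ping`.

Hypothetical helper names; timing values are illustrative assumptions,
not SUNK's chart defaults.
"""
import json


def sacctmgr_probe(period_seconds: int = 10, failure_threshold: int = 3) -> dict:
    """Return an exec probe spec; Kubernetes treats exit code 0 as success."""
    return {
        "exec": {"command": ["sacctmgr", "ping"]},
        "periodSeconds": period_seconds,
        "timeoutSeconds": 5,
        "failureThreshold": failure_threshold,
    }


def accounting_probes() -> dict:
    """Readiness reacts quickly; liveness tolerates more failures before a restart."""
    return {
        "readinessProbe": sacctmgr_probe(period_seconds=10, failure_threshold=3),
        "livenessProbe": sacctmgr_probe(period_seconds=30, failure_threshold=5),
    }


if __name__ == "__main__":
    # Print the probe block as JSON, e.g. for comparison with a rendered manifest.
    print(json.dumps(accounting_probes(), indent=2))
```

Using the same command for both probes keeps the health signal consistent, while the different thresholds let a pod drop out of service briefly before Kubernetes decides to restart it.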

The Helm charts have received numerous improvements: login services now ship with a default sshd configuration, the Slurm chart sets a default UnkillableStepProgram, and compute pods set NVIDIA_DRIVER_CAPABILITIES to all. The SUNK controller now supports priority class configuration, and node cleanup is enabled by default in the Slurm chart. Metadata can now be set separately for individual login services, and the MySQL memory request in values-cw in the Slurm chart has been updated to 64Gi.
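The compute pod default can be pictured with the following Python sketch, which adds NVIDIA_DRIVER_CAPABILITIES=all to a container spec only when the variable is not already set. The actual chart logic may differ; the helper name, the slurmd container name, and the rule that a user-supplied value takes precedence are assumptions for illustration.

```python
"""Sketch: defaulting NVIDIA_DRIVER_CAPABILITIES=all on a compute container.

Hypothetical helper; assumes an explicitly set value should be preserved.
"""
import copy

DEFAULT_ENV = {"NVIDIA_DRIVER_CAPABILITIES": "all"}


def apply_compute_env_defaults(container: dict) -> dict:
    """Return a copy of `container` with default env vars added if absent.

    The input dict is not modified, and variables already present keep their values.
    """
    result = copy.deepcopy(container)
    env = result.setdefault("env", [])
    present = {item["name"] for item in env}
    for name, value in DEFAULT_ENV.items():
        if name not in present:
            env.append({"name": name, "value": value})
    return result


if __name__ == "__main__":
    # A container with no env gets the default; one with an explicit value keeps it.
    print(apply_compute_env_defaults({"name": "slurmd"}))
    print(apply_compute_env_defaults({
        "name": "slurmd",
        "env": [{"name": "NVIDIA_DRIVER_CAPABILITIES", "value": "compute,utility"}],
    }))
```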

The operator now supports allowgaps in topology.conf and exports additional SUNK metrics. The Syncer adds node state metrics and improves reboot handling by using production-powerreset, and the HooksAPI endpoint has been refactored to accept safe reboot requests. Affinity merging is now supported, and Slurm has been upgraded to version 24.11 for enhanced performance and stability.
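Affinity merging could look roughly like the following Python sketch, which deep-merges a default Kubernetes affinity block with a user override. The merge rules (nested dicts merged recursively, lists concatenated, override scalars winning), the function name, and the example label key example.com/pool are assumptions, not SUNK's documented behavior.

```python
"""Sketch: deep-merging a default Kubernetes affinity with a user override.

Assumed semantics (not confirmed as SUNK's): dicts merge recursively,
lists are concatenated, and scalar values from the override win.
"""
import copy


def merge_affinity(default: dict, override: dict) -> dict:
    """Return a new dict combining `default` and `override` without mutating either."""
    merged = copy.deepcopy(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_affinity(merged[key], value)
        elif isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + copy.deepcopy(value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    default = {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [{"matchExpressions": [
            {"key": "example.com/pool", "operator": "In", "values": ["gpu"]}]}]}}}
    override = {"podAntiAffinity": {"preferredDuringSchedulingIgnoredDuringExecution": [{
        "weight": 100,
        "podAffinityTerm": {"topologyKey": "kubernetes.io/hostname",
                            "labelSelector": {"matchLabels": {"app": "slurmd"}}}}]}}
    # The result carries both the default node affinity and the override's anti-affinity.
    print(sorted(merge_affinity(default, override)))
```

One caveat with this design: Kubernetes ORs the entries of nodeSelectorTerms, so concatenating that list widens rather than narrows where a pod may schedule. A production merge may need per-field rules.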

Removed components

Several components have been removed from the codebase: the Sentry integration in the operator, the Slurm-login codebase, and all related charts. The Slurm-login component has also been updated with meta %h support for username hashing.

Additional resources