SUNK v6.0.0 has been released with significant new features and some breaking changes that require attention.
Breaking changes
New features
This release introduces several enhancements across multiple components. The accounting system now uses sacctmgr ping as its readiness and liveness probe for improved health monitoring. Resource management now has separate resource defaults for slurm-login pods, and job processing supports skipping specific job IDs when needed.
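To illustrate the probe change, here is a minimal sketch of how a command such as sacctmgr ping can be wired into Kubernetes exec probes. It is a generic pod-spec fragment, not the SUNK chart's actual template: the container name and all timing values are assumptions chosen for illustration.

```yaml
# Hypothetical container fragment; the real SUNK templates may differ.
containers:
  - name: accounting              # hypothetical container name
    readinessProbe:
      exec:
        command: ["sacctmgr", "ping"]
      periodSeconds: 10           # assumed value
    livenessProbe:
      exec:
        command: ["sacctmgr", "ping"]
      periodSeconds: 30           # assumed value
      failureThreshold: 3         # assumed value
```

Kubernetes treats an exec probe as successful when the command exits with status 0. A failing readiness probe removes the pod from service endpoints, while repeated liveness failures restart the container.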
The Helm charts have received numerous improvements, including a default sshd configuration for login services, default UnkillableStepProgram integration in the Slurm chart, and NVIDIA_DRIVER_CAPABILITIES set to all for compute pods. The SUNK controller now includes priority class configuration, and node cleanup is enabled by default in the Slurm chart. Individual login services can now have metadata set separately, and MySQL memory requests have been updated to 64Gi in values-cw within the Slurm chart.
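As a rough illustration of the NVIDIA_DRIVER_CAPABILITIES default, the fragment below shows how that environment variable appears on a container in a rendered pod spec. The container name is hypothetical, and this is not the chart's values schema, only the resulting Kubernetes field.

```yaml
# Hypothetical rendered container fragment for a compute pod.
containers:
  - name: compute                 # hypothetical container name
    env:
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: "all"              # expose all driver capabilities to the container
```

With the NVIDIA container runtime, this variable controls which driver libraries and binaries are mounted into the container. Setting it to all avoids having to list capabilities such as compute or utility individually.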
The operator now supports allowgaps in topology.conf and exports additional SUNK metrics. The Syncer has been enhanced with node state metrics and improved reboot functionality that uses production-powerreset, and the HooksAPI endpoint has been refactored to accept safe reboot requests. Affinity merging is now supported, and the system has been upgraded to Slurm 24.11 for improved performance and stability.