June 2, 2025 - SUNK v6.4.1 release
SUNK v6.4.1 released with critical memory parsing fixes, MOTD improvements, and container runtime enhancements
SUNK v6.4.1 has been released as a patch release with critical fixes for memory parsing, MOTD script handling, and container runtime improvements.
Overview
SUNK v6.4.1 is a focused patch release addressing important issues discovered in v6.4.0, including a critical memory parsing fix, improved MOTD script mounting, and enhanced container runtime stability.
Key changes
Critical fixes
Memory parsing fix: Added a Slurm patch that fixes parsing of node_reg_mem_percent. This should fix the Low RealMemory drains we have seen recently on Mistral's RNO2 cluster.
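As a rough illustration of what this setting governs, the sketch below assumes the text refers to Slurm's node_reg_mem_percent option, which is documented as the percentage of configured memory a node may register with before it is treated as having low memory. This is not the patched Slurm code, which these notes do not show. The function name and the memory figures are hypothetical.

```python
def node_registers_cleanly(configured_mb: int, reported_mb: int,
                           reg_mem_percent: float = 100.0) -> bool:
    """Return True if the reported memory meets the allowed share of the
    configured RealMemory (illustrative check, not Slurm's implementation)."""
    threshold_mb = configured_mb * reg_mem_percent / 100.0
    return reported_mb >= threshold_mb


# Hypothetical node: 1,000,000 MB configured, 960,000 MB reported at registration.
configured = 1_000_000
reported = 960_000

print(node_registers_cleanly(configured, reported, 100.0))  # False
print(node_registers_cleanly(configured, reported, 95.0))   # True
```

If the percentage is parsed incorrectly, the threshold used in a check like this would be wrong, which is one plausible way healthy nodes could end up drained with a Low RealMemory reason.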
MOTD script mounting: Use subpath for mounting MOTD scripts. This avoids conflicts between packages updating /etc/update-motd.d/ and the dynamic login pod MOTD script introduced in SUNK v6.4.0.
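The subpath approach can be sketched as a Kubernetes volumeMount built in Python. The fields name, mountPath, and subPath are standard Kubernetes volumeMount fields. The volume name and script filename are hypothetical and are not taken from the SUNK chart.

```python
import json

# Assumption: the volume name and script filename below are illustrative only.
MOTD_DIR = "/etc/update-motd.d"
SCRIPT_NAME = "99-login-pod"


def motd_volume_mount(volume_name: str, script_name: str) -> dict:
    """Build a volumeMount that places a single file into MOTD_DIR via subPath,
    leaving the rest of the directory untouched."""
    return {
        "name": volume_name,
        "mountPath": f"{MOTD_DIR}/{script_name}",
        "subPath": script_name,
    }


mount = motd_volume_mount("motd-scripts", SCRIPT_NAME)
print(json.dumps(mount, indent=2))
```

Mounting the whole volume at /etc/update-motd.d would hide any files that packages install there. Mounting one file by subPath keeps both sets of scripts visible.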
Login template configuration: Fixed login template config map rendering. This makes it easier to view and edit the data in this configmap by changing raw \n characters to rendered newlines in the output.
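The difference between raw \n characters and rendered newlines can be shown with a small sketch that emits the same string as a quoted YAML scalar and as a block scalar. It only illustrates the two output styles and is not the template change itself. The key name login.sh and the script contents are hypothetical.

```python
import json


def as_quoted_scalar(key: str, value: str) -> str:
    """Emit the value on one line; newlines appear as escaped \\n sequences."""
    return f"{key}: {json.dumps(value)}"


def as_block_scalar(key: str, value: str, indent: int = 2) -> str:
    """Emit the value as a YAML literal block; newlines are rendered.
    Assumes the value ends with a single trailing newline."""
    pad = " " * indent
    body = "\n".join(pad + line for line in value.splitlines())
    return f"{key}: |\n{body}"


# Hypothetical template content.
template = "#!/bin/sh\necho welcome\n"

print(as_quoted_scalar("login.sh", template))
print(as_block_scalar("login.sh", template))
```

The first form is hard to read and edit once the script grows beyond a few lines. The second shows the script as it will appear on disk.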
Container and runtime improvements
NVIDIA container toolkit: Pinned the nvidia-container-toolkit version to 1.17.6-1. This avoids an issue with version 1.17.7-1 of the NVIDIA container toolkit breaking Slurm pyxis/enroot functionality.
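These notes do not say how the pin is applied in SUNK. One common way to hold a package at a fixed version on Debian or Ubuntu systems is an APT preferences stanza, and the sketch below renders one. Treat it as an assumption about the mechanism. The helper name and the default priority are illustrative.

```python
def apt_pin_stanza(package: str, version: str, priority: int = 1001) -> str:
    """Render an APT preferences stanza pinning a package to one version.
    A priority above 1000 lets APT keep this version even if a newer one
    is available."""
    return (
        f"Package: {package}\n"
        f"Pin: version {version}\n"
        f"Pin-Priority: {priority}\n"
    )


print(apt_pin_stanza("nvidia-container-toolkit", "1.17.6-1"))
```

A stanza like this would normally be placed in a file under /etc/apt/preferences.d/ before the package is installed.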
RDMA configuration: Removed the RDMA/IB setting from the L40 and L40S compute definitions. These compute definitions did not require this setting.
Configuration changes
Default settings
- Fixed memory parsing configurations
- Improved MOTD script mounting with subpath approach
- Enhanced container runtime stability
- Cleaned up unnecessary RDMA configuration
Migration notes
This is a patch release that existing SUNK v6.4.0 deployments should upgrade to in order to address:
- Memory parsing issues: Critical fix for the memory parsing bug behind the Low RealMemory node drains
- MOTD functionality: Improved MOTD script functionality for login nodes
- Container stability: Better container runtime stability and error handling
Upgrade recommendation: All v6.4.0 deployments should upgrade to v6.4.1 to resolve the memory parsing issue.
Bug fixes
- Memory allocation: Fixed critical memory parsing bug that could affect resource allocation
- MOTD scripts: Resolved MOTD script mounting issues on login nodes using subpath mounting
- Container errors: Suppressed spurious errors in enroot architecture detection
- RDMA configuration: Removed the unnecessary RDMA/IB setting from the L40 and L40S compute definitions
- Login template: Fixed config map rendering for better readability