April 25, 2025 - SUNK v6.3.0 release
SUNK v6.3.0 released with enhanced enroot support, topology improvements, and GPU configuration updates
SUNK v6.3.0 has been released with significant improvements to enroot container support, topology configuration enhancements, and updated GPU configurations for newer hardware.
Overview
SUNK v6.3.0 focuses on container runtime improvements, enhanced topology management, and better support for modern GPU hardware. This release includes important fixes for block topology and improved enroot integration.
Key changes
Enhanced enroot support
- Pyxis integration: Added pyxis support in kind for better container management
- Enroot profile configuration: Exposed enroot profile configuration for greater customization
- PMI hook integration: Added enroot PMI hook for improved parallel job support
- Container runtime improvements: Enhanced container runtime capabilities and configuration
Topology and scheduling improvements
- Block topology fixes: Fixed the block size calculation in getBlockSize() to use the maximum block size instead of the minimum
- Alphabetical Node sorting: Added patch to topology/block plugin to enable alphabetical Node sorting
- GPU configuration updates: Updated memory and CPU specifications for newer GPU types
- Scheduling optimizations: Improved scheduling algorithms for better resource allocation
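To see why the block size fix can change Node grouping, the sketch below compares picking the smallest versus the largest candidate block size. It is illustrative only: the helper group_into_blocks, the candidate sizes, and the Node names are hypothetical, and these notes do not describe the actual inputs to getBlockSize().

```python
def group_into_blocks(nodes, block_size):
    """Split an ordered Node list into consecutive blocks of block_size."""
    return [nodes[i:i + block_size] for i in range(0, len(nodes), block_size)]


# Hypothetical inputs: two candidate block sizes and an 8-Node cluster.
candidate_sizes = [2, 4]
nodes = [f"node-{i:02d}" for i in range(8)]

# Behavior described for earlier releases: smallest candidate wins.
old_blocks = group_into_blocks(nodes, min(candidate_sizes))
# Behavior described for v6.3.0: largest candidate wins.
new_blocks = group_into_blocks(nodes, max(candidate_sizes))

print(len(old_blocks), "blocks with min;", len(new_blocks), "blocks with max")
```

With these made-up inputs, the same eight Nodes fall into fewer, larger blocks once the maximum is used, which is the kind of regrouping to look for when testing an existing layout.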
GPU hardware support
- Hardware updates: Updated configurations for newer GPU types with improved memory and CPU specifications
- Device plugin enhancements: Improved device plugin configurations and health check handling
- NVIDIA XID handling: Added the ability to ignore XID 145 via dpDisableHealthchecks
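The effect of ignoring an XID can be sketched as a simple filter over health events. This is a minimal illustration, not SUNK's implementation: it assumes dpDisableHealthchecks carries a comma-separated list of XID numbers, and the function names parse_ignored_xids and marks_unhealthy are hypothetical.

```python
def parse_ignored_xids(value):
    """Parse a comma-separated setting into a set of XID numbers.

    Non-numeric tokens are skipped in this sketch; how SUNK treats
    them is not stated in the release notes.
    """
    ignored = set()
    for token in value.split(","):
        token = token.strip()
        if token.isdecimal():
            ignored.add(int(token))
    return ignored


def marks_unhealthy(xid, ignored_xids):
    """Return True if an XID event should mark the GPU unhealthy."""
    return xid not in ignored_xids


# Hypothetical setting value that ignores only XID 145.
ignored = parse_ignored_xids("145")
print(marks_unhealthy(145, ignored))  # False: the event is ignored
print(marks_unhealthy(79, ignored))   # True: other XIDs still count
```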
Developer and testing improvements
- Data race fixes: Fixed data races in Slurm client tests
- Golint upgrade: Updated golint to version 7.0.0 for better code quality
- Test infrastructure: Improved test infrastructure and CI/CD pipeline reliability
Monitoring and observability
- Dashboard updates: Updated Grafana dashboards with improved metrics and visualizations
- Logging improvements: Enhanced logging capabilities and configuration options
- Dependency updates: Updated various monitoring and logging dependencies
Configuration changes
Default settings
- Enhanced enroot configuration options
- Improved topology configuration defaults
- Updated GPU hardware specifications
- Better container runtime configurations
Migration notes
Existing SUNK v6.2.0 deployments will continue to work, but you may want to:
- Review enroot configuration if using custom container setups
- Test topology configuration changes with your existing Node layouts
- Verify GPU hardware configurations match your current hardware
- Update any custom container runtime configurations
- Test the improved scheduling algorithms with your workloads
Breaking changes
- Block topology calculation: Changed from min to max block size calculation (may affect Node grouping)
- Node sorting: Alphabetical Node sorting is now enabled by default (may change Node ordering)
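Before upgrading, it can help to preview how an alphabetical sort would reorder your Nodes. The sketch below assumes a plain lexicographic string sort, which these notes do not confirm; the helper ordering_changes and the Node names are hypothetical.

```python
def ordering_changes(nodes):
    """Return (position, before, after) for each slot a plain sort would change."""
    after = sorted(nodes)
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip(nodes, after))
        if old != new
    ]


# Hypothetical current ordering of three Nodes.
current_order = ["gpu-2", "gpu-10", "gpu-1"]

for position, old, new in ordering_changes(current_order):
    print(f"slot {position}: {old} -> {new}")
```

Under a lexicographic sort, gpu-10 lands before gpu-2, so names without zero-padded numbers may not end up in numeric order.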
Documentation
For detailed information about configuring and using SUNK v6.3.0, see: