
April 25, 2025 - SUNK v6.3.0 release

SUNK v6.3.0 released with enhanced enroot support, topology improvements, and GPU configuration updates

SUNK v6.3.0 has been released with significant improvements to enroot container support, topology configuration enhancements, and updated GPU configurations for newer hardware.

Overview

SUNK v6.3.0 focuses on container runtime improvements, enhanced topology management, and better support for modern GPU hardware. This release includes important fixes for block topology and improved enroot integration.

Key changes

Enhanced enroot support

  • Pyxis integration: Added pyxis support in kind (Kubernetes in Docker) environments for better container management
  • Enroot profile configuration: Exposed enroot profile configuration for greater customization
  • PMI hook integration: Added enroot PMI hook for improved parallel job support
  • Container runtime improvements: Enhanced container runtime capabilities and configuration

Topology and scheduling improvements

  • Block topology fixes: Fixed block size calculation to use max instead of min block size in getBlockSize()
  • Alphabetical Node sorting: Added patch to topology/block plugin to enable alphabetical Node sorting
  • GPU configuration updates: Updated memory and CPU specifications for newer GPU types
  • Scheduling optimizations: Improved scheduling algorithms for better resource allocation
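
To make the block size fix concrete, the following Go sketch contrasts the two selection rules. It is only an illustration: these notes do not show the real signature or inputs of getBlockSize(), so the function names blockSizeMin and blockSizeMax, and the idea of passing one node count per block, are assumptions.

```go
package main

import "fmt"

// blockSizeMin returns the smallest node count across all blocks.
// Hypothetical stand-in for the behavior before this release.
func blockSizeMin(nodesPerBlock []int) int {
	if len(nodesPerBlock) == 0 {
		return 0
	}
	size := nodesPerBlock[0]
	for _, n := range nodesPerBlock[1:] {
		if n < size {
			size = n
		}
	}
	return size
}

// blockSizeMax returns the largest node count across all blocks.
// Hypothetical stand-in for the behavior after this release.
func blockSizeMax(nodesPerBlock []int) int {
	size := 0
	for _, n := range nodesPerBlock {
		if n > size {
			size = n
		}
	}
	return size
}

func main() {
	// Assumed example: two full blocks and one block missing two Nodes.
	nodesPerBlock := []int{18, 18, 16}
	fmt.Printf("min=%d max=%d\n", blockSizeMin(nodesPerBlock), blockSizeMax(nodesPerBlock))
	// prints "min=16 max=18"
}
```

With the min rule, a single partially populated block lowers the size used for every block. The max rule keeps the size of the fullest block instead.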

GPU hardware support

  • Hardware updates: Updated configurations for newer GPU types with improved memory and CPU specifications
  • Device plugin enhancements: Improved device plugin configurations and health check handling
  • NVIDIA XID handling: Added functionality to ignore XID145 via dpDisableHealthchecks
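
As a rough sketch of what ignoring an XID means for health checks, the Go snippet below parses a skip list and consults it before marking a GPU unhealthy. It assumes that dpDisableHealthchecks carries a comma-separated list of XID codes; these notes do not confirm that format, and parseSkippedXIDs and shouldMarkUnhealthy are hypothetical helper names, not part of SUNK or the NVIDIA device plugin.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSkippedXIDs turns a comma-separated string such as "145" or "13, 145"
// into a set of XID codes. Empty and non-numeric entries are ignored.
func parseSkippedXIDs(value string) map[uint64]bool {
	skipped := make(map[uint64]bool)
	for _, field := range strings.Split(value, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		xid, err := strconv.ParseUint(field, 10, 64)
		if err != nil {
			continue
		}
		skipped[xid] = true
	}
	return skipped
}

// shouldMarkUnhealthy reports whether an XID event should mark the GPU
// unhealthy, given the set of XIDs the operator chose to ignore.
func shouldMarkUnhealthy(xid uint64, skipped map[uint64]bool) bool {
	return !skipped[xid]
}

func main() {
	skipped := parseSkippedXIDs("145") // assumed value format
	fmt.Println(shouldMarkUnhealthy(145, skipped)) // prints "false"
	fmt.Println(shouldMarkUnhealthy(79, skipped))  // prints "true"
}
```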

Developer and testing improvements

  • Data race fixes: Fixed data races in Slurm client tests
  • Golint upgrade: Updated golint to version 7.0.0 for better code quality
  • Test infrastructure: Improved test infrastructure and CI/CD pipeline reliability

Monitoring and observability

  • Dashboard updates: Updated Grafana dashboards with improved metrics and visualizations
  • Logging improvements: Enhanced logging capabilities and configuration options
  • Dependency updates: Updated various monitoring and logging dependencies

Configuration changes

Default settings

  • Enhanced enroot configuration options
  • Improved topology configuration defaults
  • Updated GPU hardware specifications
  • Better container runtime configurations

Migration notes

Existing SUNK v6.2.0 deployments will continue to work, but you may want to:

  1. Review enroot configuration if using custom container setups
  2. Test topology configuration changes with your existing Node layouts
  3. Verify GPU hardware configurations match your current hardware
  4. Update any custom container runtime configurations
  5. Test the improved scheduling algorithms with your workloads

Breaking changes

  • Block topology calculation: Changed from min to max block size calculation (may affect Node grouping)
  • Node sorting: Alphabetical Node sorting is now enabled by default (may change Node ordering)
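
The Node ordering change can be easy to overlook when Node names end in numbers. The Go sketch below shows what a plain lexicographic sort does to such names. Whether the topology/block patch uses exactly this comparison is not stated in these notes, so treat the comparison, the helper name sortNodesAlphabetically, and the Node names as assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// sortNodesAlphabetically returns a lexicographically sorted copy of the
// Node names and leaves the input slice untouched.
func sortNodesAlphabetically(nodes []string) []string {
	sorted := append([]string(nil), nodes...)
	sort.Strings(sorted)
	return sorted
}

func main() {
	// Hypothetical Node names, not taken from a real cluster.
	nodes := []string{"gpu-node-2", "gpu-node-10", "gpu-node-1"}
	fmt.Println(sortNodesAlphabetically(nodes))
	// prints "[gpu-node-1 gpu-node-10 gpu-node-2]"
}
```

Under this kind of sort, gpu-node-10 is placed before gpu-node-2, which may differ from the numeric order you expect. Zero-padded names such as gpu-node-02 avoid the difference.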

Documentation

For detailed information about configuring and using SUNK v6.3.0, see: