SUNK v6.7.0 has been released with CUDA 12.9 support, enhanced SCIM and nsscache functionality, HDF5 plugin support, and various bug fixes for improved system stability and performance.
Overview
CUDA 12.9 support
CUDA 12.9 is now supported in the compute node definition.
SCIM and nsscache updates
- SCIM filtering: New configuration options are now available for SCIM user and group filtering, via nsscache.nsscacheConfig.default.scim_users_parameters.
- Shadow map for SCIM: nsscache for SCIM will now create the shadow map by default.
- Home directory override: nsscache now has the ability to override the home directory. This can be set with the following settings based on your authentication method:
  - For SCIM: nsscache.nsscacheConfig.passwd.scim_override_home_directory
  - For LDAP: nsscache.nsscacheConfig.passwd.ldap_override_home_dir
- Learn more about SUNK User Provisioning and its use of SCIM.
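
The dotted setting names above read as paths into a nested configuration. As a minimal sketch, assuming the settings are supplied as nested keys (for example in a values file), the Python below expands the dotted paths into a nested mapping. The helper name expand_dotted and the placeholder values are hypothetical; only the key paths come from these release notes, and the accepted value types are not stated here.

```python
from typing import Any


def expand_dotted(settings: dict[str, Any]) -> dict[str, Any]:
    """Expand dotted setting paths into a nested dictionary."""
    root: dict[str, Any] = {}
    for path, value in settings.items():
        node = root
        *parents, leaf = path.split(".")
        for key in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(key, {})
        node[leaf] = value
    return root


# Placeholder values are hypothetical; only the key paths are from the release notes.
nested = expand_dotted({
    "nsscache.nsscacheConfig.passwd.scim_override_home_directory": "<scim-home-override>",
    "nsscache.nsscacheConfig.passwd.ldap_override_home_dir": "<ldap-home-override>",
})
```

Both settings end up under the same nsscache, nsscacheConfig, passwd branch, which is how they would sit side by side in a nested configuration file.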
CronJob scheduling
The default schedule for the nsscache CronJob has been changed to run every minute (* * * * *).
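
To illustrate why the expression * * * * * means "every minute", here is a minimal sketch of a five-field cron matcher in Python. It is a simplified subset written for illustration, not the scheduler's implementation: it handles only * and comma-separated integers, and it does not implement ranges, steps, or cron's special handling when both day fields are restricted. The function names field_matches and cron_matches are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*' and comma-separated integers only."""
    for part in field.split(","):
        if part == "*" or int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a five-field cron expression fires at the given minute."""
    minute, hour, day, month, weekday = expr.split()
    # Cron counts Sunday as 0; datetime.isoweekday() counts Sunday as 7.
    cron_weekday = when.isoweekday() % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )
```

Because every field is a wildcard, cron_matches("* * * * *", t) is true for any datetime t, whereas an expression such as 0 * * * * would match only at the top of each hour.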
HDF5 plugin support
Support for the HDF5 plugin is now included, allowing for advanced data handling capabilities.
Bug fixes
- Directory Service Integration: Enable sudoGroups mounts when nsscache is configured.
- Correct Gres Type for NVIDIA RTX PRO 6000: Fixed the gres type in the rtxp6000-8x compute definition to ensure proper GPU detection.
- Slurm Fixes for TaskProlog: Backported Slurm fixes to address errors when using TaskProlog.
- Node Locking Improvements: Continue polling for pod information within the context timeout to prevent errors with node locking.
- Switch to bitnamilegacy: Switched to bitnamilegacy images for the following resources as per Bitnami recommendations:
Impact
During upgrades, there may be a moment where MySQL enters an Error state. While the MySQL pod is down, other components that depend on it, such as the Slurm accounting and controller pods, may have trouble starting. Depending on how long MySQL takes to come back up, this could also trigger a cascading crash. This should resolve once the bitnamilegacy image is pulled. If the pull is fast enough, these issues will not be visible.
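
The node locking fix listed under bug fixes describes continuing to poll for pod information within the context timeout. As a generic illustration of that pattern, and not SUNK's actual implementation, the Python sketch below retries a lookup until it returns a value or a deadline passes. The name poll_until and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], Optional[T]],
    timeout_s: float,
    interval_s: float = 0.5,
) -> T:
    """Call fetch() repeatedly until it returns a value or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no result before the deadline")
        # Sleep for the interval, but never past the deadline.
        time.sleep(min(interval_s, remaining))
```

The point of the pattern is that a lookup that comes back empty at first is retried for as long as the overall timeout allows, instead of failing on the first miss.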