
July 12, 2025

SUNK v6.6.0 release notes

New features

  • Identity & Access: SCIM provisioning for SUNK is now available via nsscache. This enables automated, standards-based user and group management from your IdP to CoreWeave clusters. See "SCIM setup for SUNK" under Upgrade notes.
  • Job Monitoring: Slurm job and node outputs now include direct links to their corresponding Grafana dashboards, giving operators one-click access to live job metrics.
  • Instance Types: Added two new compute definitions:
      • rtxp6000-8x: NVIDIA RTX Pro 6000 Blackwell Server Edition
      • gb300-4x: NVIDIA GB300
  • Observability:
      • Slurm metrics now carry the slurm_cluster label, simplifying multi-cluster dashboards.
      • MySQL exporter metrics are now automatically scraped and ingested.
      • The segment-calc script now respects partition filters and job exclusivity, making block-scheduling heat maps more accurate.
  • Benchmarking: The NCCL-test base image is updated to nccl-tests/d5a135d for compatibility with the latest CUDA toolchain.

Improvements

  • Nodes that stay "busy" inside a reservation are automatically re-evaluated after 30 minutes, reducing orphaned allocations.
  • CoreWeave IAM is now fully integrated with the Slurm Helm chart.
  • Optional SSSD mounts are now added only when required, reducing unnecessary container overhead.

Fixes

  • Disabled NVIDIA device-plugin health checks by default, because they could cause false node drains.
  • Segment-calc now skips nodes already in DRAIN state to prevent skewed capacity charts.
  • PodMonitor and VMPodScrape templates now use consistent relabeling syntax.
  • Removed the InfiniBand requirement for A100-based nodes that do not have InfiniBand.
  • Updated multiple operator dependencies (chi v5, viper v2, Go Slurm) to incorporate upstream security and stability patches.

Upgrade notes

SCIM setup for SUNK

Suggested SCIM settings are provided in the slurm chart's values-cw.yaml under nsscache.nsscacheConfig.

To set up SCIM provisioning for SUNK, provide your SCIM auth token in a Kubernetes Secret. This token is used to authenticate with your IdP.

  • In the Kubernetes Secret, set the value of the nsscache-scim-auth-token key to your token.
  • Set nsscache.existingSecret in the values-cw.yaml file of the slurm chart to the name of the Secret.
  • Set nsscache.nsscacheConfig.default.base_url in the values-cw.yaml file of the slurm chart to the base URL of your SCIM server, such as https://api.coreweave.com/scim/<org>.
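The three steps above can be sketched in Python as follows. This is a minimal illustration, not an official SUNK tool: the Secret name sunk-scim-token, the token, and the org are hypothetical placeholders, and the Secret is assumed to live in the same namespace as the Slurm release (Kubernetes Secrets are namespaced). The script builds the Secret manifest and the matching values-cw.yaml overrides as dictionaries and prints them as JSON, which YAML parsers also accept, so the output can be saved and passed to kubectl apply -f or helm -f. Because Helm merges nested maps across values files, overriding only default.base_url should leave the other suggested nsscacheConfig settings in place.

```python
import json

# Hypothetical placeholders: substitute your own Secret name, token, and org.
SECRET_NAME = "sunk-scim-token"
SCIM_TOKEN = "<your-scim-token>"
BASE_URL = "https://api.coreweave.com/scim/<org>"


def scim_secret(name: str, token: str) -> dict:
    """Build a Kubernetes Secret manifest that holds the SCIM auth token."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},  # assumption: same namespace as the Slurm release
        "type": "Opaque",
        # stringData takes plain text; Kubernetes base64-encodes it on write.
        # The key name nsscache-scim-auth-token comes from the release notes.
        "stringData": {"nsscache-scim-auth-token": token},
    }


def scim_values(secret_name: str, base_url: str) -> dict:
    """Build the slurm chart overrides for values-cw.yaml."""
    return {
        "nsscache": {
            "existingSecret": secret_name,
            "nsscacheConfig": {"default": {"base_url": base_url}},
        }
    }


secret = scim_secret(SECRET_NAME, SCIM_TOKEN)
values = scim_values(SECRET_NAME, BASE_URL)

# JSON output is valid YAML, so each document can be saved as a .yaml file.
print(json.dumps(secret, indent=2))
print(json.dumps(values, indent=2))
```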

NSSCache configuration

SCIM provisioning uses the nsscache component. When nsscache is enabled, we recommend disabling SSSD with the following settings:

  • Set sssdContainer.enabled: false in the values.yaml of the slurm chart.
  • Set directoryCache.source: nsscache in the values.yaml of the slurm-login chart.
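The two settings above belong to different charts, so each needs its own override. The Python sketch below, assuming you track overrides as dotted key paths, expands each path into the nested mapping a values file expects. The nest helper is hypothetical and not part of SUNK; the key paths and values are taken from the settings above.

```python
import json


def nest(dotted: dict) -> dict:
    """Expand {"a.b.c": v} into {"a": {"b": {"c": v}}}, merging shared prefixes."""
    out: dict = {}
    for path, value in dotted.items():
        node = out
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return out


# Override for the slurm chart's values.yaml: disable the SSSD container.
slurm_values = nest({"sssdContainer.enabled": False})

# Override for the slurm-login chart's values.yaml: use nsscache as the directory source.
slurm_login_values = nest({"directoryCache.source": "nsscache"})

# json.dumps renders False as false, which YAML also reads as a boolean.
print(json.dumps(slurm_values, indent=2))
print(json.dumps(slurm_login_values, indent=2))
```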

NVIDIA health checks

NVIDIA device-plugin health reporting is now disabled by default. If you rely on it, re-enable these checks by setting device-plugin.healthCheck: true.
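If you prefer command-line overrides to editing a values file, the same key path can be passed with Helm's --set flag. The Python sketch below flattens a nested override into --set arguments. The to_set_flags and render helpers are hypothetical, and because these notes do not name the chart that owns device-plugin.healthCheck, the release and chart names in the printed command are placeholders.

```python
def render(value: object) -> str:
    """Render a scalar for Helm's --set flag (booleans as true/false)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_set_flags(values: dict, prefix: str = "") -> list[str]:
    """Flatten nested Helm values into ["--set", "a.b=c", ...] arguments."""
    flags: list[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flags.extend(to_set_flags(value, path))
        else:
            flags.extend(["--set", f"{path}={render(value)}"])
    return flags


# Re-enable NVIDIA device-plugin health reporting, which now defaults to off.
overrides = {"device-plugin": {"healthCheck": True}}
flags = to_set_flags(overrides)

# <release> and <chart> are placeholders. --reuse-values keeps the release's
# existing values so that only this key changes on upgrade.
print(" ".join(["helm", "upgrade", "<release>", "<chart>", "--reuse-values", *flags]))
```

Passing the list returned by to_set_flags directly as argv (for example, to subprocess.run) avoids shell quoting issues.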