Running jobs and management tasks in the Slurm cluster requires connecting to a Slurm login node. Login nodes are configured on a per-user basis and deployed as a sub-chart of the Slurm chart. Set the following configuration values in the values.yaml of the Slurm chart to define how slurm-login integrates and functions.

Manage individual login pods

You can manage individual login pods in the following ways:
  • Enable the deployment of the sub-chart: To enable the slurm-login functionality as part of the Slurm chart, set the slurm-login.enabled parameter to true.
  • Reboot individual login pods: If an individual login pod is out of sync with the underlying StatefulSet, run the reboot command from within the pod. This command deletes the pod and recreates it using the updated version. When a pod is out of sync, a Message of the Day (MOTD) appears at SSH login with instructions to reboot:
    **********************************************************************
    *                                                                    *
    * The login statefulset has been updated, please restart your login  *
    * pod to get the latest changes.                                     *
    *                                                                    *
    * To restart the login pod, issue the command "reboot".              *
    **********************************************************************
    
  • Access individual login pods: For instructions on accessing each individual Slurm login pod and running Slurm jobs, refer to Connect to a Slurm login node.

Manage user identities and provision resources

The slurm-login.directoryCache parameter defines the directory service configuration used for managing user identities and provisioning resources. It has several sub-values; the key ones are described below.

Select all users from a specified group

slurm-login.directoryCache.selectGroups provides a list of user groups from which the slurm-login chart retrieves all associated users. The list acts as a filter: only users who belong to at least one of the specified groups are included. The filter uses OR logic, so membership in any single listed group is enough for a user to be selected.
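The OR filter described above can be illustrated with a small values.yaml fragment. This is only a sketch: the group names and the users in the comments are hypothetical and are not taken from any real directory.

```yaml
slurm-login:
  directoryCache:
    # Hypothetical group names; a user in either group is selected.
    selectGroups: ["ml-research", "ml-infra"]
    # Illustrative outcome for three hypothetical users:
    #   alice (ml-research)            -> selected
    #   bob   (ml-research, ml-infra)  -> selected
    #   carol (finance)                -> not selected
```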

Define the polling interval for detecting changes to users and user groups

slurm-login.directoryCache.interval defines the polling interval for detecting changes to users and user groups. It determines how frequently those changes are picked up and the corresponding resources are updated.
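As a minimal sketch, the interval sits alongside the other directoryCache values. The 1m value is the one used in the SSSD example on this page; whether other duration formats are accepted is not confirmed here.

```yaml
slurm-login:
  directoryCache:
    # Poll for user and group changes once per minute
    # (value taken from the SSSD example on this page).
    interval: 1m
```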

Verify individual Slurm login resources

To verify the resources created for each user, list the StatefulSets and Services as shown below.
kubectl get sts
You should see output similar to the following:
NAME                             READY   AGE
slurm-login                      1/1     20d
slurm-login-slurmuser1-7fcd49e   1/1     31d
slurm-login-slurmuser2-a19c153   1/1     31d
slurm-login-slurmuser3-4fe7f1d   1/1     31d
slurm-login-slurmuser4-4f705de   1/1     31d
slurm-login-slurmuser5-5e909b3   1/1     31d
kubectl get svc
You should see output similar to the following:
NAME                             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
slurm-login                      LoadBalancer   10.96.46.53     <pending>     22:31408/TCP   35d
slurm-login-0                    LoadBalancer   10.96.146.233   <pending>     22:31276/TCP   35d
slurm-login-slurmuser1-7fcd49e   ClusterIP      10.96.61.246    <none>        22/TCP         32d
slurm-login-slurmuser2-a19c153   ClusterIP      10.96.160.110   <none>        22/TCP         32d
slurm-login-slurmuser3-4fe7f1d   ClusterIP      10.96.2.45      <none>        22/TCP         32d
slurm-login-slurmuser4-4f705de   ClusterIP      10.96.203.63    <none>        22/TCP         32d
slurm-login-slurmuser5-5e909b3   ClusterIP      10.96.184.148   <none>        22/TCP         32d
Optionally, to ensure that users only use their designated pods, you can disable the shared Slurm login StatefulSet and slurm-login Service.

List the directory services to be configured

slurm-login.directoryCache.directoryService.directories specifies a list of directory services to be configured. This is similar to the directoryService configuration in Slurm and can be duplicated or referenced using a YAML anchor for reuse.
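The YAML anchor approach mentioned above might look like the following sketch. It assumes the Slurm chart reads its own directory list from a top-level directoryService.directories key; check that key against your chart's values.yaml before relying on it. The anchor name directories is arbitrary, and the directory entry reuses fields from the SSSD example on this page.

```yaml
# Assumption: the Slurm chart's directory list lives at
# directoryService.directories in the same values.yaml.
directoryService:
  directories: &directories   # define the list once and anchor it
    - name: google-example.com
      enabled: true
      ldapUri: ldaps://ldap.google.com:636
      schema: rfc2307bis

slurm-login:
  enabled: true
  directoryCache:
    source: sssd
    directoryService:
      directories: *directories   # alias reuses the anchored list
```

YAML anchors and aliases are resolved only within a single file, so the anchored list and the alias need to live in the same values.yaml.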

Example nsscache configuration

See an example of a typical nsscache configuration below:
slurm-login:
  enabled: true
  directoryCache:
    source: nsscache
    selectGroups: ["group1", "group2"]
See manage users with nsscache for instructions on configuring nsscache.

Example SSSD configuration

See an example of a typical SSSD configuration below:
slurm-login:
  enabled: true
  directoryCache:
    source: sssd
    selectGroups: ["group1", "group2"]
    interval: 1m
    directoryService:
      # Google Secure LDAP
      directories:
        - name: google-example.com
          enabled: true
          ldapUri: ldaps://ldap.google.com:636
          user:
            canary: [email protected]
          defaultShell: "/bin/bash"
          fallbackHomeDir: "/home/%u"
          overrideHomeDir: /mnt/nvme/home/%u
          ldapsCert: google-ldaps-cert
          schema: rfc2307bis
See manage users with a directory service for more information about configuring SSSD.

Parameter reference table

| Parameter | Description |
| --- | --- |
| slurm-login.enabled | Set this value to true in the Slurm chart to enable the slurm-login chart. |
| slurm-login.directoryCache | Defines the directory service configuration used for managing user identities and provisioning resources. The key sub-values include: |
| slurm-login.directoryCache.selectGroups | Provides a list of user groups that the slurm-login chart will use to retrieve all associated users. This acts as a filter, meaning only users belonging to any of the specified groups will be included. It uses an OR logic, so a user needs to be in at least one of the listed groups to be selected. |
| slurm-login.directoryCache.interval | Defines the polling interval for detecting changes to users and user groups. This interval determines how frequently updates are applied, and resources are modified accordingly. |
| slurm-login.directoryCache.directoryService.directories | Specifies a list of directory services to be configured. This is similar to the directoryService configuration in Slurm and can be duplicated or referenced using a YAML anchor for reuse. |

For extra customizations, refer to the full parameters list.
Last modified on April 20, 2026