Manage users in SUNK

Enable automatic user provisioning for SUNK with nsscache

SUNK User Provisioning (SUP) uses nsscache, a lightweight directory service, to manage users across CoreWeave clusters. It supports two directory protocols: SCIM and LDAP. We recommend SCIM, as it enables automated user and group synchronization from your upstream identity provider (IdP) to SUNK. When a user is added or removed in your IdP, the change is reflected in your SUNK cluster within minutes, ensuring access stays accurate, secure, and up-to-date.

This implementation replaces the previous directory service based on SSSD, securing access and improving reliability. Whether your users are managed in a third-party IdP or directly in CoreWeave IAM, SUP automatically provisions, updates, and removes POSIX users in your cluster without manual intervention.

This guide demonstrates how to configure nsscache to support SUP with the following steps:

  1. Set up SUNK User Provisioning (SUP).
  2. Create user groups with CoreWeave IAM or your upstream IdP.
  3. Create a Kubernetes Secret and add it to the slurm chart.

Prerequisites

Before making any changes to the nsscache configuration, you must first do the following:

  • Set up SUP. SUP is required to provision cluster access to users, whether you are using a federated IdP or not.
  • Create user groups with CoreWeave IAM or your upstream IdP. SUP provisions access to groups of users, rather than individual users. These groups must be created before configuring nsscache, and the group names must exactly match the group names specified in the nsscache configuration.

Configure a Kubernetes Secret for SUP

You will need to create a Kubernetes Secret that contains the configuration for your directory service, then add the Secret to the slurm chart's values.yaml file, as detailed below.

Create a Kubernetes Secret

Create a Kubernetes Secret that contains the configuration for your directory service.

Warning

CoreWeave strongly discourages including sensitive information, such as plaintext user credentials, directly in your values.yaml configuration. Instead, use a Kubernetes Secret for added security and manageability. CoreWeave provides encryption at rest for etcd data, which includes Secrets.

The Secret holds either a SCIM token or an LDAP service account password. Follow the naming conventions below that match your directory protocol when creating your Secret.

Naming conventions for SCIM Secrets

For a SCIM Secret, the suggested Secret name is <release-name>-nsscache-scim-secret.

Inside the Secret, the key under data must be named nsscache-scim-auth-token, as shown in the example below. This contains the base64-encoded SCIM token.

Example
apiVersion: v1
kind: Secret
metadata:
  name: scim-auth-token
data:
  nsscache-scim-auth-token: <base64-encoded-scim-token>
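The value under data has to be base64-encoded before it goes into the manifest. The following is a minimal sketch of one way to produce it; the token value and the helper name encode_secret_value are hypothetical placeholders, not part of SUNK.

```shell
#!/bin/sh
# Base64-encode a value for use under `data:` in a Secret manifest.
# printf avoids the trailing newline that `echo` would append to the token;
# tr removes line wraps that some base64 implementations insert.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Hypothetical placeholder; substitute the token issued for your SCIM endpoint.
SCIM_TOKEN='example-scim-token'
ENCODED=$(encode_secret_value "$SCIM_TOKEN")
echo "nsscache-scim-auth-token: $ENCODED"

# Round-trip check: decoding should print the original token.
printf '%s' "$ENCODED" | base64 -d
echo
```

As an alternative, kubectl create secret generic <secret-name> --from-literal=nsscache-scim-auth-token=<token> lets kubectl perform the encoding for you.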

Naming conventions for LDAP Secrets

For an LDAP Secret, the suggested Secret name is <release-name>-nsscache-ldap-secret.

Inside the Secret, the key under data must be named nsscache-ldap-password. This contains the base64-encoded password for the LDAP service.
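An LDAP Secret can follow the same shape as the SCIM example. The sketch below renders such a manifest from a Secret name and a plaintext password; the function name render_ldap_secret, the release name my-release, and the password are hypothetical placeholders.

```shell
#!/bin/sh
# Print a Secret manifest for the LDAP service account password.
#   $1: Secret name
#   $2: plaintext LDAP service account password
render_ldap_secret() {
  encoded=$(printf '%s' "$2" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $1
data:
  nsscache-ldap-password: $encoded
EOF
}

# "my-release" and the password are placeholders for illustration only.
render_ldap_secret "my-release-nsscache-ldap-secret" "example-password"
```

With cluster access, the output could be piped to kubectl apply -f - rather than saved to a file, which keeps the plaintext password out of your working directory.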

Update the slurm chart with your Kubernetes Secret

After creating the Secret, you will need to update the slurm chart's values.yaml file.

Edit the nsscache.existingSecret parameter with the name of your Secret, as shown in the examples below.

Provision cluster access to groups of users

SUP provisions cluster access to groups of users, rather than individual users. You must create the groups in CoreWeave IAM or your upstream IdP before configuring nsscache. The names of your created user groups must exactly match the group names specified in the nsscache configuration.

Example SCIM values
sssdContainer:
  enabled: false
# You may remove the `directoryService` key when using nsscache
directoryService:
nsscache:
  enabled: true
  existingSecret: scim-auth-token
  sudoGroups:
    - slurm-admins
  nsscacheConfig:
    default:
      source: scim
      scim_base_url: https://api.coreweave.com/scim/<org>
      scim_users_parameters: filter=active eq "true"&groups=slurm-users,slurm-admins
      scim_groups_parameters: excludeInactiveUsers=true&includeVirtualUserGroups=slurm-users,slurm-admins

The suggested nsscache configuration for SCIM is set by default in the values.yaml file. For a full list of configuration options, see the SCIM parameter reference.

The following parameters are used to configure the groups that will be provisioned:

  • sudoGroups specifies the user groups whose members can run sudo commands on the nodes.
  • scim_users_endpoint specifies the SCIM endpoint path for retrieving user data. The default value is Users.
  • scim_groups_endpoint specifies the SCIM endpoint path for retrieving group data. The default value is Groups.
  • scim_users_parameters specifies query parameters for the users endpoint. Users in the listed groups are provisioned, meaning their user IDs are present in the cluster. By default, this filters out inactive users with filter=active eq "true".
  • scim_groups_parameters specifies the groups to be provisioned. By default, this will filter out inactive users with excludeInactiveUsers=true. If no user groups are specified, all groups will be provisioned.

Filter specific user groups

scim_groups_parameters and scim_users_parameters allow optional parameters to be added to the groups and users endpoints, respectively. Special characters (spaces, quotes, etc.) will be automatically URL encoded.

To provision access to specific user groups, list the group names in both the scim_users_parameters and scim_groups_parameters parameters, as follows:

Example
scim_users_parameters: filter=active eq "true"&groups=slurm-users,slurm-admins
scim_groups_parameters: excludeInactiveUsers=true&includeVirtualUserGroups=slurm-users,slurm-admins

The above example will do the following:

  • Filter out inactive users.
  • Provision access to the users in the slurm-users and slurm-admins groups.
  • Provision the virtual user groups slurm-users and slurm-admins.

You must specify the group names exactly as they appear in your IdP, and list them in both the scim_users_parameters and scim_groups_parameters parameters to ensure that the users and groups are provisioned correctly.
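Because the same group list has to appear in both parameters, one way to avoid drift is to generate both lines from a single list. This is only a sketch; render_scim_parameters is a hypothetical helper, and the output still needs to be placed under nsscacheConfig.default in your values.yaml.

```shell
#!/bin/sh
# Print both SCIM parameter lines from one comma-separated group list.
#   $1: group names, exactly as they appear in the IdP, separated by commas
render_scim_parameters() {
  printf 'scim_users_parameters: filter=active eq "true"&groups=%s\n' "$1"
  printf 'scim_groups_parameters: excludeInactiveUsers=true&includeVirtualUserGroups=%s\n' "$1"
}

render_scim_parameters "slurm-users,slurm-admins"
```

The default filters for inactive users are kept in both lines, so only the group list varies.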

Example: Authentik LDAP values

Example Authentik LDAP values
sssdContainer:
  enabled: false
# Remove the `directoryService` key
directoryService:
nsscache:
  enabled: true
  existingSecret: nsscache-ldap-secret
  nsscacheConfig:
    default:
      source: ldap
      ldap_uri: ldap://authentik-outpost-ldap-outpost
      ldap_base: dc=coreweave,dc=cloud
      ldap_bind_dn: cn=ldapsvc,dc=coreweave,dc=cloud
      ldap_bind_password:
      ldap_rfc2307bis: 1
      ldap_default_shell: /bin/bash
    passwd:
      ldap_filter: (objectClass=user)
      ldap_override_home_dir: /mnt/home/%%u
    group:
      ldap_filter: (objectClass=group)
    shadow:
      ldap_filter: (objectClass=user)
    sshkey:
      ldap_filter: (objectClass=user)
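In both the SCIM and LDAP examples, nsscache.existingSecret must name a Secret that actually exists. The sketch below reads the name back from a values file so it can be compared against the cluster. It assumes block-style YAML and that existingSecret appears only once in the file (other chart sections may reuse the key, in which case only the first match is printed). The helper name existing_secret_name is hypothetical.

```shell
#!/bin/sh
# Print the first existingSecret value found in a values file.
#   $1: path to values.yaml
existing_secret_name() {
  awk '$1 == "existingSecret:" { print $2; exit }' "$1"
}

VALUES_FILE="${VALUES_FILE:-values.yaml}"
if [ -f "$VALUES_FILE" ]; then
  name=$(existing_secret_name "$VALUES_FILE")
  echo "values file references Secret: $name"
  # With cluster access, confirm that the Secret exists:
  #   kubectl get secret "$name" --namespace <namespace>
fi
```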

Verify and troubleshoot nsscache

You do not need to manually sync data with nsscache. It takes about two minutes for data to sync from an identity provider to SUNK through nsscache. If you encounter an issue with user data not being available, wait a few minutes and check again.
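If you would rather script the wait than re-check by hand, a polling loop along these lines could be run from inside the login pod. It is a sketch only: wait_for_user is a hypothetical helper, and the defaults (180 seconds total, 10 seconds between checks) are assumptions chosen to sit slightly above the two-minute sync time mentioned above.

```shell
#!/bin/sh
# Poll until a user resolves through NSS, or give up after a timeout.
#   $1: username
#   $2: timeout in seconds (default 180)
#   $3: seconds between checks (default 10)
wait_for_user() {
  user=$1
  timeout=${2:-180}
  interval=${3:-10}
  waited=0
  while ! getent passwd "$user" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $user" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$user is resolvable"
}

# Example call ("alice" is a placeholder username):
#   wait_for_user alice 180 10
```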

To validate that nsscache is working:

  1. Log in to the Login pod for your cluster.

    Example
    $ kubectl exec -it <LOGIN-POD> -c sshd -- /bin/bash
  2. Access the /etc/nsscache directory and list the files within:

    Example
    $ cd /etc/nsscache && ls

    This should return the following files:

    Example
    group.cache passwd.cache shadow.cache sshkey.cache
  3. Check the contents of the group.cache, passwd.cache, shadow.cache, and sshkey.cache files directly for information about the users and groups in your directory, as shown with the cat command below:

    Example
    $ cat sshkey.cache

    Alternatively, use the getent command with the group, passwd, or shadow database name to list the system users and groups in addition to the data pulled by nsscache, as shown below:

    Example
    $ getent passwd

    Note that the getent command does not directly retrieve SSH keys. To view the contents of the sshkey.cache file, you will need to use the cat command, as demonstrated above.
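To check for one specific user or group rather than reading a whole cache file, a small lookup can help. The sketch below assumes the cache files use the usual colon-delimited layout with the name in the first field, as in passwd(5) and group(5). The helper name cache_has_entry and the username alice are hypothetical.

```shell
#!/bin/sh
# Succeed if the named entry is the first field of any line in a cache file.
#   $1: cache file path
#   $2: user or group name
cache_has_entry() {
  awk -F: -v name="$2" '$1 == name { found = 1 } END { exit !found }' "$1"
}

# Runs only where the cache file is present, for example inside the login pod.
if [ -r /etc/nsscache/passwd.cache ]; then
  if cache_has_entry /etc/nsscache/passwd.cache "alice"; then
    echo "alice is present in passwd.cache"
  else
    echo "alice is not present in passwd.cache yet"
  fi
fi
```

Matching on the whole first field avoids false positives that a plain substring search would produce, such as a user named al matching alice.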

Migrate to nsscache from SSSD

Note

As of SUNK v7.0.0, nsscache is the default directory service. The following steps apply only if you are running an earlier version of SUNK and are migrating to nsscache for your directory service.

Disable SSSD in the slurm and slurm-login charts

To enable nsscache, you must edit the slurm Helm chart. If using individual login pods, you must also edit the slurm-login Helm chart.

Update the slurm chart

In the slurm chart's values.yaml file, update the sssdContainer.enabled parameter to false and remove the directoryService section, as shown below:

Example
sssdContainer:
  enabled: false
# Remove the `directoryService` key
directoryService:

This disables SSSD, which is incompatible with nsscache.
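Before rolling out the chart, it may be worth confirming that the values file really sets sssdContainer.enabled to false. The following sketch does a simple text scan rather than a full YAML parse, so it assumes sssdContainer is a top-level key written in block style. The helper name sssd_disabled is hypothetical. A file that omits the key is reported as not disabled, since the chart default then applies.

```shell
#!/bin/sh
# Succeed only if the top-level sssdContainer block sets `enabled: false`.
#   $1: path to values.yaml
sssd_disabled() {
  awk '
    /^sssdContainer:/ { in_block = 1; next }      # enter the block
    in_block && /^[^ \t#]/ { in_block = 0 }       # next top-level key ends it
    in_block && $1 == "enabled:" { ok = ($2 == "false"); exit }
    END { exit !ok }
  ' "$1"
}

VALUES_FILE="${VALUES_FILE:-values.yaml}"
if [ -f "$VALUES_FILE" ]; then
  if sssd_disabled "$VALUES_FILE"; then
    echo "SSSD is disabled in $VALUES_FILE"
  else
    echo "SSSD is not explicitly disabled in $VALUES_FILE" >&2
  fi
fi
```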

You may remove the directoryCache.directoryService section in the values.yaml of the slurm chart, as nsscache does not use this configuration.

Disable SSSD

You must disable SSSD to use nsscache. For SUNK versions v6.x and below, SSSD is enabled by default.

Update the slurm-login chart

If using individual login pods, you will also need to edit the slurm-login chart's values.yaml file.

Set the directoryCache.source parameter to nsscache, as shown below:

Example
directoryCache:
  source: nsscache
  # Remove the `directoryService` key
  directoryService:

You may remove the directoryCache.directoryService section in the values.yaml of the slurm-login chart, as nsscache does not use this configuration.