Slurm parameter reference - CoreWeave Docs

Requirements

Repository	Name	Version
file://../library	library	0.1.0
file://../slurm-login	slurm-login	0.1.0
oci://registry-1.docker.io/bitnamicharts	mysql	9.19.1

Parameters

Key & Description	Type	Default
accounting.annotations Additional annotations for accounting resources.	object	`{}`
accounting.config.ArchiveEvents	string	`"yes"`
accounting.config.ArchiveJobs	string	`"yes"`
accounting.config.ArchiveResvs	string	`"yes"`
accounting.config.ArchiveSteps	string	`"no"`
accounting.config.ArchiveSuspend	string	`"no"`
accounting.config.ArchiveTXN	string	`"no"`
accounting.config.ArchiveUsage	string	`"no"`
accounting.config.AuthAltParameters[0]	string	`"jwt_key=/etc/jwt/jwt.key"`
accounting.config.AuthAltTypes	string	`"auth/jwt"`
accounting.config.AuthType	string	`"auth/munge"`
accounting.config.DbdPort	int	`6819`
accounting.config.DebugLevel	string	`"verbose"`
accounting.config.LogFile	string	`"/dev/null"`
accounting.config.PidFile	string	`"/var/run/slurmdbd.pid"`
accounting.config.PurgeEventAfter	string	`"1month"`
accounting.config.PurgeJobAfter	string	`"12month"`
accounting.config.PurgeResvAfter	string	`"1month"`
accounting.config.PurgeStepAfter	string	`"1month"`
accounting.config.PurgeSuspendAfter	string	`"1month"`
accounting.config.PurgeTXNAfter	string	`"12month"`
accounting.config.PurgeUsageAfter	string	`"24month"`
accounting.config.SlurmUser	string	`"slurm"`
accounting.config.StoragePort	int	`3306`
accounting.config.StorageType	string	`"accounting_storage/mysql"`
accounting.enabled Enable the accounting.	bool	`true`
accounting.external.enabled Enable the external accounting, instead of deploying an internal accounting instance. This configuration also requires the underlying database for `slurmdbd` to be managed externally.	bool	`false`
accounting.external.host The host of the external accounting instance: IP or hostname.	string	`null`
accounting.external.port The port of the external accounting instance.	string	`null`
accounting.external.user The user to use to authenticate to the external accounting instance.	string	`null`
accounting.externalDB.enabled Configure Slurm Accounting (`slurmdbd`) with an external database.	bool	`false`
accounting.externalDB.existingSecret Specify the name of the Kubernetes Secret that contains the password used by `slurmdbd` to access the Slurm accounting database. This Secret must contain a data key named `db-password` whose value is the actual database password. Important: The password value stored in the Secret cannot contain the hash (`#`) character.	string	`null`
accounting.externalDB.storageHost The hostname of the server where the database resides.	string	`null`
accounting.externalDB.storageLoc The name of the database used to store Slurm accounting records. Defaults to “slurm_acct_db”.	string	`"slurm_acct_db"`
accounting.externalDB.storageUser The username `slurmdbd` uses for authentication and storing job accounting data.	string	`null`
accounting.image The image to use for slurmdbd deployment.	object	`name: controller repository: tag:`
accounting.labels Additional labels for accounting resources.	object	`{}`
accounting.livenessProbe The liveness probe for the slurmdbd container.	object	`exec: command: - sacctmgr - ping initialDelaySeconds: 15 periodSeconds: 10 failureThreshold: 5 successThreshold: 1 timeoutSeconds: 60`
accounting.priorityClassName The priority class name for the accounting pod.	string	`"sunk-control-plane"`
accounting.readinessProbe The readiness probe for the slurmdbd container.	object	`exec: command: - sacctmgr - ping initialDelaySeconds: 15 periodSeconds: 10 failureThreshold: 5 successThreshold: 1 timeoutSeconds: 60`
accounting.replicas The number of replicas of the accounting instance to run.	int	`1`
accounting.resources Resources for the accounting container.	object	`limits: memory: 64Gi requests: cpu: 16 memory: 64Gi`
accounting.securityContext.runAsGroup The group to run as, must match the slurm GID from the container image.	int	`401`
accounting.securityContext.runAsUser The user to run as, must match the slurm UID from the container image.	int	`401`
accounting.startupProbe The startup probe for the slurmdbd container.	object	`null`
accounting.terminationGracePeriodSeconds The termination grace period for the accounting pod.	int	`30`
accounting.useExistingSecret Use an existing secret for the accounting instance instead of creating one. The secret name is the same as `mysql.auth.existingSecret`.	bool	`false`
accounting.volumeMounts Additional volume mounts to apply to the accounting pod.	list	`[]`
accounting.volumes Additional volumes to mount to the accounting pod.	list	`[]`
cleanupCompleting.annotations Additional annotations for cleanup-completing Job resources.	object	`{}`
cleanupCompleting.cronJobSchedule The schedule for the cleanup-completing CronJob, formatted according to the cron convention. The default runs every minute.	string	`"* * * * *"`
cleanupCompleting.deleteInvalidNodes Enable deletion of nodes that are in INVALID_REG state after downing them. This allows nodes to cleanly re-register with Slurm.	bool	`true`
cleanupCompleting.dryRun Enable dry run mode - shows what would be done without actually downing nodes.	bool	`false`
cleanupCompleting.enabled Enable cleanup of nodes with jobs stuck in COMPLETING state.	bool	`true`
cleanupCompleting.labels Additional labels for cleanup-completing Job resources.	object	`{}`
cleanupCompleting.nodeSelector.affinity The affinity for the cleanup-completing Job. This overrides the value of `global.nodeSelector.affinity`.	object	`null`
cleanupCompleting.priorityClassName The priority class name for the cleanup-completing Job pod.	string	`"sunk-control-plane"`
cleanupCompleting.resources Resources for the cleanup-completing Job container.	object	`limits: memory: 256Mi requests: cpu: 100m memory: 64Mi`
cleanupCompleting.timeoutSeconds Timeout in seconds for jobs in COMPLETING state before downing nodes. Jobs that have been completing longer than this threshold will trigger node downing (if no other jobs are present on the node). If not specified, defaults to 2x KillWait from slurm.conf.	int	`null`
cleanupCompleting.tolerations The tolerations for the cleanup-completing Job.	list	`null`
cleanupCompleting.verbose Enable verbose logging for debugging.	bool	`true`
compute.annotations Additional annotations for compute services only. Use `compute.nodes.custom-definition.annotations` to add annotations to specific node definitions instead.	object	`{}`
compute.autoPartition.config Customer-editable values that are applied to each auto-generated partition. The partition name is the same as the node definition name. Example: `config: OverSubscribe: "YES" MaxTime: "12:00:00" QoS: "NORMAL"`	object	`null`
compute.autoPartition.enabled Enable the auto partition.	bool	`true`
compute.cacheDropper.enabled An option to enable or disable the cache-dropper sidecar container across all slurmd pods.	bool	`true`
compute.cacheDropper.resources Resources for the cache-dropper sidecar container.	object	`limits: memory: 32Mi requests: cpu: 500m memory: 32Mi`
compute.epilogConfigMap The name, or list of names, of the ConfigMap(s) containing epilog scripts.	string | list	`[]`
compute.externalClusterName The name of an external cluster to join. This is used when control plane is deployed separately.	string	`null`
compute.generateTopology Enable topology generation for the compute nodes in the cluster.	bool	`true`
compute.gpusd Configuration for GPUSD (GPU Straggler Detection) metrics collection.	object	See individual settings below.
compute.gpusd.enabled Enable GPUSD package installation, metrics collection, and VMPodScrape resource.	bool	`false`
compute.gpusd.version GPUSD version to install.	string	`"1.0.0"`
compute.initialState The initial state for the nodes when they join the slurm cluster. This is generally `drain` or `idle`. May also be set per node definition.	string	`"idle"`
compute.initialStateReason The reason for setting the initial state of the nodes to down, drained, or fail. May also be set per node definition.	string	`"Node added to the cluster for the first time"`
compute.labels Additional labels for compute services only. Use `compute.nodes.custom-definition.labels` to add labels to specific node definitions instead.	object	`{}`
compute.livenessProbe The liveness probe for the compute slurmd container.	object	`map[]`
compute.maxUnavailable The maximum number of compute nodes that can be unavailable during a rolling update. Can be a percentage or a number.	string	`"10%"`
compute.nodes Multiple node definitions can be declared, but only one may be `enabled: true`. Node definitions can reference other definitions to include or overlay values. See the example below or the Compute Node Definitions documentation for more details. Example: compute: nodes: # A custom definition to be referenced by other nodes custom-dns: dnsPolicy: "None" dnsConfig: nameservers: - 127.0.0.1 # A simple CPU-only Node that uses the custom-dns definition above simple-cpu: enabled: true # Partition creation for the nodeset. The nodeset will still be assigned to the `all` # partition if creation is disabled createPartition: true # The following Partition configurations are for the customer to update. # These values will be applied to the autogenerated partition and will override the defaults. partitionConfig: OverSubscribe: "YES" MaxTime: "12:00:00" QoS: "NORMAL" replicas: 1 definitions: # Use the custom-dns definition - custom-dns staticFeatures: - cpu dynamicFeatures: node.coreweave.cloud/class: {} image: name: controller-extras gresGpu: null config: weight: 1 # Create a small node with 1cpu and 1g memory resources: limits: memory: 1Gi cpu: 1 requests: memory: 1Gi cpu: 1 tolerations: - key: is_cpu_compute operator: Exists volumeMounts: - name: ramtmp mountPath: /tmp volumes: - name: ramtmp emptyDir: medium: Memory affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/os operator: In values: - linux	object	See Compute Node Definitions.
compute.partitionBaseConfig Default configuration for partitions in the cluster. These values can be overridden per partition in the autoPartition section. Example: `partitionBaseConfig: MaxTime: "INFINITE" State: "UP"`	object	`{ "MaxTime": "INFINITE", "State": "UP" }`
compute.partitions Partitions to add to the cluster. The key is the partition name and the value is the partition configuration.	object	`all: Nodes=ALL Default=YES`
compute.plugstackConfig Additional plug-in stack configuration items for `plugstack.conf` file config. Config Options: https://slurm.schedmd.com/spank.html#SECTION_CONFIGURATION	list	`[]`
compute.ports Additional ports to expose on the compute nodes. Example: NCCL Plugin ports `ports: - containerPort: 10400 protocol: TCP - containerPort: 10401 protocol: TCP - containerPort: 10402 protocol: TCP - containerPort: 10403 protocol: TCP - containerPort: 10404 protocol: TCP - containerPort: 10405 protocol: TCP - containerPort: 10406 protocol: TCP - containerPort: 10407 protocol: TCP`	list	`[]`
compute.prologConfigMap The name, or list of names, of the ConfigMap(s) containing prolog scripts.	string | list	`[]`
compute.pyxis.appArmorProfile The AppArmor profile to use for the pyxis container.	string	`"localhost/enroot"`
compute.pyxis.enabled Enable the pyxis container.	bool	`true`
compute.pyxis.enrootConfig Configuration options for enroot.	object	`ENROOT_RUNTIME_PATH: /run/enroot/user-$(id -u) ENROOT_CACHE_PATH: /opt/sunk/tmp/enroot-cache/user-$(id -u) ENROOT_DATA_PATH: /opt/sunk/tmp/enroot-data/user-$(id -u) # Enables ENROOT_MOUNT_HOME for the pyxis container to mount the home directory. ENROOT_MOUNT_HOME: y # Enables ENROOT_REMAP_ROOT for the pyxis container to remap the root user. ENROOT_REMAP_ROOT: y ENROOT_RESTRICT_DEV: n ENROOT_ROOTFS_WRITABLE: y`
compute.pyxis.plugstackOptions Additional arguments for the pyxis plugin in `plugstack.conf` file config. Config Options: https://github.com/NVIDIA/pyxis/wiki/Setup#slurm-plugstack-configuration	list	`[ "container_scope=global" ]`
compute.pyxis.podSecurityContext Security context for the pyxis container.	object	`{ "seccompProfile": { "localhostProfile": "profiles/enroot", "type": "Localhost" } }`
compute.pyxis.podSecurityContext.seccompProfile The seccomp profile to use for the pyxis container.	object	`{ "localhostProfile": "profiles/enroot", "type": "Localhost" }`
compute.readinessProbe The readiness probe for the compute slurmd container.	object	`exec: command: - scontrol - show - slurmd failureThreshold: 3 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5`
compute.reservedMemory Reserved memory when calculating DefMemPerCPU config for slurm.conf	string	`"4Gi"`
compute.s6 Both `oneshot` and `longrun` jobs are supported. See Running Scripts with S6 for more information. Example: `s6: packages: type: oneshot timeoutUp: 0 timeoutDown: 0 script: | #!/usr/bin/env bash apt -y update apt -y install nginx nginx: type: longrun timeoutUp: 0 timeoutDown: 0 script: | #!/usr/bin/env bash nginx -g "daemon off;"`	object	`{}`
compute.securityContext.capabilities.add Capabilities to add to the slurmd container. `SYS_ADMIN` is required if using pyxis. Example: `compute: securityContext: capabilities: add: ["SYS_ADMIN"]`	list	`[ "SYS_NICE", "SYS_ADMIN", "SYS_PTRACE", "SYSLOG" ]`
compute.ssh.enabled Enable ssh to the compute nodes.	bool	`true`
compute.startupProbe The startup probe for the compute slurmd container.	object	`map[]`
compute.volumeMounts Additional volume mounts to add to all the compute pods, also added to login pods.	list	`[]`
compute.volumes Additional volumes to mount to all the compute pods, also added to login pods.	list	`[]`
controlPlane.enabled Enable the Slurm control plane. This should be enabled unless the deployment is being split.	bool	`true`
controller.annotations Additional annotations for controller resources.	object	`{}`
controller.enabled Enable the controller deployment. This should be enabled unless a more complex deployment is required (for example, splitting the deployment).	bool	`true`
controller.etcConfigMap The ConfigMap(s) with keys mapping to files in `/etc/slurm` on the controller only. This ConfigMap must not contain: `slurm.conf`, `plugstack.conf`, `gres.conf`, `cgroup.conf`, or `topology.conf`.	string | list	`null`
controller.image The image to use for the controller.	object	`name: controller repository: tag:`
controller.labels Additional labels for controller resources.	object	`{}`
controller.livenessProbe The liveness probe for the controller.	object	`exec: command: - scontrol - ping failureThreshold: 6 initialDelaySeconds: 15 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 60`
controller.priorityClassName The priority class name for the controller.	string	`"sunk-control-plane"`
controller.readinessProbe The readiness probe for the controller.	object	`map[]`
controller.replicas The number of replicas of the controller to run. This should currently be left at `1`.	int	`1`
controller.resources Resources for the controller container.	object	`limits: memory: 64Gi requests: cpu: 16 memory: 64Gi`
controller.securityContext.runAsGroup The group to run as, must match the slurm GID from the container image.	int	`401`
controller.securityContext.runAsUser The user to run as, must match the slurm UID from the container image.	int	`401`
controller.startupProbe The startup probe for the controller.	object	`map[]`
controller.stateVolume.size The size of the persistent volume claim.	string	`"32Gi"`
controller.stateVolume.storageClassName The storage class name to use for the volume.	string	`"shared-vast"`
controller.terminationGracePeriodSeconds The termination grace period for the controller.	int	`30`
controller.volumeMounts Additional volume mounts to apply to the controller pod.	list	`[]`
controller.volumes Additional volumes to mount to the controller pod.	list	`[]`
controller.watch.enabled Enable watching the Slurm configuration and triggering a reconfigure when there are changes.	bool	`true`
controller.watch.interval The interval in seconds to check for changes in the Slurm configuration.	int	`60`
controller.watch.livenessProbe The liveness probe for the watch container.	object	`null`
controller.watch.readinessProbe The readiness probe for the watch container.	object	`null`
controller.watch.startupProbe The startup probe for the watch container.	object	`null`
directoryService.debugLevel A bit mask of what SSSD debug levels to enable.	int	`0x01F0`
directoryService.directories The directory services to configure. Examples: Google Secure LDAP: `directories: - name: google-example.com enabled: true ldapUri: ldaps://ldap.google.com:636 user: canary: user@google-example.com defaultShell: "/bin/bash" fallbackHomeDir: "/home/%u" overrideHomeDir: /mnt/nvme/home/%u ldapsCert: google-ldaps-cert schema: rfc2307bis` CoreWeave LDAP: `directories: - name: coreweave.cloud enabled: true ldapUri: ldap://openldap user: bindDn: cn=admin,dc=coreweave,dc=cloud searchBase: dc=coreweave,dc=cloud existingSecret: bind-user-sssd-config canary: admin defaultShell: "/bin/bash" fallbackHomeDir: "/home/%u" schema: rfc2307` Authentik: `directories: - name: coreweave.cloud enabled: true ldapUri: ldap://authentik-outpost-ldap-outpost user: bindDn: cn=ldapsvc,dc=coreweave,dc=cloud searchBase: dc=coreweave,dc=cloud existingSecret: bind-user-sssd-config canary: ldapsvc startTLS: true userObjectClass: user groupObjectClass: group userNameAttr: cn groupNameAttr: cn schema: rfc2307bis` Active Directory: `directories: - name: contoso.com enabled: true ldapUri: ldap://domaincontroller.tenant-my-tenant.coreweave.cloud user: bindDn: CN=binduser,CN=Users,DC=contoso,DC=com searchBase: DC=contoso,DC=com existingSecret: bind-user-sssd-config canary: binduser defaultShell: "/bin/bash" fallbackHomeDir: "/home/%u" schema: AD`	list
directoryService.directories[0].additionalConfig Multi-line string of additional arbitrary SSSD configuration for this domain. Example: `additionalConfig: | ldap_foo = bar`	string	`null`
directoryService.directories[0].defaultShell The default user shell.	string	`"/bin/bash"`
directoryService.directories[0].enabled Enable the directory service.	bool	`false`
directoryService.directories[0].fallbackHomeDir The fallback user home directory.	string	`"/home/%u"`
directoryService.directories[0].ignoreGroupMembers This overrides the SSSD configuration option of the same name. If set to `true`, SSSD retrieves information only about the group objects themselves and not their members, which provides a significant performance boost. If omitted, defaults to `true`.	bool	`null`
directoryService.directories[0].ldapUri The LDAP URI to use for the directory service. Example: `ldap://YOUR_LDAP_IP` For Google Secure LDAP, use: `ldaps://ldap.google.com:636`	string	`null`
directoryService.directories[0].ldapsCert Name of an existing TLS secret holding the certificate for LDAPS. Example of creating one: `kubectl create secret tls google-ldaps-cert \ --cert=Google_2025_08_24_55726.crt \ --key=Google_2025_08_24_55726.key`	string	`null`
directoryService.directories[0].name Name of the directory service. The primary domain should always be named: `default`	string	`"default"`
directoryService.directories[0].overrideGidAttr Override the default schema LDAP attribute that corresponds to the user’s primary group id. Example: `posixGid`	string	`null`
directoryService.directories[0].overrideHomeDir Override the homeDirectory attribute from LDAP with a provided path. Example: `/mnt/nvme/home/%u`	string	`null`
directoryService.directories[0].overrideUidAttr Override the default schema LDAP attribute that corresponds to the user’s id. Example: `posixUid`	string	`null`
directoryService.directories[0].overrideUserNameAttr Override the default schema LDAP attribute that corresponds to the user’s login name. Example: `employeeNumber`	string	`null`
directoryService.directories[0].schema The desired LDAP schema for the directory service. Valid values: `AD`, `POSIX`, `rfc2307bis`. Note: For Google Secure LDAP, use `rfc2307bis`.	string	`"AD"`
directoryService.directories[0].user.bindDn The LDAP bind DN to use for the directory service. Where bindDn is not required (e.g. Google Secure LDAP), only supply `user.canary`. Example: `cn=Admin,ou=Users,ou=CORP,dc=corp,dc=example,dc=com`	string	`null`
directoryService.directories[0].user.canary The username to lookup to confirm LDAP is working.	string	`null`
directoryService.directories[0].user.existingSecret Name of an existing secret containing an SSSD configuration snippet with the `ldap_default_authtok` set for this domain.	string	`null`
directoryService.directories[0].user.existingSecretFileName The name of the file in the existing secret that contains the ldap passwords.	string	`"ldap-password.conf"`
directoryService.directories[0].user.groupSearchBase The LDAP group search base to use for the directory service. Example: `ou=groups,dc=example,dc=com`	string	`null`
directoryService.directories[0].user.password The password to use for the directory service lookups.	string	`null`
directoryService.directories[0].user.searchBase The LDAP search base to use for the directory service. Example: `dc=corp,dc=example,dc=com`	string	`null`
directoryService.negativeCacheTimeout Negative caching value (in seconds). Determines how long an invalid entry will be cached before asking LDAP again. This improves the directory listing time when a primary gid cannot be found.	string	`"600"`
directoryService.sudoGroups List of Unix groups from all directories with sudo privileges. Group names are fully qualified for additional directories, but not for the default directory (for example, `group1` instead of `group1@domain.com`).	list	`[]`
directoryService.watchInterval The interval in seconds to check for changes in sssd configuration.	int	`60`
global.annotations Additional annotations to apply to all resources.	object	`{}`
global.cks Enable CoreWeave Kubernetes Services (CKS) integration.	bool	`true`
global.dnsConfig.additionalSearches A list of namespaces to add to the list of DNS searches. These additional searches extend hostname lookup in the control-plane, compute, and login pods. Default DNS searches: `name-compute.namespace.svc.cluster.local`, `slurm_cluster_name-controller.namespace.svc.cluster.local`.	list	`[]`
global.imagePullPolicy The image pull policy for all containers.	string	`"IfNotPresent"`
global.labels Additional labels to apply to all resources.	object	`{}`
global.nodeSelector.affinity The affinity for the Slurm control-plane components.	object	`nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node.coreweave.cloud/class operator: In values: - cpu`
global.volumeMounts The list of volume mounts to apply to all compute, controller, accounting, and login pods.	list	`[]`
global.volumes The list of volumes to mount to all compute, controller, accounting, and login pods.	list	`[]`
imagePullSecrets The list of secrets used to access images in a private registry.	list	`[]`
jwt.existingSecret The name of an existing secret containing the JWT private key, otherwise the chart will generate one.	string	`null`
login.annotations Additional annotations.	object	`{}`
login.automountServiceAccountToken Automatically mount the service account token into the login pod.	bool	`false`
login.containers Additional sidecar containers to add to the login pod.	list	`[]`
login.enabled Enable the login nodes.	bool	`true`
login.env Additional environment variables to pass to the sshd container.	list	`[]`
login.hostAliases Provides Pod-level override of hostname resolution when DNS and other options are not applicable in login pods. See Adding entries to Pod /etc/hosts with HostAliases for more information.	list	`[]`
login.image The image to use for the login node.	object	`name: controller-extras repository: tag:`
login.individualResources Resources for the slurm-login pod sshd container.	object	`limits: memory: 2Gi requests: cpu: 500m memory: 300Mi`
login.labels Additional labels.	object	`null`
login.nodeSelector.affinity The affinity for the login nodes. This overrides the value of `global.nodeSelector.affinity`.	object	`null`
login.priorityClassName The priority class name for the login pod.	string	`null`
login.replicas The number of replicas of the login node. When running more than one, a pod-specific service is created for each one in addition to the main service.	int	`1`
login.resources.limits.memory	string	`"8Gi"`
login.resources.requests.cpu	int	`4`
login.resources.requests.memory	string	`"8Gi"`
login.s6 Both `oneshot` and `longrun` jobs are supported. See Running Scripts with S6 for more information. Example: `s6: packages: type: oneshot timeoutUp: 0 timeoutDown: 0 script: | #!/usr/bin/env bash apt -y update apt -y install nginx nginx: type: longrun timeoutUp: 0 timeoutDown: 0 script: | #!/usr/bin/env bash nginx -g "daemon off;"`	object	`{}`
login.service.additionalPorts Additional port definitions to expose. Example: `additionalPorts: - name: eternal-shell port: 2022 targetPort: 20222 # optional protocol: TCP # optional`	list	`[]`
login.service.enabled Enable the creation of service(s) for login pods.	bool	`true`
login.service.externalTrafficPolicy The external traffic policy.	string	`"Local"`
login.service.loadBalancerClass The load balancer class to use for the login services.	string	`null`
login.service.metadata.0.annotations Additional annotations to apply to the first login service (0).	object	`{}`
login.service.metadata.0.labels Additional labels to apply to the first login service (0).	object	`{}`
login.service.metadata.common.annotations Additional annotations to apply to the common login service.	object	`{}`
login.service.metadata.common.labels Additional labels to apply to the common login service.	object	`{}`
login.service.metadata.global.annotations Additional annotations to apply to all login services.	object	`null`
login.service.metadata.global.labels Additional labels to apply to all login services.	object	`{}`
login.service.type The type of service to create. This defaults to `LoadBalancer` for cloud deployments. For development and test systems without an external load balancer to handle the service routing, such as when deploying on kind (Kubernetes IN Docker), this may be set to `ClusterIP`.	string	`"LoadBalancer"`
login.serviceAccountName The service account name to use for the login pod.	string	`"default"`
login.sshKeyVolume.accessModes The access mode for the storage. If scaling login beyond 1 replica, this must be `ReadWriteMany`. In a development setting with a volume provider that doesn’t support `ReadWriteMany`, such as kind (Kubernetes IN Docker), this may be set to `ReadWriteOnce`.	string	`[ "ReadWriteMany" ]`
login.sshKeyVolume.enabled Enable the ssh key volume, to allow keys to be mounted and persisted in the login pod. If this is disabled the host keys for the login pod will be regenerated on each container restart.	bool	`true`
login.sshKeyVolume.size The size of the persistent volume claim.	string	`"1Gi"`
login.sshKeyVolume.storageClassName The storage class name to use for the volume.	string	`"shared-vast"`
login.sshdConfig Additional sshd configuration to add to the login pod. Example: `sshdConfig: PasswordAuthentication: "no"`	string	`null`
login.sshdLivenessProbe.config The liveness probe for the login sshd container.	object	`failureThreshold: 10 initialDelaySeconds: 10 periodSeconds: 5 tcpSocket: port: 22`
login.sshdLivenessProbe.enabled Whether the liveness probe for the login sshd container is enabled.	bool	`false`
login.sshdReadinessProbe.config The readiness probe for the login sshd container.	object	`map[]`
login.sshdReadinessProbe.enabled Whether the readiness probe for the login sshd container is enabled.	bool	`false`
login.sshdStartupProbe.config The startup probe for the login sshd container.	object	`map[]`
login.sshdStartupProbe.enabled Whether the startup probe for the login sshd container is enabled.	bool	`false`
login.terminationGracePeriodSeconds The termination grace period for the login pod.	int	`30`
login.updateStrategy The update strategy for the login node. The default type is `RollingUpdate`.	object	`{}`
login.volumeMounts Additional volume mounts to apply to the login pod.	list	`[]`
login.volumes Additional volumes to add to the login pod. Example: `volumes: - name: cache-vol emptyDir: medium: Memory`	list	`[]`
moco Options for MOCO MySQL used for Slurm job accounting.	object	See individual settings below.
moco.enabled Enable moco.	bool	`false`
moco.migration.enabled When set to `true`, a Kubernetes Job is created to migrate the Slurm accounting database to MOCO MySQL. The Job runs once and then completes. Any existing Slurm accounting database in the Bitnami MySQL database is migrated to the MOCO MySQL database. Set this to `true` only for the initial migration, then set it back to `false` to return the cluster to normal operation. The Slurm cluster is not functional during this automated migration.	bool	`false`
moco.mysqlCluster.affinity Optional pod affinity configuration	object	`map[]`
moco.mysqlCluster.auth.existingSecret Optional. If not set, the randomly generated MOCO `WRITABLE_PASSWORD` is used. Specify the name of the Kubernetes Secret that contains the password used by `slurmdbd` to access the MOCO MySQL database. This Secret must contain a data key named `WRITABLE_PASSWORD` whose value is the actual database password.	string	`null`
moco.mysqlCluster.auth.storageHost The hostname of the server where the database resides.	string	`"moco-{{ .Release.Name }}"`
moco.mysqlCluster.auth.storageLoc The name of the database used to store Slurm accounting records. Defaults to “slurm_acct_db”.	string	`"slurm_acct_db"`
moco.mysqlCluster.auth.storageUser The username `slurmdbd` uses for authentication and storing job accounting data.	string	`"moco-writable"`
moco.mysqlCluster.config Additional MySQL configuration to add to the mysqlCluster. This is placed in a ConfigMap and referenced by the mysqlCluster. The contents of the ConfigMap are rendered as a template, so Helm expressions can be used. The rendered result must be valid YAML. See the MySQL option file documentation. Example: `config: | max_connections: 1000 wait_timeout: 28800`	string	`null`
moco.mysqlCluster.image The image to use for mysql.	object	`repository: ghcr.io/cybozu-go/moco/mysql tag: 8.4.6`
moco.mysqlCluster.inodeLockFixer.enabled Configure init container to fix inode locking issues. This init container copies, moves, and replaces the MySQL data directory to prevent known inode locking issues that can occur in certain storage environments.	bool	`true`
moco.mysqlCluster.inodeLockFixer.image The image to use for the inode lock fixer init container.	object	`repository: alpine tag: 3.20.0`
moco.mysqlCluster.persistence The volume settings to use for mysql.	object	`storageClassName: size: 128Gi accessModes: - ReadWriteOnce`
moco.mysqlCluster.resources Resources for the moco mysqld container.	object	`requests: memory: 64Gi cpu: "16" limits: memory: 64Gi`
moco.priorityClassName The priority class name for moco.	string	`"sunk-control-plane"`
munge.args The additional arguments to pass to the munge container. The defaults run Munge with 10 threads instead of 2.	list	`[ "--num-threads", "10" ]`
munge.livenessProbe Liveness probe for the munge container. When munged hangs (e.g. thread deadlock), `munge -n` blocks on the Unix socket and the probe times out, causing Kubernetes to restart only the munged container.	object	`exec: command: ["munge", "-n"] initialDelaySeconds: 60 periodSeconds: 180 timeoutSeconds: 180 failureThreshold: 5`
munge.readinessProbe The readiness probe for the munge container.	object	`map[]`
munge.resources Resources for the munge container.	object	`limits: memory: 2Gi requests: cpu: 1 memory: 2Gi`
munge.securityContext.runAsGroup The group to run as, must match the munge GID from the container image.	int	`400`
munge.securityContext.runAsUser The user to run as, must match the munge UID from the container image.	int	`400`
munge.startupProbe The startup probe for the munge container.	object	`map[]`
mysql Options for Bitnami MySQL chart, uses Bitnami default values. There is an added option here: `vmPodScrape.enabled` which can be used as an alternative to `serviceMonitor.enabled`.	object	See Bitnami default values.
nsscache.annotations Additional annotations for nsscache Job resources.	object	`{}`
nsscache.cronJobSchedule The schedule for the nsscache update CronJob. It should be formatted according to the cron convention.	string	`"* * * * *"`
nsscache.enabled Enable nsscache.	bool	`true`
nsscache.existingSecret Name of an existing secret containing the LDAP password for this domain. This secret should contain a key named `nsscache-ldap-password` which contains the password to use for the LDAP bind DN. For SCIM, this secret should contain a key named `nsscache-scim-auth-token` which contains the token to use for the SCIM server.	string	`null`
nsscache.labels Additional labels for nsscache Job resources.	object	`{}`
nsscache.nodeSelector.affinity The affinity for the nsscache Job. This overrides the value of `global.nodeSelector.affinity`.	object	`null`
nsscache.nsscacheConfig Options for defining nsscache.conf. Examples: LDAP: `nsscacheConfig: default: source: ldap ldap_uri: ldap://authentik-outpost-ldap-outpost ldap_base: dc=coreweave,dc=cloud ldap_bind_dn: cn=ldapsvc,dc=coreweave,dc=cloud ldap_bind_password: ldap_rfc2307bis: 1 ldap_default_shell: /bin/bash ldap_scope: sub ldap_uidattr: cn passwd: ldap_filter: (objectClass=user) ldap_override_home_dir: /mnt/home/%%u group: ldap_filter: (objectClass=group) shadow: ldap_filter: (objectClass=user) sshkey: ldap_filter: (objectClass=user)` SCIM: `nsscacheConfig: default: source: scim scim_base_url: https://api.coreweave.com/scim/abc123 scim_users_parameters: filter=active eq "true"&groups=slurm-users,slurm-admins scim_groups_parameters: excludeInactiveUsers=true&includeVirtualUserGroups=slurm-users,slurm-admins`	object	See the nsscache.conf documentation.
nsscache.nsscacheConfig.default.cache Specifies how the cache data is stored.	string	`"files"`
nsscache.nsscacheConfig.default.files_cache_filename_suffix A suffix appended to the cache filename to differentiate it from, say, system NSS databases.	string	`"cache"`
nsscache.nsscacheConfig.default.files_dir Directory location to store the plain text files in.	string	`"/etc/nsscache"`
nsscache.nsscacheConfig.default.ldap_base The base to perform LDAP searches under. Example: `dc=coreweave,dc=cloud`	string	`null`
nsscache.nsscacheConfig.default.ldap_bind_dn The bind DN to use when connecting to LDAP. Empty string is an anonymous bind. Example: `cn=ldapsvc,dc=coreweave,dc=cloud`	string	`null`
nsscache.nsscacheConfig.default.ldap_bind_password The password to use for the LDAP bind DN. We strongly recommend using a Kubernetes secret to store this password and reference it using the `nsscache.existingSecret` value.	string	`null`
nsscache.nsscacheConfig.default.ldap_default_shell This will be the default shell for all users. You can specify a different shell by setting the `loginShell` value in the user attributes in the source directory configuration. Example: `/bin/bash`	string	`null`
nsscache.nsscacheConfig.default.ldap_rfc2307bis Example: `1`	int	`null`
nsscache.nsscacheConfig.default.ldap_scope The search scope to use for LDAP. Example: `sub`	string	`null`
nsscache.nsscacheConfig.default.ldap_uidattr The uid-like attribute in your LDAP directory. Example: `cn`	string	`null`
nsscache.nsscacheConfig.default.ldap_uri The LDAP URI to connect to. Example: `ldap://authentik-outpost-ldap-outpost`	string	`null`
nsscache.nsscacheConfig.default.maps The recommended defaults below are useful for standard nsscache operation in many environments.	list	`[ "passwd", "shadow", "group", "sshkey" ]`
nsscache.nsscacheConfig.default.scim_base_url The base URL for the SCIM server. Example: `https://api.coreweave.com/scim/<org>`	string	`null`
nsscache.nsscacheConfig.default.scim_groups_endpoint The endpoint for the SCIM groups API.	string	`"CoreWeaveGroups"`
nsscache.nsscacheConfig.default.scim_groups_parameters Optional URL parameters for the groups endpoint. Special characters (spaces, quotes, etc.) are automatically URL-encoded. A custom parameter, given as a comma-separated list, creates virtual user groups: it adds an entry to the groups map for the user’s GID for the members of the selected group(s). This parameter should typically match any group filtering in `scim_users_parameters`. The default excludes inactive users. Example: `excludeInactiveUsers=true&includeVirtualUserGroups=slurm-users,slurm-admins`	string	`"excludeInactiveUsers=true"`
nsscache.nsscacheConfig.default.scim_users_endpoint The endpoint for the SCIM users API.	string	`"Users"`
nsscache.nsscacheConfig.default.scim_users_parameters Optional URL parameters for the users endpoint. Special characters (spaces, quotes, etc.) are automatically URL-encoded. A custom parameter, given as a comma-separated list, filters users by group. The default filters out inactive users. Example: `filter=active eq "true"&groups=slurm-users,slurm-admins`	string	`filter=active eq "true"`
nsscache.nsscacheConfig.default.source Specify the data source to use. Supported options are `scim` and `ldap`.	string	`"scim"`
nsscache.nsscacheConfig.default.timestamp_dir Specifies the location of the timestamps used for incremental updates.	string	`"/var/lib/nsscache"`
nsscache.nsscacheConfig.group.scim_path_gid The SCIM path for the GID attribute.	string	`"sunkPosixGroupId"`
nsscache.nsscacheConfig.group.scim_path_groupname The SCIM path for the group name attribute. Used when the SCIM server provides a custom field for group names. If not specified or the path returns no value, nsscache will fall back to using displayName, name, or id from the SCIM group resource.	string	`"sunkPosixGroupName"`
nsscache.nsscacheConfig.group.scim_path_username The SCIM path for the username attribute of group members.	string	`"members/sunkPosixUsername"`
nsscache.nsscacheConfig.passwd.ldap_filter The search filter to use when querying. Example: `(objectClass=user)`	string	`null`
nsscache.nsscacheConfig.passwd.ldap_override_home_dir Overrides the home directory for all users. `%%u` is replaced with the username. This should match the mount found in `compute.VolumeMounts`. Example: `/mnt/home/%%u`	string	`null`
nsscache.nsscacheConfig.passwd.scim_default_shell This will be the default shell for all users.	string	`"/bin/bash"`
nsscache.nsscacheConfig.passwd.scim_override_home_directory Overrides the home directory for all users. `%%u` is replaced with the username. This should match the mount found in `compute.VolumeMounts`. Example: `/mnt/home/%%u`	string	`"/mnt/home/%%u"`
nsscache.nsscacheConfig.passwd.scim_path_gid The SCIM path for the GID attribute.	string	`"urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixGroupId"`
nsscache.nsscacheConfig.passwd.scim_path_home_directory The SCIM path for the home directory attribute.	string	`"urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPreferredHomeDirectory"`
nsscache.nsscacheConfig.passwd.scim_path_login_shell The SCIM path for the login shell attribute.	string	`"urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkLoginShell"`
nsscache.nsscacheConfig.passwd.scim_path_uid The SCIM path for the UID attribute.	string	`"urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUserId"`
nsscache.nsscacheConfig.passwd.scim_path_username The SCIM path for the username attribute.	string	`"urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername"`
nsscache.nsscacheConfig.shadow.ldap_filter The search filter to use when querying. Example: `(objectClass=user)`	string	`null`
nsscache.nsscacheConfig.shadow.scim_path_username The SCIM path for the username attribute.	string	`"urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername"`
nsscache.nsscacheConfig.sshkey.ldap_filter The search filter to use when querying. Example: `(objectClass=user)`	string	`null`
nsscache.nsscacheConfig.sshkey.scim_path_ssh_keys The SCIM path for the SSH keys attribute.	string	`"urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkSshKeys"`
nsscache.nsscacheConfig.sshkey.scim_path_username The SCIM path for the username attribute.	string	`"urn:coreweave:params:scim:schemas:extension:coreweave:2.0:CoreWeaveUser/sunkPosixUsername"`
nsscache.nsswitchConfig Options for defining nsswitch.conf.	object	See the nsswitch.conf documentation.
nsscache.nsswitchConfig.aliases Mail aliases, used by getaliasent(3) and related functions.	list	`[]`
nsscache.nsswitchConfig.ethers Ethernet numbers.	list	`[]`
nsscache.nsswitchConfig.group Groups of users, used by getgrent(3) and related functions.	list	`[ "files", "cache" ]`
nsscache.nsswitchConfig.hosts Host names and numbers, used by gethostbyname(3) and related functions.	list	`[]`
nsscache.nsswitchConfig.initgroups Supplementary group access list, used by getgrouplist(3) function.	list	`[]`
nsscache.nsswitchConfig.netgroup Network-wide list of hosts and users, used for access rules. C libraries before glibc 2.1 supported netgroups only over NIS.	list	`[]`
nsscache.nsswitchConfig.networks Network names and numbers, used by getnetent(3) and related functions.	list	`[]`
nsscache.nsswitchConfig.passwd User passwords, used by getpwent(3) and related functions.	list	`[ "files", "cache" ]`
nsscache.nsswitchConfig.protocols Network protocols, used by getprotoent(3) and related functions.	list	`[]`
nsscache.nsswitchConfig.publickey Public and secret keys for Secure_RPC used by NFS and NIS+.	list	`[]`
nsscache.nsswitchConfig.rpc Remote procedure call names and numbers, used by getrpcbyname(3) and related functions.	list	`[]`
nsscache.nsswitchConfig.services Network services, used by getservent(3) and related functions.	list	`[]`
nsscache.nsswitchConfig.shadow Shadow user passwords, used by getspnam(3) and related functions.	list	`[]`
nsscache.priorityClassName The priority class name for the nsscache Job pod.	string	`"sunk-control-plane"`
nsscache.resources Resources for the nsscache Job container.	object	`limits: memory: 500Mi requests: cpu: 200m memory: 100Mi`
nsscache.slurmUserProvisioning.defaultSlurmAccount The default Slurm account for automated provisioning of users.	string	`"cw-sup"`
nsscache.slurmUserProvisioning.dryRun Enable dry run mode - shows what would be done without making changes.	bool	`false`
nsscache.slurmUserProvisioning.enabled Enable slurmUserProvisioning.	bool	`true`
nsscache.slurmUserProvisioning.interval The interval in seconds between user sync runs.	int	`60`
nsscache.sudoGroups List of Unix groups with sudo privileges.	list	`[]`
nsscache.tolerations The tolerations for the nsscache Job.	list	`null`
rest.annotations Additional annotations for REST API resources.	object	`{}`
rest.args The additional arguments to pass to the rest container. The defaults enable debug logging and load only the most recent OpenAPI plugins.	list	`[ "-vv", "-sslurmdbd,slurmctld", "-dv0.0.40" ]`
rest.containers Additional sidecar containers to add to the restd pod.	list	`[]`
rest.enabled Enable the REST API deployment. This is optional and should be disabled for most use cases.	bool	`false`
rest.env The additional environment variables to pass to the rest container.	list	`[ { "name": "SLURMRESTD_JSON", "value": "compact" } ]`
rest.image The image to use for the REST API deployment.	object	`name: controller repository: tag:`
rest.labels Additional labels for REST API resources.	object	`{}`
rest.livenessProbe The liveness probe for the rest container.	object	`tcpSocket: port: slurmrestd failureThreshold: 2 periodSeconds: 10`
rest.priorityClassName The priority class name for the rest pod.	string	`null`
rest.readinessProbe The readiness probe for the rest container.	object	`tcpSocket: port: slurmrestd periodSeconds: 5 failureThreshold: 1`
rest.replicas The number of replicas of the rest pod to run. In most production environments this should be set to a minimum of 2 to provide HA.	int	`1`
rest.resources Resources for the slurmrestd container. These defaults are appropriate for small and medium-sized clusters.	object	`limits: memory: 64Gi requests: cpu: 2 memory: 8Gi`
rest.securityContext.runAsGroup The group to run as, GID must exist in the container image.	int	`65534`
rest.securityContext.runAsUser The user to run as, UID must exist in the container image.	int	`65534`
rest.service.additionalPorts Additional port definitions to expose. Example: `additionalPorts: - name: proxy port: 8080 targetPort: 8080 # optional protocol: TCP # optional`	list	`[]`
rest.service.annotations Additional annotations to apply to rest service.	object	`{}`
rest.service.clusterIP	string	`"None"`
rest.service.enabled Enable the creation of service for rest pods.	bool	`true`
rest.service.externalTrafficPolicy The external traffic policy.	string	`null`
rest.service.labels Additional labels to apply to the rest service.	object	`{}`
rest.service.loadBalancerClass The load balancer class to use for the rest services.	string	`null`
rest.service.type The type of service to create. This defaults to `ClusterIP`.	string	`"ClusterIP"`
rest.startupProbe The startup probe for the rest container.	object	`tcpSocket: port: slurmrestd failureThreshold: 20 periodSeconds: 2`
rest.terminationGracePeriodSeconds The termination grace period for the rest pod.	int	`5`
rest.volumeMounts Additional volume mounts to apply to the rest pod.	list	`[]`
rest.volumes Additional volumes to add to the restd pod. Example: `volumes: - name: cache-vol emptyDir: medium: Memory`	list	`[]`
scheduler.annotations Additional annotations for scheduler resources.	object	`{}`
scheduler.config.scheduler.gpuTypes Mapping of Kubernetes GPU types to Slurm GPU types. The keys are the GPU types required during scheduling, taken from the node affinity key `gpu.nvidia.com/class`, and the values are the corresponding GRES GPU types in Slurm. This gets added to a job’s description.	map	`{ "A100_NVLINK_80GB": "a100", "H100_NVLINK_80GB": "h100" }`
scheduler.config.scheduler.pollInterval The polling interval for the Slurm API.	string	`"10s"`
scheduler.config.scheduler.terminationOffset The offset applied to the termination grace period to account for communication delays and similar overhead.	string	`"5s"`
scheduler.config.slurm.poolSize The number of connections to be maintained in the connection pool.	int	`10`
scheduler.config.slurm.protocolVersion The protocol version to use for communication with the Slurm controller.	string	`"25_05"`
scheduler.config.slurm.usePersistentConnection Use Slurm’s persistent connections for connection reuse.	bool	`true`
scheduler.controllerAddress The address of the Slurm controller to connect to. This should be the service address of the controller in host:port format.	string	`""`
scheduler.enabled Enable the scheduler. To schedule k8s pods on the Slurm cluster nodes, this must be enabled.	bool	`false`
scheduler.hooksAPI Configuration for the webhooks.	object	`{ "waitForPodDeletionInterval": "1s" }`
scheduler.hooksAPI.waitForPodDeletionInterval The polling interval when checking for pod deletion.	string	`"1s"`
scheduler.image The image to use for the scheduler.	object	`repository: registry.gitlab.com/coreweave/sunk/operator tag:`
scheduler.labels Additional labels for scheduler resources.	object	`{}`
scheduler.livenessProbe The liveness probe for the scheduler container.	object	`httpGet: path: /healthz port: 8081 initialDelaySeconds: 15 periodSeconds: 20`
scheduler.logLevel The log level. Uses integers or zap log level strings: `debug` `info` `warn` `error` `dpanic` `panic` `fatal`	string	`"info"`
scheduler.maxConcurrentReconciles The maximum concurrent reconciles. This should be adjusted based on the volume of pods using the scheduler, to handle bursts of operations quickly. The size of both the Slurm and Kubernetes clusters affects this, but less than it affects the syncer. The driving factor tends to be the pod volume and associated Slurm jobs more than anything else. Using the same value as the syncer should be a rather conservative starting point in many use cases.	int	`50`
scheduler.name The name of the scheduler used to select the scheduler during pod creation. By default the name is based on the namespace and release name `<namespace>-<release>-scheduler` when not set.	string	`null`
scheduler.priorityClassName The priority class name for the scheduler pod.	string	`"sunk-control-plane"`
scheduler.readinessProbe The readiness probe for the scheduler container.	object	`httpGet: path: /readyz port: 8081 initialDelaySeconds: 5 periodSeconds: 10`
scheduler.resources Resources for the scheduler container.	object	`limits: memory: 24Gi cpu: 16 requests: cpu: 4 memory: 24Gi`
scheduler.scope.namespaces The list of the namespaces to scope the scheduler to. Only used when `scope.type` is set to `namespace`. Namespaces other than the release namespace will need role bindings created.	list	`[.Release.Namespace]`
scheduler.scope.type The type can be `cluster` or `namespace`.	string	`"namespace"`
scheduler.startupProbe The startup probe for the scheduler container.	object	`map[]`
secretJob.annotations Additional annotations for secret Job resources.	object	`{}`
secretJob.labels Additional labels for secret Job resources.	object	`{}`
secretJob.nodeSelector.affinity The affinity for the secret job. This overrides the value of `global.nodeSelector.affinity`.	object	`null`
secretJob.priorityClassName The priority class name for the secret job pod.	string	`"sunk-control-plane"`
secretJob.resources Resources for the secret job container.	object	`limits: memory: 500Mi requests: cpu: 200m memory: 100Mi`
secretJob.tolerations The tolerations for the secret job.	list	`null`
slurm-login Configure individual login nodes via the `slurm-login` subchart. The example below shows some of the subchart’s key parameters; see the subchart docs for all parameters. Example: `slurm-login: enable: true directoryCache: # select users from two groups selectGroups: ["slurm-researches", "slurm-ops"] # poll every minute (default 90s) interval: 1m # enable nsscache source source: nsscache # SSSD options: directoryService is not needed when using nsscache, use only with SSSD. # Google Secure LDAP directoryService: directories: - name: google-example.com enabled: true ldapUri: ldaps://ldap.google.com:636 user: canary: user@google-example.com defaultShell: "/bin/bash" fallbackHomeDir: "/home/%u" overrideHomeDir: /mnt/nvme/home/%u ldapsCert: google-ldaps-cert schema: rfc2307bis`	object	See default values in `slurm-login` subchart.
slurmConfig.AccountingStorageEnforce Controls what level of association-based enforcement to impose on job submissions. Multiple values allowed. Valid options are any combination of: `associations` `limits` `nojobs` `nosteps` `qos` `safe` `wckeys` Use `all` to impose everything except `nojobs` and `nosteps`, which must be requested separately. See the Slurm documentation for more details.	list	`[ "qos", "limits" ]`
slurmConfig.AccountingStorageTRES	string	`"gres/gpu"`
slurmConfig.AccountingStorageType	string	`"accounting_storage/slurmdbd"`
slurmConfig.AuthAltParameters[0]	string	`"jwt_key=/etc/jwt/jwt.key"`
slurmConfig.AuthAltTypes[0]	string	`"auth/jwt"`
slurmConfig.BatchStartTimeout	int	`120`
slurmConfig.CommunicationParameters The list of communication parameters to pass to slurmctld. See the Slurm documentation for possible values.	list	`- KeepAliveTime=60 - keepaliveinterval=10 - keepaliveprobes=3`
slurmConfig.DebugFlags Comma-separated debug flags for logging.	string	`"NO_CONF_HASH"`
slurmConfig.DefMemPerCPU The default memory per CPU in megabytes. Sets the slurm.conf parameter for the default real memory size available per usable allocated CPU, in megabytes. This value is used when the `--mem-per-cpu` option is not specified on the `srun` command line.	int	`4096`
slurmConfig.Epilog	string	`"/usr/share/sunk/bin/epilog.sh"`
slurmConfig.GresTypes[0]	string	`"gpu"`
slurmConfig.InactiveLimit Terminate job allocation commands, such as `srun` or `salloc`, that are unresponsive longer than this interval in seconds. See the slurm.conf reference for more details.	int	`0`
slurmConfig.JobAcctGatherFrequency	int	`30`
slurmConfig.JobAcctGatherType	string	`"jobacct_gather/cgroup"`
slurmConfig.JobCompType	string	`"jobcomp/none"`
slurmConfig.JobSubmitPlugins The job submit plugins to use.	list	`[]`
slurmConfig.KillWait The interval in seconds between the SIGTERM and SIGKILL signals given to a job’s processes upon reaching its time limit. See the slurm.conf reference for more details.	int	`30`
slurmConfig.MaxNodeCount	int	`3072`
slurmConfig.MessageTimeout	int	`100`
slurmConfig.MinJobAge	int	`300`
slurmConfig.MpiDefault	string	`"pmix"`
slurmConfig.ProctrackType The plugin to be used for process tracking on a job step basis. See the Slurm documentation for more details. Valid values: `proctrack/linuxproc` `proctrack/cgroup`	string	`"proctrack/cgroup"`
slurmConfig.Prolog	string	`"/usr/share/sunk/bin/prolog.sh"`
slurmConfig.PrologFlags[0]	string	`"Alloc"`
slurmConfig.PrologFlags[1]	string	`"Serial"`
slurmConfig.RebootProgram	string	`"/usr/share/sunk/bin/reboot.sh"`
slurmConfig.ReturnToService	int	`2`
slurmConfig.SUNKJobDashboardURL	string	`null`
slurmConfig.SUNKNodeDashboardURL	string	`null`
slurmConfig.SchedulerParameters[0]	string	`"nohold_on_prolog_fail"`
slurmConfig.SchedulerParameters[1]	string	`"max_rpc_cnt=256"`
slurmConfig.SchedulerType	string	`"sched/backfill"`
slurmConfig.SelectType	string	`"select/cons_tres"`
slurmConfig.SelectTypeParameters The values to use for the parameters of the select/cons_tres plugin. Allowed values depend on the configured value of `SelectType`. See the slurm.conf reference for more details.	string	`"CR_CPU_MEMORY"`
slurmConfig.SlurmSchedLogFile	string	`"/dev/null"`
slurmConfig.SlurmSchedLogLevel	int	`1`
slurmConfig.SlurmUser	string	`"slurm"`
slurmConfig.SlurmctldDebug	string	`"verbose"`
slurmConfig.SlurmctldLogFile	string	`"/dev/null"`
slurmConfig.SlurmctldParameters The list of additional parameters to pass to slurmctld. See the Slurm documentation for possible values.	list	`- idle_on_node_suspend - node_reg_mem_percent=95`
slurmConfig.SlurmctldPidFile	string	`"/var/run/slurmctld.pid"`
slurmConfig.SlurmctldPort	int	`6817`
slurmConfig.SlurmctldTimeout The interval, in seconds, that the backup controller waits for the primary controller to respond before assuming control. Slurm’s own default is 120 seconds; this chart defaults to 60. May not exceed 65533.	int	`60`
slurmConfig.SlurmdDebug	string	`"verbose"`
slurmConfig.SlurmdLogFile	string	`"/proc/1/fd/1"`
slurmConfig.SlurmdPidFile	string	`"/var/run/slurmd.pid"`
slurmConfig.SlurmdPort	int	`6818`
slurmConfig.SlurmdSpoolDir	string	`"/var/spool/slurmd"`
slurmConfig.SlurmdTimeout The interval, in seconds, that the Slurm controller waits for slurmd to respond before configuring that node’s state to DOWN.	int	`60`
slurmConfig.StateSaveLocation	string	`"/var/spool/slurmctld/save"`
slurmConfig.SuspendTime Nodes which remain idle or down for this number of seconds will be placed into power save mode by SuspendProgram.	string	`"INFINITE"`
slurmConfig.SwitchType	string	`"switch/none"`
slurmConfig.TCPTimeout	int	`15`
slurmConfig.TaskPlugin The task plugin to use. See the Slurm documentation for more details. Multiple comma-separated values allowed. Valid values: `task/affinity` `task/cgroup` `task/none`	string	`"task/cgroup,task/affinity"`
slurmConfig.TaskPluginParam Optional parameters for the task plugin. See the Slurm documentation for more details.	string	`"SlurmdSpecOverride"`
slurmConfig.TopologyParam[0]	string	`"TopoOptional"`
slurmConfig.TopologyPlugin	string	`"topology/tree"`
slurmConfig.TreeWidth	int	`65533`
slurmConfig.UnkillableStepProgram	string	`"/usr/share/sunk/bin/unkillable-step.sh"`
slurmConfig.UnkillableStepTimeout	int	`900`
slurmConfig.WaitTime Specifies how many seconds the `srun` command should wait after the first task terminates before terminating all remaining tasks. Using the `--wait` option on the `srun` command line overrides this value. The default value is `0`, which disables this feature. See the slurm.conf reference for more details.	int	`0`
slurmConfig.cgroupConfig The `cgroup.conf` value. This is only used when `ProctrackType` is set to `proctrack/cgroup`. Note: `cgroup/v2` should be used over `autodetect` on systems using cgroup v2.	object	`CgroupPlugin: autodetect IgnoreSystemd: yes ConstrainCores: yes ConstrainDevices: yes ConstrainRAMSpace: yes`
sssdContainer.enabled Enable the sssd sidecar container.	bool	`false`
sssdContainer.livenessProbe The liveness probe for the sssd container.	object	`map[]`
sssdContainer.readinessProbe The readiness probe for the sssd container.	object	`map[]`
sssdContainer.startupProbe The startup probe for the sssd container.	object	`map[]`
syncer.annotations Additional annotations for syncer resources.	object	`{}`
syncer.config.slurm.poolSize The number of connections to be maintained in the connection pool.	int	`10`
syncer.config.slurm.protocolVersion The protocol version to use for communication with the Slurm controller.	string	`"25_05"`
syncer.config.slurm.usePersistentConnection Use Slurm’s persistent connections for connection reuse.	bool	`true`
syncer.config.syncer.nodesetUpdateJobPreemption Configuration for job preemption support. More details can be found in the changelog.	object	`{ "enabled": false, "method": null }`
syncer.config.syncer.nodesetUpdateJobPreemption.enabled Enable job preemption support.	bool	`false`
syncer.config.syncer.nodesetUpdateJobPreemption.method Job preemption strategy during rolling upgrades. Can be set to one of the following methods: `partition`: preempt jobs in specific partitions; a comma-separated list of partition names can be specified in `partitions`. `qos`: preempt jobs based on their QoS; a comma-separated list of QoS names can be specified in `qos`. `time`: preempt jobs once the rolling delete condition has been on the pod for longer than the set time limit; the limit, in seconds, can be set in `timeLimit`.	string	`null`
syncer.config.syncer.orphanedPodDelay The delay to wait before deleting a pod that is no longer associated with a Slurm node.	string	`"120s"`
syncer.config.syncer.pollInterval The polling interval for the Slurm API.	string	`"10s"`
syncer.config.syncer.qosInterruptable The externally defined label that indicates whether a pod is interruptible.	string	`"qos.coreweave.cloud/interruptable"`
syncer.config.syncer.reconfigureRateLimit The rate limit, in seconds, for Slurm reconfigure requests based on additions to NodeSlices. The value must be above 0 seconds to enable this feature. Warning: if this value is too low, `scontrol reconfigure` may be executed too often, especially during periods when several nodes are newly added.	string	`"3600s"`
syncer.config.syncer.slurmNodeCleanUp Removes lingering Slurm nodes from the cluster after they have been removed from their associated SUNK NodeSets.	bool	`true`
syncer.controllerAddress The address of the Slurm controller to connect to. This should be the service address of the controller in host:port format.	string	`""`
syncer.enabled Enable the syncer. This is required for most functionality and should only be disabled for troubleshooting.	bool	`true`
syncer.hooksAPI Configuration for the webhooks.	object	`{ "nodeRebootCondition": "PhaseState", "nodeRebootReason": "production-powerreset", "safeNodeRebootCondition": "PendingPhaseState", "safeNodeRebootReason": "production-powerreset", "waitForNodeLockedInterval": "1s", "waitForNodeLockedTimeout": "120s" }`
syncer.hooksAPI.nodeRebootCondition Condition to indicate node should be rebooted.	string	`"PhaseState"`
syncer.hooksAPI.nodeRebootReason The target NLCC lifecycle state associated with the `nodeRebootCondition`.	string	`"production-powerreset"`
syncer.hooksAPI.safeNodeRebootCondition Condition to indicate node should be rebooted safely.	string	`"PendingPhaseState"`
syncer.hooksAPI.safeNodeRebootReason The target NLCC lifecycle state associated with the `safeNodeRebootCondition`.	string	`"production-powerreset"`
syncer.hooksAPI.waitForNodeLockedInterval The polling interval when checking for node locked state.	string	`"1s"`
syncer.hooksAPI.waitForNodeLockedTimeout The timeout for checking node locked state.	string	`"120s"`
syncer.image The image to use for the syncer.	object	`repository: registry.gitlab.com/coreweave/sunk/operator tag:`
syncer.labels Additional labels for syncer resources.	object	`{}`
syncer.livenessProbe The liveness probe for the syncer container.	object	`httpGet: path: /healthz port: 8081 initialDelaySeconds: 15 periodSeconds: 20`
syncer.logLevel The log level. Uses integers or zap log level strings: `debug` `info` `warn` `error` `dpanic` `panic` `fatal`	string	`"info"`
syncer.maxConcurrentReconciles The maximum concurrent reconciles. This should be adjusted based on the number of nodes and the size of jobs launched in the Slurm cluster, to handle bursts of operations quickly. A value of one tenth the number of nodes in the cluster is a good starting point for small clusters. As cluster size increases, this value can be a smaller fraction of the total number of nodes in most cases. For instance, a value of 50 seems to handle a 2000-node cluster well. Being too aggressive here will bottleneck on other components such as the Kubernetes API server and the Slurm controller, which in some cases may cause errors.	int	`50`
syncer.nodePermissions.enabled Enable node operations on the syncer. Currently, this allows nodes to be restarted.	bool	`true`
syncer.priorityClassName The priority class name for the syncer pod.	string	`"sunk-control-plane"`
syncer.readinessProbe The readiness probe for the syncer container.	object	`httpGet: path: /readyz port: 8081 initialDelaySeconds: 5 periodSeconds: 10`
syncer.resources Resources for the syncer container.	object	`limits: memory: 24Gi cpu: 16 requests: cpu: 4 memory: 24Gi`
syncer.startupProbe The startup probe for the syncer container.	object	`map[]`
syncer.watchAllNodeSets Watch all NodeSets in the namespace. This overrides default behavior of only watching the NodeSets deployed with this chart release.	bool	`false`
syncer.watchNodeSets The list of NodeSets to watch. This overrides the default behavior of watching the NodeSets deployed with this chart release to instead watch this specific list. This is not used if watchAllNodeSets is set to true.	list	`[]`
userLookupContainer.livenessProbe The liveness probe for the user-lookup container.	object	`map[]`
userLookupContainer.readinessProbe The readiness probe for the user-lookup container.	object	`map[]`
userLookupContainer.startupProbe The startup probe for the user-lookup container.	object	`map[]`
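
To illustrate how several of the parameters above fit together, the following values override is a minimal sketch rather than a recommended configuration. It assumes a SCIM-backed nsscache setup. The secret name `nsscache-credentials` is hypothetical, and `<org>` is the placeholder from the table's own example; all keys and the remaining values are taken from the table.

```yaml
# Hypothetical values override; key names come from the parameter table above.
nsscache:
  enabled: true
  # Assumed secret name. Per the table, for SCIM the secret must contain
  # a key named `nsscache-scim-auth-token`.
  existingSecret: nsscache-credentials
  nsscacheConfig:
    default:
      source: scim
      # Placeholder URL from the table's example; replace <org> with your own value.
      scim_base_url: https://api.coreweave.com/scim/<org>
      # Quoted so the embedded double quotes and `&` are kept literally.
      scim_users_parameters: 'filter=active eq "true"&groups=slurm-users,slurm-admins'
      scim_groups_parameters: 'excludeInactiveUsers=true&includeVirtualUserGroups=slurm-users,slurm-admins'

rest:
  # Disabled by default; the table recommends leaving it off for most use cases.
  enabled: true
  # The table suggests a minimum of 2 replicas for HA in production.
  replicas: 2

slurmConfig:
  cgroupConfig:
    # The table notes cgroup/v2 should be used over autodetect on cgroup v2 systems.
    CgroupPlugin: cgroup/v2

syncer:
  # Default value; the table suggests about one tenth of the node count
  # as a starting point for small clusters.
  maxConcurrentReconciles: 50
```

A file like this would typically be passed to Helm with `-f` (or `--values`) at install or upgrade time. The chart reference and release name depend on your installation and are not shown here.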