Slurm Parameter Reference

Version: 0.1.0 Type: application AppVersion: 24.11

Requirements

Repository                                 Name          Version
file://../slurm-login                      slurm-login   0.1.0
oci://registry-1.docker.io/bitnamicharts   mysql         9.19.1

Parameters

Each entry lists the key and description, followed by the value type and default.

accounting.annotations
Additional annotations for accounting resources.

object
{}

accounting.config.slurmdbdExtraConfig
Multi-line string of additional slurmdbd.conf file config.

string
accounting.config.slurmdbdExtraConfig: |
  ArchiveEvents=yes
  ArchiveJobs=yes
  ArchiveResvs=yes
  ArchiveSteps=no
  ArchiveSuspend=no
  ArchiveTXN=no
  ArchiveUsage=no
  PurgeEventAfter=1month
  PurgeJobAfter=12month
  PurgeResvAfter=1month
  PurgeStepAfter=1month
  PurgeSuspendAfter=1month
  PurgeTXNAfter=12month
  PurgeUsageAfter=24month

accounting.enabled
Enable accounting.

bool
true

accounting.external.enabled
Enable external accounting instead of deploying an internal accounting instance.

bool
false

accounting.external.host
The host of the external accounting instance: IP or hostname.

string
null

accounting.external.port
The port of the external accounting instance.

string
null

accounting.external.user
The user to use to authenticate to the external accounting instance.

string
null
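
For reference, a minimal sketch of pointing the chart at an external slurmdbd rather than the built-in instance; the host and user values are illustrative, and 6819 is the conventional slurmdbd port:

Example:
accounting:
  enabled: true
  external:
    enabled: true
    host: slurmdbd.example.internal
    port: "6819"
    user: slurm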

accounting.image
The image to use for the slurmdbd deployment.

object
repository: registry.gitlab.com/coreweave/sunk/controller
tag:

accounting.labels
Additional labels for accounting resources.

object
{}

accounting.livenessProbe
The liveness probe for the slurmdbd container.

object
exec:
  command:
    - sacctmgr
    - ping
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 5
successThreshold: 1

accounting.priorityClassName
The priority class name for the accounting pod.

string
null

accounting.readinessProbe
The readiness probe for the slurmdbd container.

object
exec:
  command:
    - sacctmgr
    - ping
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 5
successThreshold: 1

accounting.replicas
The number of replicas of the accounting instance to run.

int
1

accounting.resources
Resources for the accounting container.

object
limits:
  cpu: 4
  memory: 16Gi
requests:
  cpu: 4
  memory: 16Gi

accounting.securityContext.runAsGroup
The group to run as, must match the slurm GID from the container image.

int
401

accounting.securityContext.runAsUser
The user to run as, must match the slurm UID from the container image.

int
401

accounting.startupProbe
The startup probe for the slurmdbd container.

object
null

accounting.terminationGracePeriodSeconds
The termination grace period for the accounting pod.

int
30

accounting.useExistingSecret
Use an existing secret for the accounting instance instead of creating one.
The secret name is the same as mysql.auth.existingSecret.

bool
false

accounting.volumeMounts
Additional volume mounts to apply to the accounting pod.

list
[]

accounting.volumes
Additional volumes to mount to the accounting pod.

list
[]

compute.annotations
Additional annotations for compute services only. Use compute.nodes.custom-definition.annotations to add annotations to specific node definitions instead.

object
{}

compute.autoPartition.enabled
Enable the auto partition feature.

bool
true

compute.externalClusterName
The name of an external cluster to join.
This is used when the control plane is deployed separately.

string
null

compute.generateTopology
Enable topology generation for the compute nodes in the cluster.

bool
true

compute.initialState
The initial state for the nodes when they join the slurm cluster.
This is generally drain or idle. May also be set per node definition.

string
"idle"

compute.initialStateReason
The reason for setting the initial state of the nodes to down, drained, or fail.
May also be set per node definition.

string
"Node added to the cluster for the first time"

compute.labels
Additional labels for compute services only. Use compute.nodes.custom-definition.labels to add labels to specific node definitions instead.

object
{}

compute.livenessProbe
The liveness probe for the compute slurmd container.

object
{}

compute.maxUnavailable
The maximum number of compute nodes that may be unavailable during a rolling update.
Can be a percentage or a number.

string
"10%"

compute.nodes
Multiple node definitions can be declared, but only one may set enabled: true. Node definitions can reference other definitions to include or overlay values. See the example below or the Compute Node Definitions documentation for more details.

Example:
compute:
  nodes:
    # A custom definition to be referenced by other nodes
    custom-dns:
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 127.0.0.1
    # A simple CPU-only Node that uses the custom-dns definition above
    simple-cpu:
      enabled: true
      replicas: 1
      definitions:
        # Use the custom-dns definition
        - custom-dns
      staticFeatures:
        - cpu
      dynamicFeatures:
        node.coreweave.cloud/class: {}
      image:
        repository: registry.gitlab.com/coreweave/sunk/controller-extras
      gresGpu: null
      config:
        weight: 1
      # Create a small node with 1cpu and 1g memory
      resources:
        limits:
          memory: 1Gi
          cpu: 1
        requests:
          memory: 1Gi
          cpu: 1
      tolerations:
        - key: is_cpu_compute
          operator: Exists
      volumeMounts:
        - name: ramtmp
          mountPath: /tmp
      volumes:
        - name: ramtmp
          emptyDir:
            medium: Memory
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux
object
See Compute Node Definitions.

compute.partitions
Partitions to add to the cluster.

string
compute.partitions: |
  PartitionName=all Nodes=ALL Default=YES MaxTime=INFINITE State=UP

compute.plugstackConfig
Additional plug-in stack configuration items for the plugstack.conf file. Config options: https://slurm.schedmd.com/spank.html#SECTION_CONFIGURATION

list
[]

compute.pyxis.enabled
Enable the pyxis container.

bool
false

compute.pyxis.mountHome
Enables ENROOT_MOUNT_HOME for the pyxis container to mount the home directory.

bool
true

compute.pyxis.plugstackOptions
Additional arguments for the pyxis plugin in plugstack.conf file config. Config Options: https://github.com/NVIDIA/pyxis/wiki/Setup#slurm-plugstack-configuration

list
[]

compute.pyxis.remapRoot
Enables ENROOT_REMAP_ROOT for the pyxis container to remap the root user.

bool
true
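
Putting the pyxis options together, a sketch that enables pyxis along with the SYS_ADMIN capability it requires (see compute.securityContext.capabilities.add below):

Example:
compute:
  pyxis:
    enabled: true
    mountHome: true
    remapRoot: true
  securityContext:
    capabilities:
      add: ["SYS_ADMIN"]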

compute.readinessProbe
The readiness probe for the compute slurmd container.

object
exec:
  command:
    - scontrol
    - show
    - slurmd
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5

compute.s6
oneshot and longrun jobs are supported. See Running Scripts with S6 for more information.

Example:
s6:
  packages:
    type: oneshot
    timeoutUp: 0
    timeoutDown: 0
    script: |
      #!/usr/bin/env bash
      apt -y install nginx
  nginx:
    type: longrun
    timeoutUp: 0
    timeoutDown: 0
    script: |
      #!/usr/bin/env bash
      nginx -g "daemon off;"
object
{}

compute.securityContext.capabilities.add
Add capabilities to the slurmd container.
"SYS_ADMIN" is required if using pyxis.

Example:
compute:
  securityContext:
    capabilities:
      add: ["SYS_ADMIN"]
list
[]

compute.ssh.enabled
Enable ssh to the compute nodes.

bool
false

compute.startupProbe
The startup probe for the compute slurmd container.

object
{}

compute.volumeMounts
Additional volume mounts to add to all the compute pods, also added to login pods.

list
[]

compute.volumes
Additional volumes to mount to all the compute pods, also added to login pods.

list
[]

controlPlane.enabled
Enable the Slurm control plane.
Unless splitting the deployment, this should be enabled.

bool
true

controller.annotations
Additional annotations for controller resources.

object
{}

controller.enabled
Enable the controller deployment.
This should be enabled unless a more complicated deployment is required (splitting the deployment).

bool
true

controller.image
The image to use for the controller.

object
repository: registry.gitlab.com/coreweave/sunk/controller
tag:

controller.jobSkipIds
The controller skips processing this list of Slurm JobIds.

list
[]

controller.labels
Additional labels for controller resources.

object
{}

controller.livenessProbe
The liveness probe for the controller.

object
exec:
  command:
    - scontrol
    - ping
failureThreshold: 6
initialDelaySeconds: 15
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10

controller.priorityClassName
The priority class name for the controller.

string
null

controller.readinessProbe
The readiness probe for the controller.

object
{}

controller.replicas
The number of replicas of the controller to run. Currently this should be left at 1.

int
1

controller.resources
Resources for the controller container.

object
limits:
  cpu: 4
  memory: 16Gi
requests:
  cpu: 4
  memory: 16Gi

controller.securityContext.runAsGroup
The group to run as, must match the slurm GID from the container image.

int
401

controller.securityContext.runAsUser
The user to run as, must match the slurm UID from the container image.

int
401

controller.startupProbe
The startup probe for the controller.

object
{}

controller.stateVolume.size
The size of the persistent volume claim.

string
"32Gi"

controller.stateVolume.storageClassName
The storage class name to use for the volume.

string
null

controller.terminationGracePeriodSeconds
The termination grace period for the controller.

int
30

controller.volumeMounts
Additional volume mounts to apply to the controller pod.

list
[]

controller.volumes
Additional volumes to mount to the controller pod.

list
[]

controller.watch.enabled
Enable watching the Slurm configuration and triggering a reconfigure when there are changes.

bool
true

controller.watch.interval
The interval in seconds to check for changes in the Slurm configuration.

int
60

controller.watch.livenessProbe
The liveness probe for the watch container.

object
null

controller.watch.readinessProbe
The readiness probe for the watch container.

object
null

controller.watch.startupProbe
The startup probe for the watch container.

object
null

controller.watch.topologyFileInterval
The interval in seconds to check for changes specifically in the topology.conf file. Warning: if this value is too low, scontrol reconfigure may be executed too often, especially during periods when several nodes are newly added.

int
3600
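
A sketch combining the watch settings above at their chart defaults; the warning about scontrol reconfigure applies if the last value is lowered:

Example:
controller:
  watch:
    enabled: true
    interval: 60
    topologyFileInterval: 3600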

directoryService.debugLevel
A bit mask of SSSD debug levels to enable.

int
0x01F0

directoryService.directories
The directory services to configure. Examples for several directory providers follow.

Google Secure LDAP example:
directories:
  - name: google-example.com
    enabled: true
    ldapUri: ldaps://ldap.google.com:636
    user:
    defaultShell: "/bin/bash"
    fallbackHomeDir: "/home/%u"
    overrideHomeDir: /mnt/nvme/home/%u
    ldapsCert: google-ldaps-cert
    schema: rfc2307bis

CoreWeave LDAP example:
directories:
  - name: coreweave.cloud
    enabled: true
    ldapUri: ldap://openldap
    user:
      bindDn: cn=admin,dc=coreweave,dc=cloud
      searchBase: dc=coreweave,dc=cloud
      existingSecret: bind-user-sssd-config
      canary: admin
    defaultShell: "/bin/bash"
    fallbackHomeDir: "/home/%u"
    schema: rfc2307

Authentik example:
directories:
  - name: coreweave.cloud
    enabled: true
    ldapUri: ldap://authentik-outpost-ldap-outpost
    user:
      bindDn: cn=ldapsvc,dc=coreweave,dc=cloud
      searchBase: dc=coreweave,dc=cloud
      existingSecret: bind-user-sssd-config
      canary: ldapsvc
    startTLS: true
    userObjectClass: user
    groupObjectClass: group
    userNameAttr: cn
    groupNameAttr: cn
    schema: rfc2307bis

Active Directory example:
directories:
  - name: contoso.com
    enabled: true
    ldapUri: ldap://domaincontroller.tenant-my-tenant.coreweave.cloud
    user:
      bindDn: CN=binduser,CN=Users,DC=contoso,DC=com
      searchBase: DC=contoso,DC=com
      existingSecret: bind-user-sssd-config
      canary: binduser
    defaultShell: "/bin/bash"
    fallbackHomeDir: "/home/%u"
    schema: AD

list

directoryService.directories[0].additionalConfig
Multi-line string of additional arbitrary configuration per domain for SSSD.

Example:
additionalConfig: |
  ldap_foo = bar
string
null

directoryService.directories[0].defaultShell
The default user shell.

string
"/bin/bash"

directoryService.directories[0].enabled
Enable the directory service.

bool
false

directoryService.directories[0].fallbackHomeDir
The fallback user home directory.

string
"/home/%u"

directoryService.directories[0].ignoreGroupMembers
This overrides the SSSD option of the same name.
If set to true, SSSD only retrieves information about the group objects
themselves, not their members, providing a significant performance boost.
If omitted, defaults to true.

bool
null

directoryService.directories[0].ldapUri
The LDAP URI to use for the directory service.
Example: ldap://YOUR_LDAP_IP
For Google Secure LDAP, use: ldaps://ldap.google.com:636

string
null

directoryService.directories[0].ldapsCert
Name of existing TLS certificate for LDAP-S.

Example:
kubectl create secret tls google-ldaps-cert \
  --cert=Google_2025_08_24_55726.crt \
  --key=Google_2025_08_24_55726.key
string
null

directoryService.directories[0].name
Name of the directory service.
The primary domain should always be named: default

string
"default"

directoryService.directories[0].overrideGidAttr
Override the default schema LDAP attribute that corresponds to the user's primary group id.
Example: posixGid

string
null

directoryService.directories[0].overrideHomeDir
Override the homeDirectory attribute from LDAP with a provided path.
Example: /mnt/nvme/home/%u

string
null

directoryService.directories[0].overrideUidAttr
Override the default schema LDAP attribute that corresponds to the user's id.
Example: posixUid

string
null

directoryService.directories[0].overrideUserNameAttr
Override the default schema LDAP attribute that corresponds to the user's login name.
Example: employeeNumber

string
null

directoryService.directories[0].schema
The desired LDAP schema for the directory service. Valid values:

  • AD
  • POSIX
  • rfc2307bis
Note: For Google Secure LDAP, use rfc2307bis.

string
"AD"

directoryService.directories[0].user.bindDn
The LDAP bind DN to use for the directory service.
Where bindDn is not required (e.g. Google Secure LDAP), only supply user.canary.
Example: cn=Admin,ou=Users,ou=CORP,dc=corp,dc=example,dc=com

string
null

directoryService.directories[0].user.canary
The username to lookup to confirm LDAP is working.

string
null

directoryService.directories[0].user.existingSecret
Name of an existing secret containing an SSSD configuration snippet with the ldap_default_authtok set for this domain.

string
null

directoryService.directories[0].user.existingSecretFileName
The name of the file in the existing secret that contains the ldap passwords.

string
"ldap-password.conf"

directoryService.directories[0].user.groupSearchBase
The LDAP group search base to use for the directory service.
Example: ou=groups,dc=example,dc=com

string
null

directoryService.directories[0].user.password
The password to use for the directory service lookups.

string
null

directoryService.directories[0].user.searchBase
The LDAP search base to use for the directory service.
Example: dc=corp,dc=example,dc=com

string
null

directoryService.negativeCacheTimeout
Negative caching value (in seconds).
Determines how long an invalid entry will be cached before asking LDAP again. This improves the directory listing time when a primary GID cannot be found.

string
"600"

directoryService.sudoGroups
List of Unix groups from all directories with sudo privileges.
Group names are fully-qualified for additional directories, but not for the default directory (e.g. "group1" instead of "group1@<directory domain>").

list
[]
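
A sketch with one unqualified group from the default directory and one fully-qualified group from an additional directory; both group names are illustrative:

Example:
directoryService:
  sudoGroups:
    - group1
    - slurm-ops@corp.example.com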

directoryService.watchInterval
The interval in seconds to check for changes in the SSSD configuration.

int
60

global.annotations
Additional annotations to apply to all resources.

object
{}

global.dnsConfig.additionalSearches
A list of namespaces to add to the list of DNS searches. These additional searches extend hostname lookup in the control-plane, compute, and login pods. Default DNS searches:

  • name-compute.namespace.svc.cluster.local
  • slurm_cluster_name-controller.namespace.svc.cluster.local

list
[]
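
A sketch adding one extra search domain; the namespace name is illustrative:

Example:
global:
  dnsConfig:
    additionalSearches:
      - tenant-example.svc.cluster.local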

global.imagePullPolicy
The image pull policy for all containers.

string
"IfNotPresent"

global.labels
Additional labels to apply to all resources.

object
{}

global.nodeSelector.affinity
Sets a global affinity for all Slurm node pods. This can be overridden for specific Slurm nodes in their configuration.

object
null

global.volumeMounts
The list of volume mounts to apply to all compute, controller, accounting, and login pods.

list
[]

global.volumes
The list of volumes to mount to all compute, controller, accounting, and login pods.

list
[]

imagePullSecrets
The list of secrets used to access images in a private registry.

list
[]
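
A sketch assuming the standard Kubernetes list-of-names format; the secret name is illustrative:

Example:
imagePullSecrets:
  - name: my-registry-credentials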

jwt.existingSecret
The name of an existing secret containing the JWT private key, otherwise the chart will generate one.

string
null

login.annotations
Additional annotations.

object
{}

login.automountServiceAccountToken
Automatically mount the service account token into the login pod.

bool
false

login.containers
Additional sidecar containers to add to the login pod.

list
[]

login.enabled
Enable the login nodes

bool
true

login.env
Additional environment variables to pass to the sshd container.

list
[]

login.hostAliases
Provides Pod-level override of hostname resolution when DNS and other options are not applicable in login pods. See Adding entries to Pod /etc/hosts with HostAliases for more information.

list
[]

login.image
The image to use for the login node.

object
repository: registry.gitlab.com/coreweave/sunk/controller-extras
tag:

login.individualResources
Resources for the slurm-login pod sshd container.

object
limits:
  memory: 2Gi
requests:
  cpu: 500m
  memory: 300Mi

login.labels
Additional labels.

object
{}

login.nodeSelector.affinity
The affinity for the login nodes. This overrides the value of global.nodeSelector.affinity.

object
null

login.priorityClassName
The priority class name for the login pod.

string
null

login.replicas
The number of replicas of the login node.
When running more than one, a pod-specific service is created for each one in addition to the main service.

int
1

login.resources
Resources for the login node sshd container.

object
limits:
  memory: 8Gi
requests:
  cpu: 4
  memory: 8Gi

login.s6
oneshot and longrun jobs are supported. See Running Scripts with S6 for more information.

Example:
s6:
  packages:
    type: oneshot
    timeoutUp: 0
    timeoutDown: 0
    script: |
      #!/usr/bin/env bash
      apt -y install nginx
  nginx:
    type: longrun
    timeoutUp: 0
    timeoutDown: 0
    script: |
      #!/usr/bin/env bash
      nginx -g "daemon off;"
object
{}

login.service.additionalPorts
Additional port definitions to expose.

Example:
additionalPorts:
  - name: eternal-shell
    port: 2022
    targetPort: 20222 # optional
    protocol: TCP # optional
list
[]

login.service.enabled
Enable the creation of service(s) for login pods.

bool
true

login.service.externalTrafficPolicy
The external traffic policy.

string
"Local"

login.service.loadBalancerClass
The load balancer class to use for the login services.

string
null

login.service.metadata.0.annotations
Additional annotations to apply to the first login service (0).

object
{}

login.service.metadata.0.labels
Additional labels to apply to the first login service (0).

object
{}

login.service.metadata.common.annotations
Additional annotations to apply to the common login service.

object
{}

login.service.metadata.common.labels
Additional labels to apply to the common login service.

object
{}

login.service.metadata.global.annotations
Additional annotations to apply to all login services.

object
{}

login.service.metadata.global.labels
Additional labels to apply to all login services.

object
{}

login.service.type
The type of service to create. This defaults to LoadBalancer for cloud deployments. For development and test systems without an external load balancer to handle the service routing, such as when deploying on kind (Kubernetes IN Docker), this may be set to ClusterIP.

string
"LoadBalancer"

login.serviceAccountName
The service account name to use for the login pod.

string
"default"

login.sshKeyVolume.accessModes
The access mode for the storage. If scaling login beyond 1 replica, this must be ReadWriteMany. In a development setting with a volume provider that doesn't support ReadWriteMany, such as kind (Kubernetes IN Docker), this may be set to ReadWriteOnce.

list
[
  "ReadWriteMany"
]
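
A sketch for a development cluster such as kind, combining the ClusterIP service type described under login.service.type above with the ReadWriteOnce access mode mentioned in this entry:

Example:
login:
  service:
    type: ClusterIP
  sshKeyVolume:
    accessModes:
      - ReadWriteOnce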

login.sshKeyVolume.enabled
Enable the ssh key volume to allow keys to be mounted and persisted in the login pod.
If this is disabled, the host keys for the login pod are regenerated on each container restart.

bool
true

login.sshKeyVolume.size
The size of the persistent volume claim.

string
"1Gi"

login.sshKeyVolume.storageClassName
The storage class name to use for the volume.

string
null

login.sshdConfig
Additional sshd configuration to add to the login pod.

Example:
sshdConfig: |
  PasswordAuthentication no
string
null

login.sshdLivenessProbe.config
The liveness probe for the login sshd container.

object
failureThreshold: 10
initialDelaySeconds: 10
periodSeconds: 5
tcpSocket:
  port: 22

login.sshdLivenessProbe.enabled
Whether the liveness probe for the login sshd container is enabled.

bool
false

login.sshdReadinessProbe.config
The readiness probe for the login sshd container.

object
{}

login.sshdReadinessProbe.enabled
Whether the readiness probe for the login sshd container is enabled.

bool
false

login.sshdStartupProbe.config
The startup probe for the login sshd container.

object
{}

login.sshdStartupProbe.enabled
Whether the startup probe for the login sshd container is enabled.

bool
false

login.terminationGracePeriodSeconds
The termination grace period for the login pod.

int
30

login.updateStrategy
The update strategy for the login node. The default type is RollingUpdate.

object
{}

login.volumeMounts
Additional volume mounts to apply to the login pod.

list
[]

login.volumes
Additional volumes to add to the login pod.

Example:
volumes:
  - name: cache-vol
    emptyDir:
      medium: Memory
list
[]

munge.args
The additional arguments to pass to the munge container. The defaults run Munge with 10 threads instead of 2.

list
[
  "--num-threads",
  "10"
]

munge.livenessProbe
The liveness probe for the munge container.

object
{}

munge.readinessProbe
The readiness probe for the munge container.

object
{}

munge.resources
Resources for the munge container.

object
limits:
  cpu: 1
  memory: 2Gi

munge.securityContext.runAsGroup
The group to run as, must match the munge GID from the container image.

int
400

munge.securityContext.runAsUser
The user to run as, must match the munge UID from the container image.

int
400

munge.startupProbe
The startup probe for the munge container.

object
{}

mysql
Options for the Bitnami MySQL chart; uses Bitnami default values.

object
See Bitnami default values.

rest.annotations
Additional annotations for REST API resources.

object
{}

rest.args
The additional arguments to pass to the rest container. Defaults enable debug logging and load only the most recent OpenAPI plugins.

list
[
  "-vv",
  "-sslurmdbd,slurmctld",
  "-dv0.0.40"
]

rest.enabled
Enable the REST API deployment.
This is optional and should be disabled for most use cases.

bool
false

rest.env
The additional environment variables to pass to the rest container.

list
[
  {
    "name": "SLURMRESTD_JSON",
    "value": "compact"
  }
]

rest.image
The image to use for the REST API deployment.

object
repository: registry.gitlab.com/coreweave/sunk/controller
tag:

rest.labels
Additional labels for REST API resources.

object
{}

rest.livenessProbe
The liveness probe for the rest container.

object
tcpSocket:
  port: slurmrestd
failureThreshold: 2
periodSeconds: 10

rest.priorityClassName
The priority class name for the rest pod.

string
null

rest.readinessProbe
The readiness probe for the rest container.

object
tcpSocket:
  port: slurmrestd
periodSeconds: 5
failureThreshold: 1

rest.replicas
The number of replicas of the rest pod to run.
In most production environments this should be set to a minimum of 2 to provide HA.

int
1

rest.resources
Resources for the slurmrestd container.
These defaults are appropriate for small and medium-sized clusters.

object
limits:
  cpu: 2
  memory: 8Gi
requests:
  cpu: 2
  memory: 8Gi

rest.securityContext.runAsGroup
The group to run as, must match the slurm GID from the container image.

int
401

rest.securityContext.runAsUser
The user to run as, must match the slurm UID from the container image.

int
401

rest.startupProbe
The startup probe for the rest container.

object
tcpSocket:
  port: slurmrestd
failureThreshold: 20
periodSeconds: 2

rest.terminationGracePeriodSeconds
The termination grace period for the rest pod.

int
5

scheduler.annotations
Additional annotations for scheduler resources.

object
{}

scheduler.config.scheduler.gpuTypes
Mapping of Kubernetes GPU types to Slurm GPU types. The keys are the GPU types requested during scheduling through the node affinity key "gpu.nvidia.com/class", and the values are the corresponding gres GPU types in Slurm. This gets added to a job's description.

map
{
  "A100_NVLINK_80GB": "a100",
  "H100_NVLINK_80GB": "h100"
}
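
A sketch extending the default mapping; the L40S entry is hypothetical and shown only to illustrate adding a class:

Example:
scheduler:
  config:
    scheduler:
      gpuTypes:
        A100_NVLINK_80GB: a100
        H100_NVLINK_80GB: h100
        L40S: l40s # hypothetical additional mapping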

scheduler.config.scheduler.pollInterval
The polling interval for the Slurm API.

string
"10s"

scheduler.config.scheduler.terminationOffset
Offset for the termination grace period, to account for communication delays, etc.

string
"5s"

scheduler.controllerAddress
The address of the Slurm controller to connect to.
This should be the service address of the controller in host:port format.

string
""

scheduler.enabled
Enable the scheduler.
To schedule Kubernetes pods on the Slurm cluster nodes, this must be enabled.

bool
false

scheduler.hooksAPI
Configuration for the webhooks.

object
{
  "waitForPodDeletionInterval": "1s"
}

scheduler.hooksAPI.waitForPodDeletionInterval
The polling interval when checking for pod deletion.

string
"1s"

scheduler.image
The image to use for the scheduler.

object
repository: registry.gitlab.com/coreweave/sunk/operator
tag:

scheduler.labels
Additional labels for scheduler resources.

object
{}

scheduler.livenessProbe
The liveness probe for the scheduler container.

object
httpGet:
  path: /healthz
  port: 8081
initialDelaySeconds: 15
periodSeconds: 20

scheduler.logLevel
The log level.
Uses integers or zap log level strings:

  • debug
  • info
  • warn
  • error
  • dpanic
  • panic
  • fatal

string
"info"

scheduler.maxConcurrentReconciles
The maximum concurrent reconciles.
This should be adjusted based on the volume of pods using the scheduler, to handle bursts of operations quickly. The size of both the Slurm and Kubernetes clusters will impact this, but less so than for the syncer. The driving factor here tends to be the pod volume and associated Slurm jobs more than anything else. Using the same value as the syncer is a rather conservative starting point in many use cases.

int
5

scheduler.name
The name of the scheduler used to select the scheduler during pod creation.
When not set, the name defaults to <namespace>-<release>-scheduler, based on the namespace and release name.

string
null

scheduler.priorityClassName
The priority class name for the scheduler pod.

string
null

scheduler.readinessProbe
The readiness probe for the scheduler container.

object
httpGet:
  path: /readyz
  port: 8081
initialDelaySeconds: 5
periodSeconds: 10

scheduler.resources
Resources for the scheduler container.

object
limits:
  cpu: "1"
  memory: 1Gi
requests:
  cpu: 200m
  memory: 1Gi

scheduler.scope.namespaces
The list of namespaces to scope the scheduler to.
Only used when scope.type is set to namespace. Namespaces other than the release namespace will need role bindings created.

list
[.Release.Namespace]

scheduler.scope.type
The type can be cluster or namespace.

string
"namespace"

scheduler.startupProbe
The startup probe for the scheduler container.

object
{}

secretJob.annotations
Additional annotations for secret Job resources.

object
{}

secretJob.labels
Additional labels for secret Job resources.

object
{}

secretJob.nodeSelector.affinity
The affinity for the secret job. This overrides the value of global.nodeSelector.affinity.

object
null

secretJob.priorityClassName
The priority class name for the secret job pod.

string
null

secretJob.resources
Resources for the secret job container.

object
limits:
  cpu: 500m
  memory: 500Mi
requests:
  cpu: 200m
  memory: 100Mi

secretJob.tolerations
The tolerations for the secret job.

list
null

slurm-login
Configure individual login nodes via the slurm-login subchart. Below is an example showing some of the key parameters of the subchart; see the subchart docs for all parameters.

Example:
slurm-login:
  enable: true
  directoryCache:
    # select users from two groups
    selectGroups: ["slurm-researchers", "slurm-ops"]
    # poll every minute (default 90s)
    interval: 1m
  # Google Secure LDAP
  directoryService:
    directories:
      - name: google-example.com
        enabled: true
        ldapUri: ldaps://ldap.google.com:636
        user:
        defaultShell: "/bin/bash"
        fallbackHomeDir: "/home/%u"
        overrideHomeDir: /mnt/nvme/home/%u
        ldapsCert: google-ldaps-cert
        schema: rfc2307bis
object
See default values in the slurm-login subchart.

slurmConfig.cgroupConfig
The cgroup.conf value.
This is only used when procTrackType is set to proctrack/cgroup.

string
slurmConfig.cgroupConfig: |
  CgroupPlugin=autodetect
  IgnoreSystemd=yes
  ConstrainCores=yes
  ConstrainDevices=yes
  ConstrainRAMSpace=yes

slurmConfig.defMemPerCPU
The default memory per CPU in megabytes.
Sets the slurm.conf parameter of the default real memory size available per usable allocated CPU in megabytes. This value is used when the --mem-per-cpu option is not specified on the srun command line.

int
4096

slurmConfig.extraConfig
Multi-line string of free text configuration to be appended to slurm.conf.

Example:
extraConfig: |
  # Config to be appended to slurm.conf
  # Can be multiple lines
string
null

slurmConfig.inactiveLimit
Terminate job allocation commands, such as srun or salloc, that are unresponsive longer than this interval in seconds. See the slurm.conf reference for more details.

int
0

slurmConfig.killWait
The interval in seconds between the SIGTERM and SIGKILL signals given to a job's processes upon reaching its time limit. See the slurm.conf reference for more details.

int
30

slurmConfig.poolSize
The number of connections to be maintained in the connection pool.

int
10

slurmConfig.protocolVersion
The protocol version to use for communication with the Slurm controller.

string
"24_11"

slurmConfig.selectTypeParameters
The values to use for the parameters of the select/cons_tres plugin.
Allowed values depend on the configured value of SelectType. See the slurm.conf reference for more details.

string
"CR_Core"

slurmConfig.slurmCtld.accountingStorageEnforce
Controls what level of association-based enforcement to impose on job submissions.
Multiple comma-separated values allowed. Valid options are any combination of:

  • associations
  • limits
  • nojobs
  • nosteps
  • qos
  • safe
  • wckeys
Use all to impose everything except nojobs and nosteps, which must be requested separately. See the Slurm documentation for more details.

string
"qos,limits"

slurmConfig.slurmCtld.additionalParameters
The list of additional parameters to pass to slurmCtld. See the Slurm documentation for possible values.

list
- idle_on_node_suspend
- node_reg_mem_percent=95

slurmConfig.slurmCtld.etcConfigMap
A ConfigMap with keys mapping to files in /etc/slurm on the controller only.
This ConfigMap must not contain:

  • slurm.conf
  • plugstack.conf
  • gres.conf
  • cgroup.conf
  • topology.conf

string
null

slurmConfig.slurmCtld.jobSubmitPlugins
The job submit plugins to use.
Multiple comma-separated values allowed.

string
null

slurmConfig.slurmCtld.procTrackType
The plugin to be used for process tracking on a job step basis. See the Slurm documentation for more details.
Valid values:

  • proctrack/linuxproc
  • proctrack/cgroup

string
"proctrack/cgroup"

slurmConfig.slurmCtld.taskPlugin
The task plugin to use. See the Slurm documentation for more details.
Multiple comma-separated values allowed. Valid values:

  • task/affinity
  • task/cgroup
  • task/none

string
"task/none"

slurmConfig.slurmCtld.timeout
The interval, in seconds, that the backup controller waits for the primary controller to respond before assuming control. The default value is 120 seconds. May not exceed 65533.

int
60

slurmConfig.slurmd.epilogConfigMap
The name, or a list of names, of configmaps containing epilog scripts.

string | list
[]

slurmConfig.slurmd.prologConfigMap
The name, or a list of names, of configmaps containing prolog scripts.

string | list
[]
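
A sketch of both accepted forms, using a single name for the prolog and a list for the epilog; the configmap names are illustrative:

Example:
slurmConfig:
  slurmd:
    prologConfigMap: my-prolog-scripts
    epilogConfigMap:
      - epilog-scripts-a
      - epilog-scripts-b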

slurmConfig.slurmd.suspendTime
Nodes which remain idle or down for this number of seconds will be placed into power save mode by SuspendProgram.

string
"INFINITE"

slurmConfig.slurmd.timeout
The interval, in seconds, that the Slurm controller waits for slurmd to respond before configuring that node's state to DOWN.

int
60

slurmConfig.usePersistentConnection
Use Slurm's persistent connections for connection reuse.

bool
true

slurmConfig.waitTime
Specifies how many seconds the srun command should wait after the first task terminates before terminating all remaining tasks.
Using the --wait option on the srun command line overrides this value. The default value is 0, which disables this feature. See the slurm.conf reference for more details.

int
0

sssdContainer.livenessProbe
The liveness probe for the sssd container.

object
{}

sssdContainer.readinessProbe
The readiness probe for the sssd container.

object
{}

sssdContainer.startupProbe
The startup probe for the sssd container.

object
{}

syncer.annotations
Additional annotations for syncer resources.

object
{}

syncer.config.syncer.orphanedPodDelay
The delay to wait before deleting a pod that is no longer associated with a Slurm node.

string
"120s"

syncer.config.syncer.pollInterval
The polling interval for the Slurm API.

string
"10s"

syncer.config.syncer.qosInterruptable
The externally defined label indicating whether a pod is interruptible.

string
"qos.coreweave.cloud/interruptable"

syncer.config.syncer.slurmNodeCleanUp
Enable cleanup of lingering Slurm nodes.

bool
false

syncer.controllerAddress
The address of the Slurm controller to connect to.
This should be the service address of the controller in host:port format.

string
""

syncer.enabled
Enable the syncer.
This is required for most functionality and should only be disabled for troubleshooting.

bool
true

syncer.hooksAPI
Configuration for the webhooks.

object
{
  "nodeRebootCondition": "PhaseState",
  "nodeRebootReason": "production-powerreset",
  "safeNodeRebootCondition": "PendingPhaseState",
  "safeNodeRebootReason": "production-powerreset",
  "waitForNodeLockedInterval": "1s",
  "waitForNodeLockedTimeout": "120s"
}

syncer.hooksAPI.nodeRebootCondition
Condition to indicate node should be rebooted.

string
"PhaseState"

syncer.hooksAPI.nodeRebootReason
The target NLCC lifecycle state associated with the nodeRebootCondition.

string
"production-powerreset"

syncer.hooksAPI.safeNodeRebootCondition
Condition to indicate node should be rebooted safely.

string
"PendingPhaseState"

syncer.hooksAPI.safeNodeRebootReason
The target NLCC lifecycle state associated with the safeNodeRebootCondition.

string
"production-powerreset"

syncer.hooksAPI.waitForNodeLockedInterval
The polling interval when checking for node locked state.

string
"1s"

syncer.hooksAPI.waitForNodeLockedTimeout
The timeout for checking node locked state.

string
"120s"

syncer.image
The image to use for the syncer.

object
repository: registry.gitlab.com/coreweave/sunk/operator
tag:

syncer.labels
Additional labels for syncer resources.

object
{}

syncer.livenessProbe
The liveness probe for the syncer container.

object
httpGet:
  path: /healthz
  port: 8081
initialDelaySeconds: 15
periodSeconds: 20

syncer.logLevel
The log level.
Uses integers or zap log level strings:

  • debug
  • info
  • warn
  • error
  • dpanic
  • panic
  • fatal

string
"info"

syncer.maxConcurrentReconciles
The maximum concurrent reconciles.
This should be adjusted based on the number of nodes and the size of jobs launched in the Slurm cluster, to handle bursts of operations quickly. A value 1/10th the number of nodes in the cluster is a good starting point for small clusters. As cluster size increases, this value can be a smaller fraction of the total number of nodes in most cases; for instance, a value of 50 handles a 2000-node cluster well. Being too aggressive here will bottleneck on other components, such as the Kubernetes API server and the Slurm controller, which in some cases may cause errors.

int
10

syncer.nodePermissions.enabled
Enable node operations on the syncer; currently this allows restarting nodes.

bool
true

syncer.priorityClassName
The priority class name for the syncer pod.

string
null

syncer.readinessProbe
The readiness probe for the syncer container.

object
httpGet:
  path: /readyz
  port: 8081
initialDelaySeconds: 5
periodSeconds: 10

syncer.resources
Resources for the syncer container.

object
limits:
  cpu: "2"
  memory: 1Gi
requests:
  cpu: 200m
  memory: 1Gi

syncer.startupProbe
The startup probe for the syncer container.

object
{}

syncer.watchAllNodeSets
Watch all NodeSets in the namespace.
This overrides the default behavior of only watching the NodeSets deployed with this chart release.

bool
false

syncer.watchNodeSets
The list of NodeSets to watch.
This overrides the default behavior of watching the NodeSets deployed with this chart release to instead watch this specific list. This is not used if watchAllNodeSets is set to true.

list
[]
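
A sketch watching two specific NodeSets; simple-cpu matches the node definition example earlier, and gpu-workers is illustrative:

Example:
syncer:
  watchAllNodeSets: false
  watchNodeSets:
    - simple-cpu
    - gpu-workers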

userLookupContainer.livenessProbe
The liveness probe for the user-lookup container.

object
{}

userLookupContainer.readinessProbe
The readiness probe for the user-lookup container.

object
{}

userLookupContainer.startupProbe
The startup probe for the user-lookup container.

object
{}