Define Compute Nodes
Define the resources used to run Slurm jobs
Slurm Compute nodes are the workhorses of the cluster: they define and manage the hardware resources used to run jobs submitted to Slurm.
Slurm Login nodes allow you to access your Slurm cluster, while Slurm Compute nodes perform the actual work of running a Slurm job.
With SUNK, you can create flexible Compute node definitions to meet the specific resource requirements of your workloads. This guide walks you through the various methods of defining Compute nodes that are optimized for the performance and efficiency of the desired jobs.
In SUNK, Slurm nodes run in Kubernetes Pods. These are not the same as Kubernetes Nodes, which are the worker machines that run the Pods. To distinguish between the two, Kubernetes Nodes are capitalized in this documentation, while Slurm nodes are not.
Access Slurm Compute nodes
After accessing your Slurm cluster through the Slurm Login node, you can interact with the Slurm Compute nodes using standard Slurm commands. It is not necessary to directly access a Slurm Compute node.
Use Slurm commands, such as `srun`, `sbatch`, or `salloc`, to run and manage jobs on Slurm Compute nodes.
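For example, from the Login node you might run a quick command across Compute nodes or queue a batch script; the script name below is a placeholder:

```bash
# Run a one-off command on two Compute nodes through Slurm.
$ srun -N 2 hostname

# Queue a batch script for Slurm to schedule on Compute nodes.
$ sbatch my-job.sh
```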
We do not recommend directly accessing Slurm Compute nodes through SSH to run tasks. Bypassing Slurm can interfere with currently running jobs and may cause nodes to drain unintentionally, leading to temporary loss of resources. SSH to Slurm Compute nodes should only be used for debugging existing jobs on the nodes.
The manifest
The foundation for defining Compute nodes is a YAML manifest, which outlines the resources and configurations for each node type. An example of the `compute:` section might look like this:
```yaml
compute:
  # See "Global Compute node options" below to learn more.
  volumeMounts: []
  volumes: []
  s6: {}
  pyxis:
  partitions:
  # Node definitions. Multiple node definitions are allowed, but
  # only those with `enabled: true` will be deployed.
  nodes:
    # The node definition name.
    reservation-id:
      # Defines a node with affinity for a specific reservation ID.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node.coreweave.cloud/reserved
                    operator: In
                    values:
                      - <my-reservation-id>
    # Another node definition.
    my-node-def:
      enabled: true
      replicas: 1
      definitions:
        # Use the `reservation-id` defined above.
        - reservation-id
      staticFeatures:
        - foo
        - bar
      dynamicFeatures:
        node.coreweave.cloud/class: {}
        gpu.nvidia.com/class: {}
      image:
        repository: registry.gitlab.com/example
      env:
        - name: example
          value: "1"
      gresGpu: h100:8
      config:
        weight: 1
      resources:
        limits:
          memory: 960Gi
          sunk.coreweave.com/accelerator: "8"
          rdma/ib: "1"
        requests:
          cpu: "110"
          memory: 960Gi
          sunk.coreweave.com/accelerator: "8"
```
Global options
These are the global options shown in the YAML example above.
compute.volumeMounts
Declares a list of additional volumes to mount within the primary container of the node, in addition to the chart's `global.volumeMounts`.
For example:
```yaml
compute:
  volumeMounts:
    - name: my-pvc
      path: /mnt/my-pvc
```
- Entries which share the same `path` as a globally defined mount will override that mount.
- These volumeMounts are also added to the Login node primary container.
compute.volumes
Declares a list of additional volumes to attach to the Pod for the Compute node. If using PersistentVolumeClaims, the ReadWriteMany access mode should be used in most cases.
For example:
```yaml
compute:
  volumes:
    - name: my-pvc
      persistentVolumeClaim:
        claimName: my-pvc
```
- Entries which share the same `name` as a globally defined volume will override that volume.
- These volumes are also added to the Login node Pods.
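For instance, a PersistentVolumeClaim can be declared under `compute.volumes` and mounted into the primary container with a matching `compute.volumeMounts` entry. A minimal sketch; the claim name and mount path are placeholders:

```yaml
compute:
  volumes:
    # The PVC should normally use the ReadWriteMany access mode.
    - name: shared-data
      persistentVolumeClaim:
        claimName: shared-data
  volumeMounts:
    # The name matches the volume above; the path is a placeholder.
    - name: shared-data
      path: /mnt/shared-data
```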
compute.s6
SUNK can run custom s6 scripts on Compute nodes, either as `oneshot` or `longrun` jobs.
For example:
```yaml
compute:
  s6:
    packages:
      type: oneshot
      timeoutUp: 0
      timeoutDown: 0
      script: |
        #!/usr/bin/env bash
        apt -y install nginx
    nginx:
      type: longrun
      timeoutUp: 0
      timeoutDown: 0
      script: |
        #!/usr/bin/env bash
        nginx -g "daemon off;"
```
compute.pyxis
This section has multiple options:
Parameter | Purpose |
---|---|
compute.pyxis.enabled | Enables the pyxis container. |
compute.pyxis.mountHome | Enables ENROOT_MOUNT_HOME for the pyxis container to mount the home directory. |
compute.pyxis.remapRoot | Enables ENROOT_REMAP_ROOT for the pyxis container to remap the root user. |
compute.pyxis.securityContext.capabilities.add | Adds capabilities to the pyxis container. "SYS_ADMIN" is required if using Pyxis. |
For example:
```yaml
compute:
  pyxis:
    enabled: true
    mountHome: true
    remapRoot: true
    securityContext:
      capabilities:
        add: ["SYS_ADMIN"]
```
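With Pyxis enabled, jobs can run inside container images through the standard Pyxis flags to `srun`; the image in this sketch is chosen arbitrarily:

```bash
# Run a command inside a container image via Pyxis/enroot.
$ srun --container-image=ubuntu:22.04 cat /etc/os-release
```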
compute.partitions
A Slurm partition is a logical grouping of Compute nodes (servers) within the Slurm cluster. It's a way to organize nodes based on their characteristics, such as memory size, CPU type, or GPU availability.
When a user submits a job to a Slurm-managed HPC cluster, they specify the partition where the job should run. The Slurm scheduler then assigns the job to an available node within that partition. Partitions can have different configurations and policies, such as time limits for jobs, user access restrictions, or priority levels.
A related option is `compute.autoPartition.enabled`, which, if `true` (the default), creates a partition within Slurm for each NodeSet defined in `compute.nodes`. The partition name matches the name of the corresponding `nodes` entry.
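Because each auto-created partition is named after its NodeSet, jobs can target the `my-node-def` nodes from the earlier example with the standard partition flag; the batch script name below is a placeholder:

```bash
# Submit a batch job to the partition created for the my-node-def NodeSet.
$ sbatch --partition=my-node-def my-job.sh

# Or start an interactive shell on a node in that partition.
$ srun --partition=my-node-def --pty bash
```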
Other global options
In addition to the options shown in the `compute` example above, several others apply globally to all Compute nodes.
Parameter | Purpose |
---|---|
compute.generateTopology | If true, generate the network topology. |
compute.initialState and compute.initialStateReason | The initial state for the nodes when they join the Slurm cluster, generally drain or idle, and the reason for setting that state. These can also be applied as node-specific options. |
compute.maxUnavailable | Sets the maximum number of Compute nodes that can be unavailable during a rolling update. Can be a percentage or a number. |
compute.ssh.enabled | When enabled, the Slurm Compute nodes have SSH available. |
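These options sit directly under `compute:` in the values file. A minimal sketch, with illustrative values only:

```yaml
compute:
  generateTopology: true
  # Join new nodes drained until they are verified (illustrative choice).
  initialState: drain
  initialStateReason: "pending verification"
  # Allow up to a quarter of the Compute nodes to be unavailable during a rolling update.
  maxUnavailable: "25%"
  ssh:
    enabled: true
```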
Node-specific options
Many options are available for each named node definition. For reference, see `my-node-def` in the YAML above, which shows many of the available options.
- `node.enabled`: If `true`, Compute nodes are deployed with this definition. Multiple definitions can be declared, but only those with `enabled: true` will be deployed.
- `node.replicas`: Specifies the desired number of Slurm nodes (Kubernetes Pods) of this type that the NodeSet will attempt to create. This is a maximum value, because the number of desired Pods can be greater than the number of available Pods. To change the number of replicas for a running Slurm cluster, use:

  ```bash
  $ kubectl scale nodeset <nodeset-name> --replicas=N
  ```

- `node.definitions`: A list of other node definitions to include in this definition. See Custom node definitions below to learn how to create custom definitions.
- `node.staticFeatures`: Static Slurm node feature flags. Feature flags are strings that Slurm adds to the Slurm nodes, where they are available for use when scheduling Slurm jobs. For example, to schedule a job only on nodes with the feature `really-fast`:

  ```bash
  $ srun -C really-fast hostname
  ```

  Here's an example of how the features look within Slurm:

  ```
  NodeName=h100-092-02 Arch=x86_64 CoresPerSocket=32
  CPUAlloc=110 CPUEfctv=128 CPUTot=128 CPULoad=0.56
  AvailableFeatures=h100-pci4,pci-4,cu120,gpu,infiniband,sharp
  ActiveFeatures=h100-pci4,pci-4,cu120,gpu,infiniband,sharp
  ```

- `node.dynamicFeatures`: Dynamic Slurm node features derived from Kubernetes Node labels. This specifies a map of labels that should be used as additional feature flags within Slurm. Note: the value for each map key is `{}`, as there is no further configuration at this time.
- `node.image`: Specifies the Docker image repository used to pull this node's image.
- `node.env`: Sets extra environment variables to be exposed in the Compute nodes.
- `node.gresGpu`: Sets the Slurm Generic Resource Scheduling (GRES) value for the `gpu` GresType. This describes the type and number of GPU Generic Resources for this Slurm node type.
- `node.config`: Adds additional configuration options to the slurmd startup used during dynamic node registration. The features and gres options are already set. See Node Parameters and Node Configuration for more details on the options and values.
- `node.resources`: Sets the Kubernetes compute resource limits and requests.
- `node.affinity`: Sets the Kubernetes Node affinities, for example to ensure that the node is scheduled on a machine with a specific GPU model.
- `node.initialState` and `node.initialStateReason`: The initial state for the nodes when they join the Slurm cluster, generally `drain` or `idle`, and the reason for setting that state. These can also be applied as global options for the cluster.
- `node.volumeMounts`: Additional per-node-definition volumeMounts to add to the primary container, in the same format as `compute.volumeMounts`. Mounts that match on `path` override those set at the higher level.
- `node.volumes`: Additional per-node-definition volumes to add to the Pod, in the same format as `compute.volumes`. Volumes that match on `name` override those set at the higher level.
- `node.containers`: Additional per-node containers (such as sidecars) to add to the Pod. Additional configuration for these containers (such as Secrets and ConfigMaps) must be in the Slurm namespace. See the sketch after this list for an example.
- `node.dnsPolicy`: Adjusts the dnsPolicy for each node.
- `node.dnsConfig`: Adjusts the dnsConfig for each node.
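As a rough illustration of how several of these per-node options fit together, the sketch below adds a sidecar container and an initial drain state to a node definition. The sidecar name, its image, and the state reason are hypothetical placeholders, not defaults of the chart:

```yaml
compute:
  nodes:
    my-node-def:
      enabled: true
      replicas: 2
      # Join the cluster drained until the nodes are verified (illustrative).
      initialState: drain
      initialStateReason: "pending verification"
      dnsPolicy: ClusterFirst
      containers:
        # Hypothetical sidecar; the name and image are placeholders.
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:latest
          ports:
            - containerPort: 9100
```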
Custom node definitions
Node definitions can reference other node definitions to include or overlay values. These "layers" can be defined in the same `values.yaml` file, or in separate files.
As shown in the prior example, there is a node definition named `reservation-id`:
```yaml
compute:
  nodes:
    reservation-id:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node.coreweave.cloud/reserved
                    operator: In
                    values:
                      - <my-reservation-id>
```
That layer is included in the `my-node-def` definition:
```yaml
compute:
  nodes:
    my-node-def:
      definitions:
        - reservation-id
```
You can store custom layers as separate values files. Any key defined under `compute.nodes` can be used, even if that key is defined in another file, by specifying multiple values files in a defined order on the command line.
For example, consider a `custom-compute-defs-values.yaml` file that only has a `compute.nodes` section with custom layers defined. The `values.yaml` file can use those definitions as long as both values files are used when deploying, like so:
```bash
$ helm install slurm coreweave/slurm -f custom-compute-defs-values.yaml -f values.yaml
```
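As an illustration of this layering, the custom file might hold only reusable layers while the main `values.yaml` references them. Both sketches below reuse the `reservation-id` layer from the earlier example:

```yaml
# custom-compute-defs-values.yaml (layer definitions only)
compute:
  nodes:
    reservation-id:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node.coreweave.cloud/reserved
                    operator: In
                    values:
                      - <my-reservation-id>
```

```yaml
# values.yaml (references the layer defined above)
compute:
  nodes:
    my-node-def:
      enabled: true
      replicas: 1
      definitions:
        - reservation-id
```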
Mixing CPU and GPU node types
You can mix multiple Slurm node types by defining multiple NodeSets in different blocks under `compute.nodes`. Each NodeSet can have its own resources and affinities that specify a single type of node.
For example, it's possible to create a NodeSet that selects a particular type of GPU, while another selects CPU-only nodes, and then deploy any desired number of each node type.
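As a rough sketch of such a layout (the NodeSet names, replica counts, and CPU resource sizes below are illustrative, and the GPU values mirror the earlier `my-node-def` example):

```yaml
compute:
  nodes:
    # GPU NodeSet, based on the h100 example above.
    gpu-h100:
      enabled: true
      replicas: 4
      gresGpu: h100:8
      resources:
        limits:
          memory: 960Gi
          sunk.coreweave.com/accelerator: "8"
        requests:
          cpu: "110"
          memory: 960Gi
          sunk.coreweave.com/accelerator: "8"
    # CPU-only NodeSet: no gresGpu or accelerator resources are set.
    cpu-general:
      enabled: true
      replicas: 8
      resources:
        limits:
          memory: 128Gi
        requests:
          cpu: "30"
          memory: 128Gi
```

With `compute.autoPartition.enabled` left at its default, this produces separate partitions named after each NodeSet, which jobs can target with the standard `--partition` flag.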