Share storage across Slurm nodes

In Kubernetes, to ensure that data persists beyond the lifecycle of a given Pod, you can use Persistent Volume Claims (PVCs). A PVC is a user's request for storage that is satisfied by binding to a Persistent Volume (PV). This mechanism separates how storage is provisioned from how it is consumed.

In SUNK, each Slurm node is deployed in a Kubernetes Pod, which can mount shared PVCs in the usual way. SUNK provides a mechanism to map the Kubernetes Pod's PVC to a specified mount location within the Slurm node, and multiple Slurm nodes can mount the same PVC. This is particularly useful for sharing data between developers or researchers, storing user home directories, and saving job output for further processing.

Create shared storage

In this example, three PVCs are mounted to the Slurm compute nodes. To get started, create three PVCs in the cluster with a Container Storage Interface (CSI) driver that supports the ReadWriteMany access mode. For more information on creating PVCs, see the Kubernetes documentation on Persistent Volumes.

Use the following names (a sample manifest follows the list):

  • data-root
  • data-nvme
  • data-hdd
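
The exact manifest depends on the storage classes available in your cluster. The following is a minimal sketch for one of the PVCs; the storageClassName and requested size are placeholders, so substitute values appropriate to an RWX-capable CSI driver in your cluster, and repeat for data-root and data-hdd.

Example
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-nvme
spec:
  accessModes:
    - ReadWriteMany # Required so multiple Slurm node Pods can mount the same claim
  storageClassName: shared-nvme # Placeholder: use a StorageClass that supports ReadWriteMany
  resources:
    requests:
      storage: 1Ti # Placeholder size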

Mount PVCs

Mount the PVCs to the Slurm nodes by adding the volumeMounts and volumes keys in the compute section of values.yaml:

Example
compute:
  volumeMounts:
    - name: root-nvme # Root home dir, useful if not using LDAP and connecting with `kubectl exec`
      path: /root
    - name: data-nvme # Mount for high-speed storage
      path: /mnt/nvme
    - name: data-hdd # Mount for high-capacity bulk storage
      path: /mnt/hdd
  volumes:
    - name: root-nvme # This is useful if not using LDAP
      persistentVolumeClaim:
        claimName: data-root
    - name: data-nvme # The high-speed storage PVC
      persistentVolumeClaim:
        claimName: data-nvme
    - name: data-hdd # The high-capacity bulk storage PVC
      persistentVolumeClaim:
        claimName: data-hdd

See compute.volumeMounts and compute.volumes in the Slurm Parameter Reference for a link to a full values.yaml example.

Login node

For convenience, any volumeMounts and volumes specified for the compute nodes in values.yaml are automatically added to the login node as well, so there is no need to specify them again.
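
After deploying the chart with the updated values.yaml, you can confirm that the shared mounts are visible both on the login node and on the compute nodes. This is a quick sketch using the mount paths from the example above:

Example
# On the login node, the mounts are available locally
$ df -h /mnt/nvme /mnt/hdd
# Run the same check on a compute node via a one-task job
$ srun -N 1 -n 1 df -h /mnt/nvme /mnt/hdd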

Using shared storage

Once the PVCs are mounted, you can use them as you would any other storage. For example, you can create a directory in the PVC and use it to store job output, as shown below:

Example
# On the login node
$ mkdir /mnt/nvme/job-output
# The mount is accessible from all compute nodes, so the job can write directly to it
$ srun -N 1 -n 1 sh -c 'hostname > /mnt/nvme/job-output/$(hostname).txt'

You can also use the PVCs to share data between users. For example, you can create a directory in the PVC and use it to store data that multiple users can access:

Example
# On the login node
$ mkdir /mnt/nvme/shared-data
# The mount is accessible from all compute nodes
$ srun -N 1 -n 1 sh -c 'hostname > /mnt/nvme/shared-data/$(hostname).txt'

It's also useful to store user home directories in a PVC, which allows users to access their home directories from any compute node. In this example, the root user's home directory (/root) is backed by the PVC named data-root. This is useful if you're not using LDAP and instead connect with kubectl exec, and you keep helper scripts or other files in the root home directory.
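
As a hedged illustration, the Pod name below is hypothetical; list the Pods in your SUNK namespace to find one of your Slurm compute node Pods, then exec into it. Anything written under /root lands on the data-root PVC and is therefore visible from every other node that mounts it.

Example
# Hypothetical Pod name; substitute one of your Slurm compute node Pods
$ kubectl exec -it slurm-compute-0 -- bash
# Inside the Pod, files under /root are backed by the data-root PVC and persist across restarts
$ echo 'export SHARED_DATA=/mnt/nvme/shared-data' >> /root/.bashrc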