Share storage across Slurm nodes
In Kubernetes, to ensure that data persists beyond the lifecycle of a given Pod, you can use Persistent Volume Claims (PVCs). A PVC is a request for storage by a user, which is fulfilled by binding it to a Persistent Volume (PV). This mechanism abstracts the details of how storage is provided from how it's consumed.
In SUNK, each Slurm node is deployed in a Kubernetes Pod, which can mount shared PVCs in the normal manner. SUNK provides a mechanism to map the Kubernetes Pod's PVC to a specified mount location within the Slurm node, and multiple Slurm nodes can mount the same PVC. This is particularly useful for sharing data between developers or researchers, storing user home directories, and saving job output for further processing.
Create shared storage
In this example, three PVCs are mounted to the Slurm compute nodes. To get started, create three PVCs in the cluster using a Container Storage Interface (CSI) driver that supports the `ReadWriteMany` access mode; a sample manifest follows the list of names below. For more information on creating PVCs, see the Kubernetes documentation on Persistent Volumes.
Use the following names:

- `data-root`
- `data-nvme`
- `data-hdd`
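As a reference, a manifest along these lines creates one of the claims. This is a minimal sketch: the `storageClassName` (`shared-nvme`) and the requested size are placeholders, so substitute a storage class from an RWX-capable CSI driver in your cluster.

```yaml
# Hypothetical PVC manifest; adjust storageClassName and size for your cluster
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-nvme
spec:
  accessModes:
    - ReadWriteMany # Required so multiple Slurm node Pods can mount the claim simultaneously
  storageClassName: shared-nvme # Placeholder; must be backed by a CSI driver supporting RWX
  resources:
    requests:
      storage: 100Gi # Placeholder size
```

Apply it with `kubectl apply -f pvc.yaml`, then repeat for `data-root` and `data-hdd`.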
Mount PVCs
Mount the PVCs to the Slurm nodes by adding the `volumeMounts` and `volumes` keys in the `compute` section of `values.yaml`:
```yaml
compute:
  volumeMounts:
    - name: root-nvme # Root home dir; useful if not using LDAP and connecting with `kubectl exec`
      path: /root
    - name: data-nvme # Mount for high-speed storage
      path: /mnt/nvme
    - name: data-hdd # Mount for high-capacity bulk storage
      path: /mnt/hdd
  volumes:
    - name: root-nvme # This is useful if not using LDAP
      persistentVolumeClaim:
        claimName: data-root
    - name: data-nvme # The high-speed storage PVC
      persistentVolumeClaim:
        claimName: data-nvme
    - name: data-hdd # The high-capacity bulk storage PVC
      persistentVolumeClaim:
        claimName: data-hdd
```
See `compute.volumeMounts` and `compute.volumes` in the Slurm Parameter Reference for a link to a full `values.yaml` example.
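After editing `values.yaml`, roll the change out by upgrading the Helm release. The release and chart names below (`slurm` and `sunk/slurm`) are assumptions for illustration; use the names from your own deployment.

```bash
# Apply the updated values to the existing release (names are placeholders)
$ helm upgrade slurm sunk/slurm -f values.yaml
```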
Login node
For convenience, any `volumeMounts` and `volumes` specified for the compute nodes in `values.yaml` are automatically added to the login node as well, so there is no need to specify them again.
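To confirm that the mounts propagated, you can exec into the login node's Pod and check the mount points. The Pod name below is hypothetical; look up the actual name with `kubectl get pods`.

```bash
# Verify the shared mounts are present on the login node (Pod name is a placeholder)
$ kubectl exec -it slurm-login-0 -- df -h /root /mnt/nvme /mnt/hdd
```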
Using shared storage
Once the PVCs are mounted, you can use them as you would any other storage. For example, you can create a directory in the PVC and use it to store job output, as shown below:
```bash
# On the login node
$ mkdir /mnt/nvme/job-output

# Mount is accessible on all compute nodes
$ srun -N 1 -n 1 hostname > /mnt/nvme/job-output/$(hostname).txt
```
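The same pattern works for batch jobs. Below is a minimal sketch of a job script that writes its output directly to the shared mount; submit it with `sbatch job.sh`.

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --output=/mnt/nvme/job-output/%x-%j.out  # %x = job name, %j = job ID

# Runs on a compute node; output lands on the shared PVC,
# readable from the login node and every other compute node
hostname
```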
You can also use the PVCs to share data between users. For example, you can create a directory in the PVC and use it to store data that multiple users can access:
```bash
# On the login node
$ mkdir /mnt/nvme/shared-data

# Mount is accessible on all compute nodes
$ srun -N 1 -n 1 hostname > /mnt/nvme/shared-data/$(hostname).txt
```
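Because the mount is ordinary POSIX storage, standard Unix permissions control who can write to a shared directory. For example, assuming a group named `researchers` exists on the nodes:

```bash
# On the login node: make the directory group-writable for `researchers`
$ chgrp researchers /mnt/nvme/shared-data
$ chmod 2775 /mnt/nvme/shared-data  # setgid bit so new files inherit the group
```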
It's also useful to store user home directories in a PVC, which lets users access their home directories from any compute node. In this example, the root home directory is mounted from the PVC named `data-root`. This is useful if you're not using LDAP and are connecting with `kubectl exec`, and you have helper scripts or other files in the root home directory.
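A quick way to verify the shared home directory is to write a file to it from the login node and read it back from a compute node:

```bash
# On the login node, as root
$ echo 'hello from a shared home' > /root/hello.txt

# The same file is visible from any compute node
$ srun -N 1 cat /root/hello.txt
hello from a shared home
```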