Run Ray on SUNK
Learn how to use Ray on SUNK
Using Ray on Slurm-on-Kubernetes (SUNK) combines Ray's Python-native parallelism with Slurm's enterprise-grade scheduler. With Ray on SUNK, Slurm manages when and where jobs run, while Ray manages how distributed tasks are executed.
This document explains how to integrate Ray's distributed computing framework with SUNK's Slurm-based job scheduler, covering container preparation, cluster management, and interactive debugging.
Consider running Ray on SUNK if either of the following applies:

- Most of your compute usage is through Slurm, and you need Ray to orchestrate workloads.
- You want to support Ray and Slurm users from a single environment, using the Slurm scheduler for efficient resource allocation.
Prerequisites
Before completing the steps in this guide, make sure you have the following:
- Access to a SUNK cluster with permissions to submit jobs.
- Familiarity with Slurm commands, such as srun, sbatch, and squeue.
Create a Ray cluster on Slurm
When working with containers on Slurm, you typically pull them as squash files into a shared directory accessible from the Slurm login nodes. From the shared directory, you can then run the containers on multiple nodes without each node pulling from the repository at the same time.
There are two methods for pulling containers:
- Pulling a Ray container and saving it as a squash file using enroot. This is useful if you need to debug container errors.
- Pulling a container as a sqsh file using srun. This is useful when your containers are well-tested and you only want to run jobs.
Pull a Ray container
To pull a Ray container, complete the following steps.
- From the login node, create an interactive session on a GPU node by running the following command. (Note that you can also create a VS Code tunnel to the GPU node.)

  Example:

  ```
  $ srun -p NODE_TYPE --exclusive --pty bash -i
  ```

  - NODE_TYPE: Change this option based on your available resources.

  Here we are requesting an h100 node:

  Example with an h100 node:

  ```
  $ srun -p h100 --exclusive --pty bash -i
  ```

  You should now be in a session on a GPU node. Verify you are in a shared directory, such as your home directory or /mnt/data.

- Import and start the container. Note that this can be any container, but here we'll use a rayproject nightly container:

  Example:

  ```
  $ enroot import docker://rayproject/ray:nightly-py39-gpu
  $ enroot create rayproject+ray+nightly-py39-gpu.sqsh
  $ enroot start --rw rayproject+ray+nightly-py39-gpu
  ```

  You should see output similar to the following:

  ```
  ============ CUDA ============

  CUDA Version 12.1.1

  Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  ```
You now have an interactive session running on a single node within the container. Because you used the --rw flag to start the container, changes you make will be saved when you exit.
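Ray requires the Ray and Python versions on the head node, the worker nodes, and in your driver script to match (see the conda notes below), so it is worth recording the versions baked into the container while you are in this interactive session. A minimal check, assuming the ray CLI is on the container's PATH as it is in the rayproject images:

```bash
# Inside the running container: note the Ray and Python versions that the rest of
# the cluster (and your driver script) will need to match.
ray --version
python --version
```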
If you are using a container from ghcr.io, prefix it with docker://, for example, docker://ghcr.io/. If your container requires permissions, you will need to configure your enroot credentials. For more information about enroot, see the SUNK Training guide.
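As a sketch of what configuring those credentials can look like: enroot reads a netrc-style credentials file from its configuration directory (commonly ~/.config/enroot/.credentials). The exact path, registry hostname, and token type depend on your enroot configuration, so treat the following as an illustration rather than the definitive setup:

```bash
# Illustrative only: add a netrc-style entry for ghcr.io where enroot looks for
# credentials by default. Replace USERNAME and TOKEN with your own values.
mkdir -p ~/.config/enroot
cat >> ~/.config/enroot/.credentials <<'EOF'
machine ghcr.io login USERNAME password TOKEN
EOF
```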
Conda environment
Instead of using containers, you can create a conda environment that includes your dependencies. However, this is not recommended: Ray expects the Ray and Python versions on the head and worker nodes to be identical to those used by the Ray script being run, and conda environments shared by many users can be difficult to manage. That said, conda may be easier for testing and debugging when running Ray on Slurm.
If you use conda, you need to initialize it; you only need to do this once.

- To initialize conda, determine the path to the conda executable:

  Example:

  ```
  $ which conda
  ```

  Use that path in the next command. For example, if the path is /home/ray/anaconda3/bin/conda, run:

  Example:

  ```
  $ /home/ray/anaconda3/bin/conda init bash
  $ source ~/.bashrc
  ```

- Create an environment with the version of Ray that you want to use and any additional packages. Here the environment is called rayenv:

  Example:

  ```
  $ conda create --name rayenv python=3.12 pip
  $ conda activate rayenv
  $ pip install -U "ray[data,train,tune,serve]"
  ```
You can now activate the same environment before you launch a Ray cluster or job.
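Because the head node, the worker nodes, and your driver script must all agree on the Ray and Python versions, a quick sanity check after activating the environment can save debugging time later. A minimal sketch, assuming the rayenv environment created above lives on a shared filesystem visible to all nodes:

```bash
# Activate the environment and record the versions the whole cluster should use.
conda activate rayenv
python --version
ray --version

# Because the environment is on a shared filesystem, the same check works from
# other nodes in an allocation, for example:
#   srun --jobid=JOB_ID --overlap bash -ic 'conda activate rayenv && ray --version'
```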
Scripts for generating sbatch files
You can write scripts that generate sbatch files for creating Ray clusters with the size and specifications you want. This lets you generate the right sbatch commands programmatically. For examples of these kinds of scripts, refer to the NERSC GitHub repository.
The following example script starts up a Ray cluster and also runs a job. If you want to start a Ray cluster and use it interactively, make sure the sbatch script does not exit, by putting a sleep inf command in the script. This keeps the allocation running, up to the job's time limit, as an interactive development cluster; you can cancel it earlier if you no longer need it.
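For example, to keep the cluster up for interactive use, you could end the sbatch script with a sleep instead of (or after) the final job step; a minimal sketch:

```bash
# At the end of the sbatch script: keep the allocation (and the Ray cluster) alive
# until the job's time limit is reached or you cancel the job with scancel.
sleep inf
```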
Note that these instructions use the variable SLURM_GPUS_PER_TASK. This is not a variable that Slurm sets by default, so you need to set it yourself. In the following examples, it is hardcoded to 8; adjust the number of GPUs per task as needed.
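A minimal sketch of setting it explicitly near the top of the sbatch script, assuming 8 GPUs per task as in the examples below:

```bash
# SLURM_GPUS_PER_TASK is not guaranteed to be set; hardcode and export it so the
# ray start commands later in the script can reference it.
export SLURM_GPUS_PER_TASK=8
```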
To see available capacity, use the sinfo command. Partitions for different SKUs are typically separated by name, for example, h100 and h200.

```
$ sinfo
```
The following is a sample sbatch script:
```bash
#!/bin/bash
#SBATCH --job-name=raytest
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --tasks-per-node=1   # we will launch one worker task per node
#SBATCH --cpus-per-task=8    # each worker task gets 8 CPUs. Adjust as needed.
#SBATCH --mem-per-cpu=1GB    # each CPU gets 1GB of memory. Adjust as needed.
#SBATCH --gpus-per-task=8    # each worker task will use 8 GPUs. Adjust as needed.
#SBATCH --time=1-00:00:00    # specify a time limit of one day

# Here we activate a conda environment named "rayenv" to load Ray
# and its dependencies. This assumes that you have already created a
# conda environment named "rayenv" with Ray installed.
eval "$(conda shell.bash hook)"
conda activate rayenv

# Getting the node names
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)

head_node=${nodes_array[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

# If we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$head_node_ip" == *" "* ]]; then
    IFS=' ' read -ra ADDR <<<"$head_node_ip"
    if [[ ${#ADDR[0]} -gt 16 ]]; then
        head_node_ip=${ADDR[1]}
    else
        head_node_ip=${ADDR[0]}
    fi
    echo "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi

port=6379
ip_head=$head_node_ip:$port
export ip_head
echo "IP Head: $ip_head"

echo "Starting HEAD at $head_node"
echo srun --nodes=1 --ntasks=1 -w "$head_node" \
    ray start --head --node-ip-address="$head_node_ip" --port=$port \
    --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus "${SLURM_GPUS_PER_TASK}" --block
srun --nodes=1 --ntasks=1 -w "$head_node" \
    ray start --head --node-ip-address="$head_node_ip" --port=$port \
    --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus "${SLURM_GPUS_PER_TASK}" --block &

# Optional, though may be useful in certain versions of Ray < 1.0.
sleep 10

# Number of nodes other than the head node.
worker_num=$((SLURM_JOB_NUM_NODES - 1))

for ((i = 1; i <= worker_num; i++)); do
    node_i=${nodes_array[$i]}
    echo "Starting WORKER $i at $node_i"
    srun --nodes=1 --ntasks=1 -w "$node_i" \
        ray start --address "$ip_head" \
        --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus "${SLURM_GPUS_PER_TASK}" --block &
    sleep 5
done

# Update --container-image and the Python script you want to run.
# For an example, see the srun command after this example.
srun -J "cont" --overlap \
    --container-image=/mnt/home/USER/RAY_CONTAINER_SQUASH_FILE \
    --container-remap-root --container-mounts=/mnt/home:/mnt/home \
    python3 -u /mnt/home/USER/PYTHON_SCRIPT "$SLURM_CPUS_PER_TASK"
```
Example srun command for the sbatch script above:
```
srun -J "cont" --overlap \
  --container-image=/mnt/home/USER_NAME/rayproject+ray+nightly-py39-gpu.sqsh \
  --container-remap-root \
  --container-mounts=/mnt/home:/mnt/home \
  python3 -u /mnt/home/USER_NAME/tune_basic_example.py "$SLURM_CPUS_PER_TASK"
```
- USER_NAME: Replace with your username.
- rayproject+ray+nightly-py39-gpu.sqsh: This is the Ray container squash file we downloaded in the Pull a Ray container section.
- tune_basic_example.py: Create this file under your username with the contents of the corresponding example in the NERSC repository.
Ray expects each user to be able to create their own Ray environment. Once you have created one, you can either log in interactively and launch jobs, or connect to a running container within a job to debug it.
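For instance, once you are attached to the head node of a running cluster (see Work with Ray on Slurm below), a quick way to confirm the cluster is reachable from Python is to connect to the already-running Ray instance; a minimal sketch, assuming Ray is available in your environment:

```bash
# Attach the driver to the existing Ray cluster and print the resources it sees.
python3 -c 'import ray; ray.init(address="auto"); print(ray.cluster_resources())'
```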
Pull a container as a sqsh file using srun
To pull a container as a sqsh file using srun, complete the following steps.
- On a login node, execute a simple command, such as echo hello, on the development container. As an example, we use an nccl-tests container to pull and save the image.

  Example:

  ```
  $ srun --container-image=ghcr.io#coreweave/nccl-tests:12.8.1-devel-ubuntu22.04-nccl2.26.2-1-0708d2e \
    --container-remap-root --no-container-mount-home \
    --container-save ${HOME}/nccl-test.sqsh echo hello
  ```

  - ${HOME}/nccl-test.sqsh: Update ${HOME} and nccl-test.sqsh based on where you want to save the file and what name to give it.

  You should see output similar to the following:

  ```
  pyxis: imported docker image: ghcr.io#coreweave/nccl-tests:12.8.1-devel-ubuntu22.04-nccl2.26.2-1-0708d2e
  hello
  pyxis: exported container pyxis_228.0 to /mnt/home/username/nccl-test.sqsh
  ```

- To pull the latest Ray container, run the following command:

  Example:

  ```
  $ srun --container-image=rayproject/ray:nightly-py39-gpu \
    --container-save ${HOME}/ray-nightly-py39-gpu echo hello
  ```

  - ${HOME}/ray-nightly-py39-gpu: Update ${HOME} and ray-nightly-py39-gpu based on where you want to save the file and what name to give it.

  You should see output similar to the following:

  ```
  pyxis: importing docker image: rayproject/ray:nightly-py39-gpu
  pyxis: imported docker image: rayproject/ray:nightly-py39-gpu
  hello
  pyxis: exported container pyxis_229.0 to /mnt/home/username/ray-nightly-py39-gpu
  ```
Interactively modify a container using srun
You can interactively modify a container using srun by mounting the home directory explicitly and saving the container. You can then install Ray inside the container:
```
$ srun --container-image=${HOME}/nccl-test.sqsh \
  --container-remap-root --container-mounts=/mnt/home:/mnt/home \
  --container-save ${HOME}/nccl-test-new.sqsh --pty bash -i
```

- ${HOME}/nccl-test.sqsh: Update ${HOME} and nccl-test.sqsh based on where you saved the container image in the previous steps.
When you exit the container, nccl-test-new.sqsh is saved.
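As a sketch of the kind of modification you might make inside that interactive session, assuming Python and pip are available in the container (install them first if they are not):

```bash
# Inside the container started above: install Ray, verify it, then exit so the
# change is written to ${HOME}/nccl-test-new.sqsh by --container-save.
pip install -U "ray[default]"
ray --version
exit
```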
Work with Ray on Slurm
Once your Ray environment is running, you can either connect interactively to the running Ray cluster for development or attach to a containerized Ray job to debug or inspect it. See instructions for each option below.
Interactively log in to a Ray environment
You can start the environment by submitting the sbatch script. Note that for interactive use the script should include a sleep command, as described above, so the allocation stays alive.
- Save the example script above as ray.devcluster.batch, and then run the following command:

  Example:

  ```
  $ sbatch ray.devcluster.batch
  ```

- Your Ray environment runs as a job. You can find out which job it is by running the following command:

  Example:

  ```
  $ squeue
  ```

  You should see output similar to the following:

  ```
  JOBID PARTITION     NAME      USER ST   TIME  NODES NODELIST(REASON)
   1335   hpc-low  raytest user_name  R  15:46      2 slurm-h100-231-[147,217]
  ```

- In the output above, the JOBID is 1335. This is a mini Ray environment with two h100 nodes. To connect to it and begin running things interactively, run the following command:

  Example:

  ```
  $ srun --jobid=1335 --overlap --pty bash -i
  ```

  - --jobid: Be sure to replace the jobid flag with the job you want to attach to.
  - --overlap: The overlap flag specifies that you attach to the head node of the specified job, here jobid=1335.
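Once attached, you can confirm the cluster came up as expected; a minimal sketch, assuming the conda environment (or container) that provides the ray CLI is available on the head node:

```bash
# Run on the head node of the allocation after attaching with srun --jobid ... --pty bash -i.
conda activate rayenv   # or otherwise put the ray CLI on your PATH
ray status              # lists the head and worker nodes with their CPU/GPU resources
```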
Attach to a Ray job using a container
If you launched a job using a container, you can connect to that running container by providing the container image, rather than relying on a sleep command in the Ray environment to keep a recently launched job alive.
To attach to a Ray job using a container, complete the following steps.
- Run the squeue command to find out which job it is:

  Example:

  ```
  $ squeue
  ```

- Find the specific step that is running the container using sacct. sacct is a Slurm command that gives more information about the "steps" of a job. Steps are tasks launched within the allocation that use some or all of its resources. (See the sketch after these steps for a way to narrow the sacct output.)

  Example:

  ```
  $ sacct -j JOB_ID
  ```

  - JOB_ID: Replace with your jobid.

- Connect to the specific step with the same container image. This starts the container in the same allocation, so you can debug the run interactively.

  Example:

  ```
  $ srun --overlap --container-image=/mnt/data/coreweave/ray/nccl-test-new.sqsh --jobid 1336.3 --pty bash -i
  ```

  - --container-image: Replace with the location of your container image.
  - --jobid: Replace with your jobid.

  In the example script, we gave the step a name using the -J flag in the srun command, which makes it easier to find. Note that connecting can take a minute because the container first launches all the Ray daemons.

  There will be multiple nodes in the cluster, but you can only attach to one at a time. The node picked by default is the head node of the allocation, which is where you can run ray status.

- To connect to a specific node in your Ray environment, provide it on the command line with the --nodelist flag.

  Example:

  ```
  $ srun --overlap --nodelist=slurm-h100-227-211 \
    --container-image=/mnt/data/coreweave/ray/nccl-test-new.sqsh \
    --jobid 1336.3 --pty bash -i
  ```

  - --nodelist: Replace with your node.
  - --container-image: Replace with the location of your container image.
  - --jobid: Replace with your jobid.
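As referenced in the sacct step above, here is a sketch of narrowing the output to the step IDs, names, and states. The --format option is a standard sacct feature, but check sacct --helpformat on your cluster for the available fields:

```bash
# List the steps of a job with their names (for example, the "cont" step named by
# the -J flag in the example script) and their states.
sacct -j JOB_ID --format=JobID,JobName%20,State
```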
Stop a Ray environment
To delete a job, find the job ID using squeue and cancel it using the following command:

```
scancel JOB_ID
```

- JOB_ID: Replace with the ID of the job you want to delete.
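If you started the cluster with the example sbatch script above, you can also filter by the job name before cancelling; a small sketch:

```bash
# Find your running Ray job by the name set with #SBATCH --job-name=raytest,
# then cancel it by ID.
squeue -u $USER -n raytest
scancel JOB_ID   # replace JOB_ID with the ID shown by squeue
```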