
Run veRL on SUNK

Run Group Relative Policy Optimization (GRPO) training with veRL on SUNK

This guide shows you how to run Group Relative Policy Optimization (GRPO) training with veRL on SUNK using the Qwen3 8B model. The provided Slurm script handles container setup, dataset preparation, and Ray orchestration. It also writes logs and checkpoints to a run directory, and automatically logs to Weights & Biases if you provide a WANDB_API_KEY.

Prerequisites

To use this guide, you need the following:

  • Access to a SUNK cluster
  • One available GPU node, at minimum. We recommend using an H200 node; you can check node availability with sinfo, as shown after this list. By default, the script requests 1 node with 8 GPUs. If you use a smaller node, adjust the training hyperparameters to reduce GPU memory consumption.
  • An NFS-backed working directory visible to all nodes (for data, checkpoints, container cache)
  • Optionally, a WANDB_API_KEY to log to Weights & Biases
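
To confirm that a suitable GPU node is available before submitting, you can query Slurm directly. The sketch below uses standard sinfo output fields; partition and GRES names vary by cluster.

Example
$
# List nodes with their GRES (GPUs), state, and partition
sinfo -N -o "%N %G %t %P"
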
Tested version

This script uses the following defaults:

  • veRL container tag: app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2
  • veRL commit: 8fdc4d3f202f41461f4de9f42a637228e342668b (v0.5.0)

Override these via the VERL_TAG and VERL_VERSION environment variables if needed, as shown below.
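
Because sbatch propagates exported environment variables to the batch script by default, exporting alternate values before you submit the job is sufficient. The tag and commit below are placeholders; see the veRL Docker Hub tags for valid values.

Example
$
export VERL_TAG=<alternate_verl_tag>
$
export VERL_VERSION=<verl_commit_sha>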

Choose an NFS-backed working directory

Select a directory mounted on all nodes. In many SUNK clusters, your home directory will suffice.

Optionally, you can export overrides for data, checkpoints, and container cache locations. This guide uses the default values, with no overrides, when showing example commands. If you set custom paths, substitute them in the commands below where indicated.

Example
$
# Example in home directory
mkdir -p ~/verl-experiments/qwen3-8b-grpo
cd ~/verl-experiments/qwen3-8b-grpo
# Optional: override defaults
# export DATA_DIR=/mnt/data/verl-experiments/data
# export CHECKPOINT_DIR=/mnt/data/verl-experiments/checkpoints
# export CONTAINER_DIR=/mnt/data/verl-experiments/containers
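
The batch script below refuses to run if these directories are not NFS-backed, using stat to check the filesystem type. You can run the same check yourself ahead of time; a minimal sketch, assuming the directory created above:

Example
$
# Prints the filesystem type of the working directory; expect "nfs"
stat -fc %T ~/verl-experiments/qwen3-8b-grpo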

Export your W&B API key (Optional)

If you set a WANDB_API_KEY, the veRL job will log to Weights & Biases automatically. To set your API key, run the following commands:

Example
$
export WANDB_API_KEY=<your_wandb_api_key>
$
echo "export WANDB_API_KEY=$WANDB_API_KEY" >> ~/.bashrc

Create the batch script

Create the verl-grpo-qwen3-8B-gsm8k.sbatch script in your working directory:

Example
$
cat << 'EOF' > verl-grpo-qwen3-8B-gsm8k.sbatch
#!/bin/bash
###
#SBATCH --job-name=verl-grpo-qwen3-8B-gsm8k
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=128
#SBATCH --mem=512GB
#SBATCH --time=10:00:00
#SBATCH --output="logs/%x_%j.out" # Use %x for job name and %j for slurm job ID in output file name
#SBATCH --exclusive
# NCCL environment variables are documented at:
# https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html
# Out-of-band communication is set to run over the front-end Ethernet interface.
# The backend is restricted to ibp* interfaces so it doesn't try to use any RoCE interfaces from the front end.
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_HCA=ibp
# Restrict the UCX transport layer to TCP.
# UCX tries to use all transports by default; constraining it avoids accidentally initializing other transports, which can lead to crashes.
# NCCL does not use UCX at all.
export UCX_TLS=tcp
export UCX_NET_DEVICES=eth0
export OMPI_MCA_coll_hcoll_enable=0
export PMIX_MCA_gds='^ds12'
# Define veRL container version we will use
# See https://hub.docker.com/r/verlai/verl/tags for available tags.
verl_tag="${VERL_TAG:-app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2}"
verl_version="${VERL_VERSION:-8fdc4d3f202f41461f4de9f42a637228e342668b}" # v0.5.0
log() {
  printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}
ensure_nfs_dir() {
  local dir="$1"
  local error_message="$2"
  mkdir -p "$dir"
  local fstype
  fstype="$(stat '-fc%T' "$dir")"
  if [ "$fstype" != "nfs" ] ; then
    log "${error_message:-You must specify a directory that is mounted on all cluster nodes.}" >&2
    exit 1
  fi
}
# Define all the NFS directories we will use
export DATA_DIR="${DATA_DIR:-$(realpath -s data)}"
export CHECKPOINT_DIR="${CHECKPOINT_DIR:-$(realpath -s checkpoints)}"
export CONTAINER_DIR="${CONTAINER_DIR:-$(realpath -s images)}"
export TMPDIR="/tmp"
run_suffix="${RUN_SUFFIX:-${SLURM_JOB_ID:-$(date '+%Y%m%d_%H%M%S')}}"
export RUN_DIR="${RUN_DIR:-$CHECKPOINT_DIR/run_$run_suffix}"
export WANDB_DIR="${WANDB_DIR:-$RUN_DIR}"
log "Run directory: $RUN_DIR"
ensure_nfs_dir "$DATA_DIR" 'You must specify a data directory that is mounted on all cluster nodes.'
ensure_nfs_dir "$CHECKPOINT_DIR" 'You must specify a checkpoint directory that is mounted on all cluster nodes.'
ensure_nfs_dir "$CONTAINER_DIR" 'You must specify a container directory that is mounted on all cluster nodes.'
ensure_nfs_dir "$RUN_DIR" 'You must specify a RUN_DIR that is mounted on all cluster nodes.'
ensure_nfs_dir "$WANDB_DIR" 'You must specify a WANDB_DIR that is mounted on all cluster nodes.'
# Pull the container image if it has not been pulled already. For large parallel
# jobs, this saves time by not hitting the registry from each task, and it runs
# once on the head node of the allocation. It also clones and installs the veRL
# package itself, which is not included in the container image.
export CONTAINER_IMAGE="${CONTAINER_DIR}/${verl_tag}_${verl_version}.sqsh"
if [ -f "$CONTAINER_IMAGE" ]; then
  log "Container image for veRL version $verl_tag already exists, no need to pull."
else
  log "Pulling container image for veRL version: $verl_tag and saving to $CONTAINER_IMAGE"
  srun --job-name=verl-image-pull \
    --container-image="docker://verlai/verl:$verl_tag" \
    --container-save="$CONTAINER_IMAGE" \
    bash -c "
      git clone https://github.com/volcengine/verl.git &&
      cd verl &&
      git checkout $verl_version &&
      pip install --no-deps 'click==8.1.7' 'typing_extensions>=4.14,<5' &&
      pip install -e . --no-deps" || {
    log "Failed to clone and install veRL package" >&2
    exit 1
  }
fi
# Log the assigned nodes
log "Using nodes: $SLURM_JOB_NODELIST"
# Download and process the dataset if necessary
mkdir -p "$DATA_DIR/gsm8k"
if [ ! -f "$DATA_DIR/gsm8k/train.parquet" ] || [ ! -f "$DATA_DIR/gsm8k/test.parquet" ]; then
  log "Downloading and processing GSM8K dataset..."
  srun --job-name=gsm8k-preprocess --nodes=1 \
    --container-image="$CONTAINER_IMAGE" \
    --container-mounts="$DATA_DIR:$DATA_DIR" \
    python3 verl/examples/data_preprocess/gsm8k.py --local_dir "$DATA_DIR/gsm8k" || {
    log "Failed to download and process GSM8K dataset" >&2
    exit 1
  }
else
  log "Using existing GSM8K dataset in $DATA_DIR/gsm8k"
fi
# Initialize the Ray cluster that veRL will use
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)
head_node=${nodes_array[0]}
head_node_ip=$(srun --job-name=head-ip --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)
# If we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$head_node_ip" == *" "* ]]; then
  IFS=' ' read -ra ADDR <<<"$head_node_ip"
  if [[ ${#ADDR[0]} -gt 16 ]]; then
    head_node_ip=${ADDR[1]}
  else
    head_node_ip=${ADDR[0]}
  fi
  log "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi
# veRL expects this `RAY_ADDRESS` env var to be set when initializing the task runner.
port=6379
export RAY_ADDRESS="$head_node_ip:$port"
log "Ray Address: $RAY_ADDRESS"
# Mount the NFS paths, and also the tmp dir, so that files created by the Ray
# workers are available to the main task runner.
mounts="$DATA_DIR:$DATA_DIR,$CHECKPOINT_DIR:$CHECKPOINT_DIR,$TMPDIR:$TMPDIR"
log "Starting HEAD at $head_node"
srun --job-name=ray-head --nodes=1 --ntasks=1 -w "$head_node" \
  --container-image="$CONTAINER_IMAGE" \
  --container-mounts="$mounts" \
  ray start --head --node-ip-address="$head_node_ip" --port=$port \
    --include-dashboard=false \
    --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus 8 --block &
# Ensure the head node is ready before starting the workers.
sleep 5
# Number of nodes other than the head node.
worker_num=$((SLURM_JOB_NUM_NODES - 1))
for ((i = 1; i <= worker_num; i++)); do
  node_i=${nodes_array[$i]}
  log "Starting WORKER $i at $node_i"
  srun --job-name="ray-worker-$i" --nodes=1 --ntasks=1 -w "$node_i" \
    --container-image="$CONTAINER_IMAGE" \
    --container-mounts="$mounts" \
    ray start --address "$RAY_ADDRESS" \
      --include-dashboard=false \
      --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus 8 --block &
done
# Ensure the workers are ready before running the training script.
sleep 5
# Run the GRPO training script after the Ray cluster is initialized
epochs=1 # Reduce epochs for the tutorial
log "Running GRPO training script..."
PYTHONUNBUFFERED=1 srun --job-name=grpo-training --kill-on-bad-exit=1 --overlap --nodes=1 -w "$head_node" \
  --container-image="$CONTAINER_IMAGE" \
  --container-mounts="$mounts" \
  bash verl/examples/grpo_trainer/run_qwen3-8b.sh \
    data.train_files="$DATA_DIR/gsm8k/train.parquet" \
    data.val_files="$DATA_DIR/gsm8k/test.parquet" \
    trainer.total_epochs="$epochs" \
    trainer.default_local_dir="$RUN_DIR"
log "Stopping Ray cluster gracefully..."
srun --job-name=ray-stop-all \
  --overlap \
  --nodes="$SLURM_JOB_NUM_NODES" \
  --ntasks="$SLURM_JOB_NUM_NODES" \
  --ntasks-per-node=1 \
  --container-image="$CONTAINER_IMAGE" \
  --container-mounts="$mounts" \
  ray stop || log "Ray stop failed on one or more nodes" >&2
EOF

How the script works


SUNK supports running Slurm jobs inside enroot containers via the Pyxis plugin. Rather than rebuild dependencies each time, the batch script saves a container image locally before launching the training job, or reuses one that already exists. The script uses srun with the --container-image flag pointing to a public veRL base image, already bundled with vLLM, SGLang, and Megatron. Since this container does not include veRL itself, the script clones the veRL repository at the pinned commit and installs the veRL package from source, including the Qwen3-8B training launch script used in this tutorial. The --container-save flag then saves the container image to a local NFS directory.

To prepare the dataset, a follow-up srun executes verl/examples/data_preprocess/gsm8k.py, which downloads the GSM8K dataset from Hugging Face and writes train.parquet and test.parquet into DATA_DIR/gsm8k so future runs can reuse the results without re-downloading.
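
Once a run has produced the parquet files, you can sanity-check them from inside the cached container. This is an optional sketch, not part of the tutorial script: it assumes the default ./images and ./data locations used in this guide and that pandas is available in the veRL image.

Example
$
srun --container-image="$(ls ./images/*.sqsh | head -n1)" \
  --container-mounts="$(realpath data):/data" \
  python3 -c "import pandas as pd; df = pd.read_parquet('/data/gsm8k/train.parquet'); print(len(df), 'rows'); print(df.columns.tolist())"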

The script then starts a Ray head on the first node and workers on the remaining nodes, and exports RAY_ADDRESS so veRL can attach. This pattern mirrors the guidance in the Run Ray on SUNK guide.
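
While the job is running, you can attach an extra step to the allocation and query the Ray cluster's state. This is a sketch, assuming the default ./images cache location; replace <your_job_id> with the Slurm job ID and <head_node_ip> with the address printed as "Ray Address" in the job log.

Example
$
srun --jobid=<your_job_id> --overlap --nodes=1 --ntasks=1 \
  --container-image="$(ls ./images/*.sqsh | head -n1)" \
  ray status --address <head_node_ip>:6379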

With the container cached and the dataset ready, the tutorial launches the Qwen3-8B GRPO script via srun on rank 0. veRL's trainer then uses the previously created Ray cluster to orchestrate processes across the nodes. Config overrides are passed to the trainer as CLI arguments, including the input/output paths and a reduced total number of epochs. After the training script completes, a final srun gracefully tears down the Ray cluster.
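
The same override mechanism is the natural place to adapt the run to GPUs with less memory, as noted in the prerequisites. The keys below are standard veRL (Hydra) options, but treat the values as a starting point rather than a tested configuration; if run_qwen3-8b.sh already sets a key you care about, you may prefer to edit it there instead of appending a duplicate override.

Example
$
# Inside the sbatch script, extra overrides can be appended to the training command:
bash verl/examples/grpo_trainer/run_qwen3-8b.sh \
  data.train_files="$DATA_DIR/gsm8k/train.parquet" \
  data.val_files="$DATA_DIR/gsm8k/test.parquet" \
  trainer.total_epochs="$epochs" \
  trainer.default_local_dir="$RUN_DIR" \
  data.train_batch_size=256 \
  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  actor_rollout_ref.rollout.gpu_memory_utilization=0.5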

Submit the job

After creating the script, submit the job to Slurm with sbatch, as follows:

Example
$
sbatch verl-grpo-qwen3-8B-gsm8k.sbatch

Once submitted, the job will perform the following steps:

  1. Pull and cache the veRL container
  2. Download and preprocess the GSM8K dataset, if missing
  3. Start a Ray cluster inside the allocation
  4. Run GRPO training, with 1 epoch by default
  5. Write logs and checkpoints to your run directory
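
To stop the run before it finishes, cancel the job; Slurm then terminates all of its steps, including the Ray processes.

Example
$
scancel --name=verl-grpo-qwen3-8B-gsm8k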

Monitor progress

Fetch the job ID

Fetch the Slurm job ID from squeue, as follows:

Example
$
export VERL_JOB_ID="$(squeue --user=$(whoami) --name=verl-grpo-qwen3-8B-gsm8k -h -o "%A" | head -n1)"

View the job status

Once you have the job ID, you can inspect the status of each step using sacct, as follows:

Example
$
sacct -j ${VERL_JOB_ID}

This outputs the status of each srun step in the sbatch script. With sacct, each step is shown in its own row, in the following order (to select specific columns, see the example after this list):

  1. Container image creation
  2. Dataset processing
  3. Each Ray node starting
  4. Training job script
  5. Cleaning up Ray
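
To make the step names and states easier to scan, you can request specific columns using standard sacct format fields:

Example
$
sacct -j ${VERL_JOB_ID} --format=JobID,JobName%30,State,Elapsed,ExitCode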

Stream runtime logs

To stream runtime logs, use tail as follows:

Example
$
tail -f "logs/verl-grpo-qwen3-8B-gsm8k_${VERL_JOB_ID}.out"

View the dataset

The GSM8K dataset is saved at the following path:

Example
$
ls -lah ./data/gsm8k

View the run directory artifacts

To view the run directory artifacts, use ls as follows:

Example
$
ls -lah ./checkpoints/run_${VERL_JOB_ID}

Before a checkpoint is written, wandb is likely the only directory you'll see listed.

To find the W&B link in logs, use grep as follows:

Example
$
grep -E "View run at|View project at" "logs/verl-grpo-qwen3-8B-gsm8k_${VERL_JOB_ID}.out"

It can take about 15 minutes for the job to reach the W&B initialization step. Once initialized, the W&B project and run URLs are printed in the logs and resemble the following:

wandb: ⭐️ View project at https://wandb.ai/<user>/verl_grpo_example_gsm8k
wandb: 🚀 View run at https://wandb.ai/<user>/verl_grpo_example_gsm8k/runs/nli58bea

Example outputs

Directories created

./images/
  app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2_8fdc4d3f202f41461f4de9f42a637228e342668b.sqsh
./data/
  gsm8k/
    train.parquet
    test.parquet
./logs/
  verl-grpo-qwen3-8B-gsm8k_${VERL_JOB_ID}.out
./checkpoints/
  run_${VERL_JOB_ID}/          <-- RUN_DIR (primary outputs)
    wandb/                     <-- Local W&B run files (if WANDB_API_KEY is set)
    global_step_7/             <-- Final step for this tutorial
      actor/
        model_world_size_8_rank_*.pt
        optim_world_size_8_rank_*.pt
        extra_state_world_size_8_rank_*.pt
        huggingface/           <-- Saved model config + tokenizer
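
The actor/ checkpoint is sharded, with one model, optimizer, and extra-state file per rank. Recent veRL releases include a model merger utility that can consolidate FSDP shards into a standard Hugging Face model directory; the module path and flags below are a hedged sketch, so verify them against the pinned veRL commit before relying on it.

Example
$
srun --container-image="$(ls ./images/*.sqsh | head -n1)" \
  --container-mounts="$(realpath checkpoints):/ckpts" \
  python3 -m verl.model_merger merge \
    --backend fsdp \
    --local_dir "/ckpts/run_${VERL_JOB_ID}/global_step_7/actor" \
    --target_dir "/ckpts/run_${VERL_JOB_ID}/merged_hf_model"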

Sample log excerpt

[2025-11-14 21:06:47] Run directory: /mnt/home/<user>/verl-experiments/checkpoints/run_6237
[2025-11-14 21:06:47] Using nodes: h200-204-169
[2025-11-14 21:06:47] Using existing GSM8K dataset in /mnt/home/<user>/verl-experiments/data/gsm8k
[2025-11-14 21:06:47] Ray Address: 10.0.5.165:6379
[2025-11-14 21:06:48] Starting HEAD at h200-204-169
[2025-11-14 21:06:58] Running GRPO training script...
...
local_global_step_folder: /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7
INFO:2025-11-14 21:36:55,017:[Rank 7] Saved model to /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7/actor/model_world_size_8_rank_7.pt
INFO:2025-11-14 21:37:14,123:[Rank 1] Saved optim to /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7/actor/optim_world_size_8_rank_1.pt
INFO:2025-11-14 21:37:14,159:[Rank 1] Saved extra_state to /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7/actor/extra_state_world_size_8_rank_1.pt
INFO:2025-11-14 21:37:15,223:[Rank 0] Saved model config and tokenizer class to /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7/actor/huggingface
("Final validation metrics: {'val-core/openai/gsm8k/reward/mean@1': " '0.6588324488248674}')
[2025-11-14 21:37:36] Stopping Ray cluster gracefully...
SUCC scripts.py:1395 -- Stopped all 6 Ray processes.