Run veRL on SUNK
Run Group Relative Policy Optimization (GRPO) training with veRL on SUNK
This guide shows you how to run Group Relative Policy Optimization (GRPO) training with veRL on SUNK using the Qwen3 8B model. The provided Slurm script handles container setup, dataset preparation, and Ray orchestration. It also writes logs and checkpoints to a run directory, and automatically logs to Weights & Biases if you provide a WANDB_API_KEY.
Prerequisites
To use this guide, you need the following:
- Access to a SUNK cluster
- One available GPU node, at minimum. We recommend using an H200 node. By default, the script requests 1 node with 8 GPUs. If you use a smaller node, you will need to adjust the hyperparameters to reduce GPU memory consumption (see the override example in the How the script works section).
- An NFS-backed working directory visible to all nodes (for data, checkpoints, container cache)
- Optionally, a WANDB_API_KEY to log to Weights & Biases
This script uses the following defaults:
- veRL container tag: app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2
- veRL commit: 8fdc4d3f202f41461f4de9f42a637228e342668b (v0.5.0)
Override these defaults with the VERL_TAG and VERL_VERSION environment variables if needed, as shown below.
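For example, to pin a different image tag and commit, export the overrides in the shell where you run sbatch. The batch script created later in this guide reads these environment variables; the values below are placeholders:

# Placeholder values: pick a real tag from https://hub.docker.com/r/verlai/verl/tags
# and the matching veRL commit you want to install from source.
$ export VERL_TAG=<alternate_verl_image_tag>
$ export VERL_VERSION=<alternate_verl_commit_hash>

Because sbatch propagates your shell environment to the job by default, the overrides take effect the next time you submit the script.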
Choose an NFS-backed working directory
Select a directory mounted on all nodes. In many SUNK clusters, your home directory will suffice.
Optionally, you can export overrides for data, checkpoints, and container cache locations. This guide uses the default values, with no overrides, when showing example commands. If you set custom paths, substitute them in the commands below where indicated.
# Example in home directory
$ mkdir -p ~/verl-experiments/qwen3-8b-grpo
$ cd ~/verl-experiments/qwen3-8b-grpo

# Optional: override defaults
# export DATA_DIR=/mnt/data/verl-experiments/data
# export CHECKPOINT_DIR=/mnt/data/verl-experiments/checkpoints
# export CONTAINER_DIR=/mnt/data/verl-experiments/containers
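To confirm that a directory is NFS-backed, you can run the same filesystem-type check that the batch script performs on each directory:

# Prints the filesystem type; the batch script requires this to be "nfs".
$ stat -fc %T ~/verl-experiments/qwen3-8b-grpo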
Export your W&B API key (Optional)
If you set a WANDB_API_KEY, the veRL job will log to Weights & Biases automatically. To set your API key, run the following commands:
$ export WANDB_API_KEY=<your_wandb_api_key>
$ echo "export WANDB_API_KEY=$WANDB_API_KEY" >> ~/.bashrc
Create the batch script
Create the verl-grpo-qwen3-8B-gsm8k.sbatch script in your working directory:
$ cat << 'EOF' > verl-grpo-qwen3-8B-gsm8k.sbatch
#!/bin/bash
#
#
#
#SBATCH --job-name=verl-grpo-qwen3-8B-gsm8k
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=128
#SBATCH --mem=512GB
#SBATCH --time=10:00:00
#SBATCH --output="logs/%x_%j.out" # Use %x for job name and %j for slurm job ID in output file name
#SBATCH --exclusive

# NCCL environment variables are documented at:
# https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html
# Out Of Band is set to run over front-end Ethernet.
# Backend is restricted to use ibp* interfaces to ensure it doesn't try to use any RoCE interfaces from the frontend.
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_HCA=ibp

# Disable UCX
# Restrict the transport layer for UCX, it tries to use all transports by default, this forces it on TCP. NCCL does not use UCX at all.
# We explicitly deactivate it to avoid initializing UCX by mistake as it can lead to crashes.
export UCX_TLS=tcp
export UCX_NET_DEVICES=eth0
export OMPI_MCA_coll_hcoll_enable=0
export PMIX_MCA_gds='^ds12'

# Define veRL container version we will use
# See https://hub.docker.com/r/verlai/verl/tags for available tags.
verl_tag="${VERL_TAG:-app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2}"
verl_version="${VERL_VERSION:-8fdc4d3f202f41461f4de9f42a637228e342668b}" # v0.5.0

log() {
    printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

ensure_nfs_dir() {
    local dir="$1"
    local error_message="$2"
    mkdir -p "$dir"
    local fstype
    fstype="$(stat '-fc%T' "$dir")"
    if [ "$fstype" != "nfs" ] ; then
        log "${error_message:-You must specify a directory that is mounted on all cluster nodes.}" >&2
        exit 1
    fi
}

# Define all the NFS directories we will use
export DATA_DIR="${DATA_DIR:-$(realpath -s data)}"
export CHECKPOINT_DIR="${CHECKPOINT_DIR:-$(realpath -s checkpoints)}"
export CONTAINER_DIR="${CONTAINER_DIR:-$(realpath -s images)}"
export TMPDIR="/tmp"

run_suffix="${RUN_SUFFIX:-${SLURM_JOB_ID:-$(date '+%Y%m%d_%H%M%S')}}"
export RUN_DIR="${RUN_DIR:-$CHECKPOINT_DIR/run_$run_suffix}"
export WANDB_DIR="${WANDB_DIR:-$RUN_DIR}"
log "Run directory: $RUN_DIR"

ensure_nfs_dir "$DATA_DIR" 'You must specify a data directory that is mounted on all cluster nodes.'
ensure_nfs_dir "$CHECKPOINT_DIR" 'You must specify a checkpoint directory that is mounted on all cluster nodes.'
ensure_nfs_dir "$CONTAINER_DIR" 'You must specify a container directory that is mounted on all cluster nodes.'
ensure_nfs_dir "$RUN_DIR" 'You must specify a RUN_DIR that is mounted on all cluster nodes.'
ensure_nfs_dir "$WANDB_DIR" 'You must specify a WANDB_DIR that is mounted on all cluster nodes.'

# Pull the container image, if not already pulled. For large parallel jobs, this
# will save time by not hitting the repository from each task. This will
# be executed once on the head node of the allocation.
# Will also clone and install the veRL package itself as that is not included in the container image.
export CONTAINER_IMAGE="${CONTAINER_DIR}/${verl_tag}_${verl_version}.sqsh"
if [ -f "$CONTAINER_IMAGE" ]; then
    log "Container image for veRL version $verl_tag already exists, no need to pull."
else
    log "Pulling container image for veRL version: $verl_tag and saving to $CONTAINER_IMAGE"
    srun --job-name=verl-image-pull \
        --container-image="docker://verlai/verl:$verl_tag" \
        --container-save="$CONTAINER_IMAGE" \
        bash -c "git clone https://github.com/volcengine/verl.git &&
            cd verl &&
            git checkout $verl_version &&
            pip install --no-deps 'click==8.1.7' 'typing_extensions>=4.14,<5' &&
            pip install -e . --no-deps" || {
        log "Failed to clone and install veRL package" >&2
        exit 1
    }
fi

# Log the assigned nodes
log "Using nodes: $SLURM_JOB_NODELIST"

# Download and process the dataset if necessary
mkdir -p "$DATA_DIR/gsm8k"
if [ ! -f "$DATA_DIR/gsm8k/train.parquet" ] || [ ! -f "$DATA_DIR/gsm8k/test.parquet" ]; then
    log "Downloading and processing GSM8K dataset..."
    srun --job-name=gsm8k-preprocess --nodes=1 \
        --container-image="$CONTAINER_IMAGE" \
        --container-mounts="$DATA_DIR:$DATA_DIR" \
        python3 verl/examples/data_preprocess/gsm8k.py --local_dir "$DATA_DIR/gsm8k" || {
        log "Failed to download and process GSM8K dataset" >&2
        exit 1
    }
else
    log "Using existing GSM8K dataset in $DATA_DIR/gsm8k"
fi

# Initialize the Ray cluster that veRL will use
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)
head_node=${nodes_array[0]}
head_node_ip=$(srun --job-name=head-ip --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

# If we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$head_node_ip" == *" "* ]]; then
    IFS=' ' read -ra ADDR <<<"$head_node_ip"
    if [[ ${#ADDR[0]} -gt 16 ]]; then
        head_node_ip=${ADDR[1]}
    else
        head_node_ip=${ADDR[0]}
    fi
    log "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi

# veRL expects this `RAY_ADDRESS` env var to be set when initializing the task runner.
port=6379
export RAY_ADDRESS="$head_node_ip:$port"
log "Ray Address: $RAY_ADDRESS"

# Make sure NFS paths are available, but also the tmp DIR
# so that files created by the ray workers are available to the main
# task runner.
mounts="$DATA_DIR:$DATA_DIR,$CHECKPOINT_DIR:$CHECKPOINT_DIR,$TMPDIR:$TMPDIR"

log "Starting HEAD at $head_node"
srun --job-name=ray-head --nodes=1 --ntasks=1 -w "$head_node" \
    --container-image="$CONTAINER_IMAGE" \
    --container-mounts="$mounts" \
    ray start --head --node-ip-address="$head_node_ip" --port=$port \
        --include-dashboard=false \
        --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus 8 --block &

# Ensure the head node is ready before starting the workers.
sleep 5

# Number of nodes other than the head node.
worker_num=$((SLURM_JOB_NUM_NODES - 1))

for ((i = 1; i <= worker_num; i++)); do
    node_i=${nodes_array[$i]}
    log "Starting WORKER $i at $node_i"
    srun --job-name="ray-worker-$i" --nodes=1 --ntasks=1 -w "$node_i" \
        --container-image="$CONTAINER_IMAGE" \
        --container-mounts="$mounts" \
        ray start --address "$RAY_ADDRESS" \
            --include-dashboard=false \
            --num-cpus "${SLURM_CPUS_PER_TASK}" --num-gpus 8 --block &
done

# Ensure the workers are ready before running the training script.
sleep 5

# Run the GRPO training script after the Ray cluster is initialized
epochs=1 # Reduce epochs for the tutorial
log "Running GRPO training script..."
PYTHONUNBUFFERED=1 srun --job-name=grpo-training --kill-on-bad-exit=1 --overlap --nodes=1 -w "$head_node" \
    --container-image="$CONTAINER_IMAGE" \
    --container-mounts="$mounts" \
    bash verl/examples/grpo_trainer/run_qwen3-8b.sh \
        data.train_files="$DATA_DIR/gsm8k/train.parquet" \
        data.val_files="$DATA_DIR/gsm8k/test.parquet" \
        trainer.total_epochs="$epochs" \
        trainer.default_local_dir="$RUN_DIR"

log "Stopping Ray cluster gracefully..."
srun --job-name=ray-stop-all \
    --overlap \
    --nodes="$SLURM_JOB_NUM_NODES" \
    --ntasks="$SLURM_JOB_NUM_NODES" \
    --ntasks-per-node=1 \
    --container-image="$CONTAINER_IMAGE" \
    --container-mounts="$mounts" \
    ray stop || log "Ray stop failed on one or more nodes" >&2
EOF
How the script works
SUNK supports running Slurm jobs inside enroot containers via the Pyxis plugin. Rather than rebuild dependencies each time, the batch script saves a container image locally before launching the training job, or reuses one that already exists. The script uses srun with the --container-image flag pointing to a public veRL base image, already bundled with vLLM, SGLang, and Megatron. Since this container does not include veRL itself, the script clones the veRL repository at the pinned commit and installs the veRL package from source, including the Qwen3-8B training launch script used in this tutorial. The --container-save flag then saves the container image to a local NFS directory.
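Because the cached image is a regular squashfs file on NFS, you can also reuse it outside the batch job, for example to open an interactive shell in the same environment. The sketch below assumes the default CONTAINER_DIR of ./images and the default tag and commit, so the .sqsh filename matches the one shown in the example outputs:

# Open an interactive shell in the cached veRL container (allocates one GPU for inspection).
$ srun --gpus=1 --pty \
    --container-image="./images/app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2_8fdc4d3f202f41461f4de9f42a637228e342668b.sqsh" \
    --container-mounts="$PWD:$PWD" \
    bash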
To prepare the dataset, a follow-up srun executes verl/examples/data_preprocess/gsm8k.py, which downloads the GSM8K dataset from Hugging Face and writes train.parquet and test.parquet into DATA_DIR/gsm8k so future runs can reuse the results without re-downloading.
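If you want to sanity-check the processed dataset, you can read the Parquet files back inside the same container. This is a sketch that assumes pandas is available in the veRL image (it is pulled in by the Hugging Face datasets dependency used during preprocessing); substitute the .sqsh path created by the batch script:

# Print the number of rows and the first example of the processed training split.
$ srun --container-image="./images/<cached_verl_image>.sqsh" \
    --container-mounts="$PWD/data:/data" \
    python3 -c "import pandas as pd; df = pd.read_parquet('/data/gsm8k/train.parquet'); print(len(df)); print(df.iloc[0])"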
The script then starts a Ray head on the first node and workers on the remaining nodes, and exports RAY_ADDRESS so veRL can attach. This pattern mirrors the guidance in the Run Ray on SUNK guide.
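If you want to verify that every node joined the Ray cluster while a job is running, you can attach an overlapping step to the allocation and query Ray. This is a sketch: $VERL_JOB_ID is the Slurm job ID (see Monitor progress below), and the head node IP is the Ray address printed in the job log:

# Attach an extra step to the running allocation and ask Ray which nodes are connected.
$ srun --jobid="$VERL_JOB_ID" --overlap --nodes=1 --ntasks=1 \
    --container-image="./images/<cached_verl_image>.sqsh" \
    ray status --address="<head_node_ip>:6379"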
With the container cached and the dataset ready, the batch script launches the Qwen3-8B GRPO script via srun on the head node (rank 0). veRL's trainer then uses the previously created Ray cluster to orchestrate processes across the nodes. Config overrides are passed to the trainer as CLI arguments, including the input and output paths and a reduced total number of epochs. After training completes, a final srun gracefully tears down the Ray cluster.
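If you need to adjust hyperparameters, for example to fit on GPUs with less memory than an H200, append additional overrides to the same training command in the batch script. The keys below are illustrative; verify them against the config shipped with your pinned veRL version (such as verl/trainer/config/ppo_trainer.yaml), and if a key is already set inside run_qwen3-8b.sh, change it there instead:

# Illustrative memory-reducing overrides appended to the training command in the batch script.
# Key names and values are assumptions to verify against your veRL version.
bash verl/examples/grpo_trainer/run_qwen3-8b.sh \
    data.train_files="$DATA_DIR/gsm8k/train.parquet" \
    data.val_files="$DATA_DIR/gsm8k/test.parquet" \
    trainer.total_epochs="$epochs" \
    trainer.default_local_dir="$RUN_DIR" \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4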
Submit the job
After creating the script, make sure the logs directory used by the #SBATCH --output directive exists, since Slurm does not create it for you (for example, run mkdir -p logs). Then submit the job to Slurm with sbatch, as follows:
$ sbatch verl-grpo-qwen3-8B-gsm8k.sbatch
Once submitted, the job will perform the following steps:
- Pull and cache the veRL container
- Download and preprocess the GSM8K dataset, if missing
- Start a Ray cluster inside the allocation
- Run GRPO training, with 1 epoch by default
- Write logs and checkpoints to your run directory
Monitor progress
Fetch the job ID
Fetch the Slurm job ID from squeue, as follows:
$ export VERL_JOB_ID="$(squeue --user=$(whoami) --name=verl-grpo-qwen3-8B-gsm8k -h -o "%A" | head -n1)"
View the job status
Once you have the job ID, you can inspect the status of each step using sacct, as follows:
$ sacct -j ${VERL_JOB_ID}
This command outputs the status of each srun step in the sbatch script. With sacct, each step appears in its own row, in the following order (a formatting example follows the list):
- Container image creation
- Dataset processing
- Each Ray node starting
- Training job script
- Cleaning up Ray
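To make the step names and states easier to read, pass an explicit format string to sacct:

# One row per step, with the step name, state, and elapsed time.
$ sacct -j "${VERL_JOB_ID}" --format=JobID,JobName%30,State,Elapsed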
Stream runtime logs
To stream runtime logs, use tail as follows:
$ tail -f "logs/verl-grpo-qwen3-8B-gsm8k_${VERL_JOB_ID}.out"
View the dataset
The GSM8K dataset is saved at the following path:
$ ls -lah ./data/gsm8k
View the run directory artifacts
To view the run directory artifacts, use ls as follows:
$ ls -lah ./checkpoints/run_${VERL_JOB_ID}
Before a checkpoint is written, wandb is likely the only directory you'll see listed.
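Once a checkpoint has been written, you can inspect its contents. For this tutorial the final checkpoint lands in global_step_7, matching the example outputs below:

# List the sharded actor checkpoint files for the final training step.
$ ls -lah ./checkpoints/run_${VERL_JOB_ID}/global_step_7/actor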
Find the W&B link in logs
To find the W&B link in logs, use grep as follows:
$ grep -E "View run at|View project at" "logs/verl-grpo-qwen3-8B-gsm8k_${VERL_JOB_ID}.out"
It can take about 15 minutes for the job to reach the W&B initialization step. Once initialized, you'll see the W&B project and run URLs printed in the logs, resembling the following:
wandb: ⭐️ View project at https://wandb.ai/<user>/verl_grpo_example_gsm8k
wandb: 🚀 View run at https://wandb.ai/<user>/verl_grpo_example_gsm8k/runs/nli58bea
Example outputs
Directories created
./images/
    app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2_8fdc4d3f202f41461f4de9f42a637228e342668b.sqsh
./data/gsm8k/
    train.parquet
    test.parquet
./logs/
    verl-grpo-qwen3-8B-gsm8k_${VERL_JOB_ID}.out
./checkpoints/run_${VERL_JOB_ID}/       <-- RUN_DIR (primary outputs)
    wandb/                              <-- Local W&B run files (if WANDB_API_KEY is set)
    global_step_7/                      <-- Final step for this tutorial
        actor/
            model_world_size_8_rank_*.pt
            optim_world_size_8_rank_*.pt
            extra_state_world_size_8_rank_*.pt
            huggingface/                <-- Saved model config + tokenizer
Sample log excerpt
[2025-11-14 21:06:47] Run directory: /mnt/home/<user>/verl-experiments/checkpoints/run_6237
[2025-11-14 21:06:47] Using nodes: h200-204-169
[2025-11-14 21:06:47] Using existing GSM8K dataset in /mnt/home/<user>/verl-experiments/data/gsm8k
[2025-11-14 21:06:47] Ray Address: 10.0.5.165:6379
[2025-11-14 21:06:48] Starting HEAD at h200-204-169
[2025-11-14 21:06:58] Running GRPO training script...
...
local_global_step_folder: /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7
INFO:2025-11-14 21:36:55,017:[Rank 7] Saved model to /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7/actor/model_world_size_8_rank_7.pt
INFO:2025-11-14 21:37:14,123:[Rank 1] Saved optim to /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7/actor/optim_world_size_8_rank_1.pt
INFO:2025-11-14 21:37:14,159:[Rank 1] Saved extra_state to /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7/actor/extra_state_world_size_8_rank_1.pt
INFO:2025-11-14 21:37:15,223:[Rank 0] Saved model config and tokenizer class to /mnt/home/<user>/verl-experiments/checkpoints/run_6237/global_step_7/actor/huggingface
("Final validation metrics: {'val-core/openai/gsm8k/reward/mean@1': " '0.6588324488248674}')
[2025-11-14 21:37:36] Stopping Ray cluster gracefully...
SUCC scripts.py:1395 -- Stopped all 6 Ray processes.