# Run Ray with Kueue

> Install Ray and Kueue on CKS for distributed compute with efficient scheduling and job queuing

This guide shows you how to set up [Ray](https://docs.ray.io/en/latest/) with [Kueue](https://kueue.sigs.k8s.io/) on CKS. It covers how to:

* Install Ray and Kueue
* Create Ray clusters that can queue jobs efficiently
* Use helper scripts to manage distributed computing workloads

The result is a distributed compute environment where Ray executes jobs and Kueue schedules and queues them efficiently inside your CKS cluster.

## Prerequisites

Before you begin, we recommend creating a PersistentVolumeClaim (PVC) for shared use. A shared PVC makes it easier for ML engineers working together to maintain persistent directories across ephemeral or autoscaling clusters. For information on creating a PVC, see [Create PVCs](/products/storage/distributed-file-storage/manage-volumes#create-pvcs).

If you want to connect to your Pods using SSH, you must build your Ray image with an SSH server.

## Install Ray

To install Ray, complete the following steps:

1. Run the following command to add the KubeRay Helm repo:

   ```bash theme={"system"}
   helm repo add kuberay https://ray-project.github.io/kuberay-helm/
   ```

   You should see output similar to the following:

   ```text theme={"system"}
   "kuberay" has been added to your repositories
   ```

2. Install KubeRay on your CKS cluster by running the following command:

   ```bash theme={"system"}
   helm install kuberay-operator kuberay/kuberay-operator --version 1.4.0
   ```

   You should see output similar to the following:

   ```text theme={"system"}
   NAME: kuberay-operator
   LAST DEPLOYED: Tue Aug 19 14:46:56 2025
   NAMESPACE: default
   STATUS: deployed
   REVISION: 1
   TEST SUITE: None
   ```
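
Before moving on, it can help to confirm that the operator has finished rolling out. The following is a minimal sketch, assuming the Helm release above created a Deployment named `kuberay-operator` in the `default` namespace. The function name `wait_for_kuberay` and the 120-second default timeout are illustrative choices, not part of KubeRay.

```bash
# Hypothetical helper: block until the KubeRay operator Deployment is rolled out.
# Assumes the Deployment is named "kuberay-operator"; adjust if your release differs.
wait_for_kuberay() {
  local namespace="${1:-default}"   # namespace the Helm release was installed into
  local timeout="${2:-120s}"        # how long to wait before giving up
  kubectl rollout status deployment/kuberay-operator \
    --namespace "$namespace" \
    --timeout "$timeout"
}
```

After defining the function in your shell, run `wait_for_kuberay` to wait in the `default` namespace, or pass a namespace and timeout, for example `wait_for_kuberay default 60s`.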

## Install Kueue

To install Kueue, follow the [Kueue documentation](/products/cks/clusters/coreweave-charts/kueue).

After installing Kueue, deploy your Kueue configuration file by running `kubectl apply -f <kueue-config-file>`, replacing `<kueue-config-file>` with the path to your file.

<Info>
  Currently, Kueue is configured to look for the local queue in the `default` namespace, so work within the `default` namespace for this guide.
</Info>
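
To check that the configuration was applied, you can list the queue objects that Kueue created. This is a sketch under the assumption that your configuration file defines at least one ClusterQueue and one LocalQueue in the `default` namespace. The function name `show_kueue_queues` is hypothetical.

```bash
# Hypothetical helper: list the Kueue queue objects this guide relies on.
show_kueue_queues() {
  local namespace="${1:-default}"
  # ClusterQueues are cluster-scoped and hold the resource quota.
  kubectl get clusterqueues
  # LocalQueues are namespaced and point at a ClusterQueue.
  kubectl get localqueues --namespace "$namespace"
}
```

If either command returns no resources, the configuration file may not have been applied, or it may target a different namespace.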

## Create a Ray cluster

To make it easier to create and manage a Ray cluster, we recommend using the following script: [`create_ray_box`](https://github.com/coreweave/reference-architecture/blob/main/ray-kueue/scripts/create_ray_box).

To use the script, complete the following steps:

1. Download the script using curl:

   ```bash theme={"system"}
   curl -o create_ray_box https://raw.githubusercontent.com/coreweave/reference-architecture/main/ray-kueue/scripts/create_ray_box
   ```

2. Make the script executable:

   ```bash theme={"system"}
   chmod +x create_ray_box
   ```

3. In the script, modify the lines that set CPUs, memory, and GPUs so they match your Node Pool:

   ```text theme={"system"}
   # Set resource requests and limits for the Ray head node.
   resources:
     limits:
       # Modify CPU based on the CPUs in your Node Pool.
       cpu: "120"

       # Modify memory based on the memory in your Node Pool.
       memory: "2000G"

       # Modify the number of GPUs based on the GPUs in your Node Pool.
       nvidia.com/gpu: 8
       rdma/ib: "1"
     requests:
       # For production use cases, we recommend specifying integer CPU requests and limits.
       # We also recommend setting requests equal to limits for both CPU and memory.

       # Modify CPU based on the CPUs in your Node Pool.
       cpu: "120"

       # Modify memory based on the memory in your Node Pool.
       memory: "2000G"

       # Modify the number of GPUs based on the GPUs in your Node Pool.
       nvidia.com/gpu: 8
       rdma/ib: "1"
   ```

You can now run `./create_ray_box` without arguments to see its options.
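
To choose values for the resource settings in the script, it can help to see what each Node reports as allocatable. The following sketch assumes your GPUs are exposed under the `nvidia.com/gpu` resource name, as in the snippet above. The function name `show_node_capacity` is hypothetical.

```bash
# Hypothetical helper: print allocatable CPU, memory, and GPUs for every Node.
# The backslash escapes the dots inside the "nvidia.com/gpu" resource key.
show_node_capacity() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Allocatable values already exclude resources reserved for the system, but other Pods on the Node still consume part of them, so requests set to the full allocatable amount might not schedule.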

### Examples

The following two examples show how to create a Ray cluster.

<Tabs>
  <Tab title="Using provided script">
    The following example creates a Ray dev box with four worker Nodes and one head Node.

    ```bash theme={"system"}
    ./create_ray_box --nodes 4 --name mydevbox --image [RAY-DOCKER-IMAGE]
    ```

    Note the following:

    * **`--nodes`**: The number of worker Nodes, not counting the head Node. This example creates a Ray dev box with four worker Nodes and one head Node. All Nodes can run jobs. If the CKS cluster has insufficient resources, Kueue queues the Ray cluster.

    * **`--image`**: This is the Ray Docker image. We highly recommend building a custom Ray image using CoreWeave's [nccl-tests](https://github.com/coreweave/nccl-tests) as a base image for [InfiniBand](/platform/instances/selecting-an-instance#the-role-of-nvlink-and-infiniband) support. If you don't include the `--image` option, the script uses the [rayproject/ray-ml:2.9.0-gpu](https://hub.docker.com/r/rayproject/ray-ml) image.
  </Tab>

  <Tab title="Using a YAML file">
    You can also create a Ray cluster using only YAML files.

    The following command creates a shared PVC. You don't need to run this command if you've already created a PVC.

    ```bash theme={"system"}
    kubectl apply -f https://raw.githubusercontent.com/coreweave/reference-architecture/refs/heads/main/ray-kueue/yamls/pvc.yaml
    ```

    The following command uses the [ray-cluster-sample](https://github.com/coreweave/reference-architecture/blob/main/ray-kueue/yamls/ray-cluster-sample.yaml) to create a Ray cluster:

    ```bash theme={"system"}
    kubectl apply -f https://raw.githubusercontent.com/coreweave/reference-architecture/refs/heads/main/ray-kueue/yamls/ray-cluster-sample.yaml
    ```
  </Tab>
</Tabs>
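
If a new cluster's Pods do not appear, Kueue may be holding the cluster in the queue. The following sketch shows one way to check, assuming the RayCluster carries Kueue's `kueue.x-k8s.io/queue-name` label and lives in the `default` namespace. The function name `ray_cluster_queue_status` is hypothetical.

```bash
# Hypothetical helper: show which queue a Ray cluster targets and list Kueue Workloads.
ray_cluster_queue_status() {
  local cluster="${1:?usage: ray_cluster_queue_status <cluster-name> [namespace]}"
  local namespace="${2:-default}"
  # Print the LocalQueue named in the cluster's Kueue label (empty if the label is unset).
  kubectl get raycluster "$cluster" --namespace "$namespace" \
    -o 'jsonpath={.metadata.labels.kueue\.x-k8s\.io/queue-name}{"\n"}'
  # Kueue tracks each queued job or cluster as a Workload object.
  kubectl get workloads --namespace "$namespace"
}
```

For the example cluster above, you would run `ray_cluster_queue_status mydevbox` and look for a Workload that corresponds to the cluster.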

## Working with Ray clusters

| Task                     | Command                                             | Notes                                                 |
| ------------------------ | --------------------------------------------------- | ----------------------------------------------------- |
| **List Ray clusters**    | `kubectl get raycluster`                            | Shows all clusters in the namespace.                  |
| **List Pods**            | `kubectl get pods`                                  | Lists the head and worker Pods.                       |
| **Log into head Pod**    | `kubectl exec -it <cluster-name>-head -- /bin/bash` | Replace `<cluster-name>` with your cluster's name.    |
| **Log into worker Pod**  | `kubectl exec -it <worker-pod-name> -- /bin/bash`   | Replace `<worker-pod-name>` with the actual Pod name. |
| **List jobs in queue**   | `kubectl get queue`                                 | View pending and admitted workloads.                  |
| **Get queue details**    | `kubectl describe queue`                            | Shows detailed information about the queue and jobs.  |
| **Delete a Ray cluster** | `kubectl delete raycluster <cluster-name>`          | Replace `<cluster-name>` with your cluster's name.    |
| **Shared storage path**  | `/mnt/vast`                                         | Where shared storage is configured.                   |
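
If you are unsure of the exact head Pod name, you can look it up by label instead. This is a minimal sketch, assuming KubeRay has applied its `ray.io/cluster` and `ray.io/node-type` labels to the Pods and that the cluster runs in the `default` namespace. The function name `ray_head_pod` is hypothetical.

```bash
# Hypothetical helper: print the name of a Ray cluster's head Pod.
ray_head_pod() {
  local cluster="${1:?usage: ray_head_pod <cluster-name> [namespace]}"
  local namespace="${2:-default}"
  # Select the Pod that belongs to this cluster and is marked as the head node.
  kubectl get pods --namespace "$namespace" \
    --selector "ray.io/cluster=${cluster},ray.io/node-type=head" \
    -o 'jsonpath={.items[0].metadata.name}'
}
```

You can combine it with the login command from the table, for example `kubectl exec -it "$(ray_head_pod mydevbox)" -- /bin/bash`.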

## Testing NCCL

To verify that InfiniBand is correctly configured for your container and cluster, run the script [`nccl-test/all_reduce_ray.py`](https://raw.githubusercontent.com/coreweave/reference-architecture/refs/heads/main/ray-kueue/nccl-test/all_reduce_ray.py).
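
One way to run the script is to copy it into the head Pod and run it there. The following is only a sketch: it assumes the script runs without arguments, that `python3` and `tar` are available in your Ray image (`kubectl cp` needs `tar`), and that `/tmp` is writable. The function name `run_nccl_test` is hypothetical, so check the script itself for any options it expects.

```bash
# Hypothetical helper: download the NCCL test script, copy it to a Pod, and run it.
run_nccl_test() {
  local pod="${1:?usage: run_nccl_test <head-pod-name>}"
  local url="https://raw.githubusercontent.com/coreweave/reference-architecture/refs/heads/main/ray-kueue/nccl-test/all_reduce_ray.py"
  # Each step runs only if the previous one succeeded.
  curl -fsSL -o all_reduce_ray.py "$url" &&
    kubectl cp all_reduce_ray.py "${pod}:/tmp/all_reduce_ray.py" &&
    kubectl exec "$pod" -- python3 /tmp/all_reduce_ray.py
}
```

For example, `run_nccl_test <cluster-name>-head` targets the head Pod named in the table above.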

## Additional helper scripts

The [CoreWeave/reference-architecture](https://github.com/coreweave/reference-architecture/tree/main) repository contains [helper scripts](https://github.com/coreweave/reference-architecture/tree/main/ray-kueue/scripts) for working with Ray clusters. These Python utility scripts make it easier for a small team to share a Ray cluster. For example, to view the Ray cluster capacity, navigate to the `ray-kueue` directory in the reference architecture repository and run the following command:

```bash theme={"system"}
python3 scripts/capacity
```
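
If you have not cloned the repository yet, the following sketch fetches it and changes into the `ray-kueue` directory. The function name `fetch_reference_scripts` and the default destination directory are illustrative assumptions.

```bash
# Hypothetical helper: clone the reference architecture repo and enter ray-kueue.
fetch_reference_scripts() {
  local dest="${1:-reference-architecture}"   # local directory to clone into
  git clone https://github.com/coreweave/reference-architecture.git "$dest" &&
    cd "$dest/ray-kueue"
}
```

Run the function directly in your shell rather than in a subshell, so that the directory change persists. You can then run `python3 scripts/capacity` as shown above.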
