Skip to main content

Run SGLang

Learn how to run SGLang with SkyPilot

Using SGLang on CKS is easiest with SkyPilot, which provides a simple way to launch and manage distributed applications on Kubernetes clusters.

This guide shows you how to set up SGLand with SkyPilot on CKS by covering the following:

  • Installing and running SkyPilot
  • Running SGLang

Prerequisites

Before completing the steps in this guide, be sure your development environment meets the following requirements:

  • You have your kubeconfig file properly set and can interact with clusters and Pods using kubectl.

  • A environment variable set to HF_TOKEN with your Hugging Face token. For more information, see the Hugging Face instructions at User access tokens.

  • Authentication to use the meta-llama/Llama-3.1-8B-Instruct model. For more information, go to meta-llama/Llama-3.1-8B-Instruct and request access. Note that approval for restricted models can take a few hours or longer.

  • The socat and netcat networking utilities.

Options for installing SkyPilot

You can install SkyPilot using Anaconda or uv. The following instructions cover both techniques.

Install SkyPilot with Anaconda

Python development environment with Anaconda installed.

Run the following commands to install SkyPilot with Anaconda:

Example
$
conda create -y -n sky python=3.10
$
conda activate sky
$
pip install "skypilot[kubernetes]"

Once you see output stating that Kubernetes is enabled infra, you've successfully installed SkyPilot. For example, you might see output similar to the following:

🎉 Enabled infra 🎉
Kubernetes [compute]

Debugging steps: Follow any instructions for installing missing dependencies.

Install SkyPilot with uv

To install SkyPilot with uv, you might need to download and install uv first. Follow the instruction on uv GitHub repository.

After installing uv, run the following commands:

Example
$
uv venv --seed --python 3.10
$
uv pip install "skypilot[kubernetes]"

You should see output similar to the following:

🎉 Enabled infra 🎉
Kubernetes [compute]

Run SkyPilot

Run the following command to enable SkyPilot:

Example
$
sky check

Run a nccl test on Nodes with InifinBand enabled

If you have Nodes with InfiniBand enabled, you can run a nccl test by completing the following steps:

  1. Copy the following file and name it nccl-network-tier.yaml. Replace accelerators with your Node type (which corresponds to the gpu.nvidia.com/class property):

    Example
    name: nccl-network-tier
    resources:
    infra: k8s
    # Replace accelerators with your Node type that has InfiniBand enabled.
    accelerators: H100_NVLINK_80GB:8
    image_id: ghcr.io/coreweave/nccl-tests:12.8.1-devel-ubuntu22.04-nccl2.26.2-1-0708d2e
    network_tier: best # Automatically requests rdma/ib: 1 resource and sets env vars
    num_nodes: 2
    run: |
    if [ "${SKYPILOT_NODE_RANK}" == "0" ]; then
    echo "Head node"
    # Total number of processes, NP should be the total number of GPUs in the cluster
    NP=$(($SKYPILOT_NUM_GPUS_PER_NODE * $SKYPILOT_NUM_NODES))
    # Append :${SKYPILOT_NUM_GPUS_PER_NODE} to each IP as slots
    nodes=""
    for ip in $SKYPILOT_NODE_IPS; do
    nodes="${nodes}${ip}:${SKYPILOT_NUM_GPUS_PER_NODE},"
    done
    nodes=${nodes::-1}
    echo "All nodes: ${nodes}"
    mpirun \
    --allow-run-as-root \
    --tag-output \
    -H $nodes \
    -np $NP \
    -N $SKYPILOT_NUM_GPUS_PER_NODE \
    --bind-to none \
    -x PATH \
    -x LD_LIBRARY_PATH \
    -x NCCL_DEBUG=INFO \
    -x NCCL_SOCKET_IFNAME=eth0 \
    -x NCCL_IB_HCA \
    -x UCX_NET_DEVICES \
    -x SHARP_COLL_ENABLE_PCI_RELAXED_ORDERING=1 \
    -x NCCL_COLLNET_ENABLE=0 \
    /opt/nccl-tests/build/all_reduce_perf \
    -b 512M \
    -e 8G \
    -f 2 \
    -g 1 \
    -c 1 \
    -w 5 \
    -n 10
    else
    echo "Worker nodes"
    fi
  2. Run the following command:

    Example
    $
    sky launch -c nccl-test-sky nccl-network-tier.yaml

Run SGLang

To run SGLang on CKS, you can use the following example script.

Note that you need to complete the steps for installing and running SkyPilot as described above.

To run SGLang, complete the following steps:

  1. Copy the following YAML file and name it sglang.yaml. Be sure to replace the accelerators field with your GPU type:

    Example
    envs:
    HF_TOKEN: null
    resources:
    image_id: docker:lmsysorg/sglang:latest
    # Replace with your GPU type.
    accelerators: H100_NVLINK_80GB
    ports: 30000
    run: |
    conda deactivate
    python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 30000
  2. Run the following command:

    Example
    $
    HF_TOKEN=<hugging_face_token> sky launch -c sglang sglang.yaml --env HF_TOKEN