Run SGLang

Learn how to run SGLang with SkyPilot

Using SGLang on CKS is easiest with SkyPilot, which provides a simple way to launch and manage distributed applications on Kubernetes clusters.

This guide shows you how to set up SGLand with SkyPilot on CKS by covering the following:

Installing and running SkyPilot
Running SGLang

Prerequisites

Before completing the steps in this guide, be sure your development environment meets the following requirements:

You have your kubeconfig file properly set and can interact with clusters and Pods using kubectl.
A environment variable set to HF_TOKEN with your Hugging Face token. For more information, see the Hugging Face instructions at User access tokens.
Authentication to use the meta-llama/Llama-3.1-8B-Instruct model. For more information, go to meta-llama/Llama-3.1-8B-Instruct and request access. Note that approval for restricted models can take a few hours or longer.
The socat and netcat networking utilities.

Options for installing SkyPilot

You can install SkyPilot using Anaconda or uv. The following instructions cover both techniques.

Install SkyPilot with Anaconda

Python development environment with Anaconda installed.

Run the following commands to install SkyPilot with Anaconda:

Example

$
conda create -y -n sky python=3.10
$
conda activate sky
$
pip install "skypilot[kubernetes]"

Note

CKS requires SkyPilot version 0.10.1 or later.
SkyPilot requires Python 3.7 to 3.13.

Once you see output stating that Kubernetes is enabled infra, you've successfully installed SkyPilot. For example, you might see output similar to the following:

🎉 Enabled infra 🎉
  Kubernetes [compute]

Debugging steps: Follow any instructions for installing missing dependencies.

Install SkyPilot with uv

To install SkyPilot with uv, you might need to download and install uv first. Follow the instruction on uv GitHub repository.

After installing uv, run the following commands:

Example

$
uv venv --seed --python 3.10
$
uv pip install "skypilot[kubernetes]"

You should see output similar to the following:

🎉 Enabled infra 🎉
  Kubernetes [compute]

Run SkyPilot

Run the following command to enable SkyPilot:

Example

$
sky check

Run a `nccl` test on Nodes with InfiniBand enabled

If you have Nodes with InfiniBand enabled, you can run a nccl test by completing the following steps:

Copy the following file and name it nccl-network-tier.yaml. Replace accelerators with your Node type (which corresponds to the gpu.nvidia.com/class property):

Example

name: nccl-network-tier
resources:
infra: k8s
# Replace accelerators with your Node type that has InfiniBand enabled.
accelerators: H100_NVLINK_80GB:8
image_id: ghcr.io/coreweave/nccl-tests:12.8.1-devel-ubuntu22.04-nccl2.26.2-1-0708d2e
network_tier: best  # Automatically requests rdma/ib: 1 resource and sets env vars

num_nodes: 2

run: |
if [ "${SKYPILOT_NODE_RANK}" == "0" ]; then
echo "Head node"

    # Total number of processes, NP should be the total number of GPUs in the cluster
    NP=$(($SKYPILOT_NUM_GPUS_PER_NODE * $SKYPILOT_NUM_NODES))

    # Append :${SKYPILOT_NUM_GPUS_PER_NODE} to each IP as slots
    nodes=""
    for ip in $SKYPILOT_NODE_IPS; do
      nodes="${nodes}${ip}:${SKYPILOT_NUM_GPUS_PER_NODE},"
    done
    nodes=${nodes::-1}
    echo "All nodes: ${nodes}"

    mpirun \
      --allow-run-as-root \
      --tag-output \
      -H $nodes \
      -np $NP \
      -N $SKYPILOT_NUM_GPUS_PER_NODE \
      --bind-to none \
      -x PATH \
      -x LD_LIBRARY_PATH \
      -x NCCL_DEBUG=INFO \
      -x NCCL_SOCKET_IFNAME=eth0 \
      -x NCCL_IB_HCA \
      -x UCX_NET_DEVICES \
      -x SHARP_COLL_ENABLE_PCI_RELAXED_ORDERING=1 \
      -x NCCL_COLLNET_ENABLE=0 \
      /opt/nccl-tests/build/all_reduce_perf \
      -b 512M \
      -e 8G \
      -f 2 \
      -g 1 \
      -c 1 \
      -w 5 \
      -n 10
else
echo "Worker nodes"
fi

Run the following command:

Example

$
sky launch -c nccl-test-sky nccl-network-tier.yaml

Run SGLang

To run SGLang on CKS, you can use the following example script.

Note that you need to complete the steps for installing and running SkyPilot as described above.

To run SGLang, complete the following steps:

Copy the following YAML file and name it sglang.yaml. Be sure to replace the accelerators field with your GPU type:

Example

envs:
  HF_TOKEN: null

resources:
  image_id: docker:lmsysorg/sglang:latest
  # Replace with your GPU type.
  accelerators: H100_NVLINK_80GB
  ports: 30000

run: |
  conda deactivate
  python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 30000

Run the following command:

Example

$
HF_TOKEN=<hugging_face_token> sky launch -c sglang sglang.yaml --env HF_TOKEN

Prerequisites​

Options for installing SkyPilot​

Install SkyPilot with Anaconda​

Install SkyPilot with uv​

Run SkyPilot​

Run a nccl test on Nodes with InfiniBand enabled​

Run SGLang​

Prerequisites

Options for installing SkyPilot

Install SkyPilot with Anaconda

Install SkyPilot with uv

Run SkyPilot

Run a `nccl` test on Nodes with InfiniBand enabled

Run SGLang