Run SGLang
Learn how to run SGLang with SkyPilot
Using SGLang on CKS is easiest with SkyPilot, which provides a simple way to launch and manage distributed applications on Kubernetes clusters.
This guide shows you how to set up SGLand with SkyPilot on CKS by covering the following:
- Installing and running SkyPilot
- Running SGLang
Prerequisites
Before completing the steps in this guide, be sure your development environment meets the following requirements:
-
You have your
kubeconfig
file properly set and can interact with clusters and Pods usingkubectl
. -
A environment variable set to
HF_TOKEN
with your Hugging Face token. For more information, see the Hugging Face instructions at User access tokens. -
Authentication to use the
meta-llama/Llama-3.1-8B-Instruct
model. For more information, go to meta-llama/Llama-3.1-8B-Instruct and request access. Note that approval for restricted models can take a few hours or longer. -
The
socat
andnetcat
networking utilities.
Options for installing SkyPilot
You can install SkyPilot using Anaconda or uv. The following instructions cover both techniques.
Install SkyPilot with Anaconda
Python development environment with Anaconda installed.
Run the following commands to install SkyPilot with Anaconda:
$conda create -y -n sky python=3.10$conda activate sky$pip install "skypilot[kubernetes]"
Once you see output stating that Kubernetes is enabled infra, you've successfully installed SkyPilot. For example, you might see output similar to the following:
🎉 Enabled infra 🎉Kubernetes [compute]
Debugging steps: Follow any instructions for installing missing dependencies.
Install SkyPilot with uv
To install SkyPilot with uv, you might need to download and install uv first. Follow the instruction on uv GitHub repository.
After installing uv, run the following commands:
$uv venv --seed --python 3.10$uv pip install "skypilot[kubernetes]"
You should see output similar to the following:
🎉 Enabled infra 🎉Kubernetes [compute]
Run SkyPilot
Run the following command to enable SkyPilot:
$sky check
Run a nccl
test on Nodes with InifinBand enabled
If you have Nodes with InfiniBand enabled, you can run a nccl
test by completing the following steps:
-
Copy the following file and name it
nccl-network-tier.yaml
. Replaceaccelerators
with your Node type (which corresponds to thegpu.nvidia.com/class
property):Examplename: nccl-network-tierresources:infra: k8s# Replace accelerators with your Node type that has InfiniBand enabled.accelerators: H100_NVLINK_80GB:8image_id: ghcr.io/coreweave/nccl-tests:12.8.1-devel-ubuntu22.04-nccl2.26.2-1-0708d2enetwork_tier: best # Automatically requests rdma/ib: 1 resource and sets env varsnum_nodes: 2run: |if [ "${SKYPILOT_NODE_RANK}" == "0" ]; thenecho "Head node"# Total number of processes, NP should be the total number of GPUs in the clusterNP=$(($SKYPILOT_NUM_GPUS_PER_NODE * $SKYPILOT_NUM_NODES))# Append :${SKYPILOT_NUM_GPUS_PER_NODE} to each IP as slotsnodes=""for ip in $SKYPILOT_NODE_IPS; donodes="${nodes}${ip}:${SKYPILOT_NUM_GPUS_PER_NODE},"donenodes=${nodes::-1}echo "All nodes: ${nodes}"mpirun \--allow-run-as-root \--tag-output \-H $nodes \-np $NP \-N $SKYPILOT_NUM_GPUS_PER_NODE \--bind-to none \-x PATH \-x LD_LIBRARY_PATH \-x NCCL_DEBUG=INFO \-x NCCL_SOCKET_IFNAME=eth0 \-x NCCL_IB_HCA \-x UCX_NET_DEVICES \-x SHARP_COLL_ENABLE_PCI_RELAXED_ORDERING=1 \-x NCCL_COLLNET_ENABLE=0 \/opt/nccl-tests/build/all_reduce_perf \-b 512M \-e 8G \-f 2 \-g 1 \-c 1 \-w 5 \-n 10elseecho "Worker nodes"fi -
Run the following command:
Example$sky launch -c nccl-test-sky nccl-network-tier.yaml
Run SGLang
To run SGLang on CKS, you can use the following example script.
Note that you need to complete the steps for installing and running SkyPilot as described above.
To run SGLang, complete the following steps:
-
Copy the following YAML file and name it
sglang.yaml
. Be sure to replace theaccelerators
field with your GPU type:Exampleenvs:HF_TOKEN: nullresources:image_id: docker:lmsysorg/sglang:latest# Replace with your GPU type.accelerators: H100_NVLINK_80GBports: 30000run: |conda deactivatepython3 -m sglang.launch_server \--model-path meta-llama/Llama-3.1-8B-Instruct \--host 0.0.0.0 \--port 30000 -
Run the following command:
Example$HF_TOKEN=<hugging_face_token> sky launch -c sglang sglang.yaml --env HF_TOKEN