Using SGLang on CKS is easiest with SkyPilot, which provides a way to launch and manage distributed applications on Kubernetes clusters. SGLang is a serving framework for large language models (LLMs), and pairing it with SkyPilot lets you launch LLM inference workloads on CKS without managing Kubernetes manifests directly.
This guide is for CKS users who want to serve an LLM for inference on their cluster. It shows you how to set up SGLang with SkyPilot on CKS by covering the following:
- Install and run SkyPilot
- Run SGLang
By the end of this guide, you have a working SGLang inference server running on your CKS cluster.
Prerequisites
Before you complete the steps in this guide, be sure your development environment meets the following requirements:
-
You have your
kubeconfig file properly set and can interact with clusters and Pods using kubectl.
-
An environment variable set to
HF_TOKEN with your Hugging Face token. For more information, see the Hugging Face instructions at User access tokens.
-
Authentication to use the
meta-llama/Llama-3.1-8B-Instruct model. For more information, go to meta-llama/Llama-3.1-8B-Instruct and request access. Approval for restricted models can take a few hours or longer.
-
The
socat and netcat networking utilities.
Install SkyPilot
SkyPilot is the launcher you use to deploy SGLang to CKS, so install it before continuing. You can install SkyPilot using Anaconda or uv. The following instructions cover both techniques. Choose whichever matches your existing Python toolchain.
Install SkyPilot with Anaconda
Python development environment with Anaconda installed.
The following commands create a Conda environment named sky with Python 3.10 and install SkyPilot with Kubernetes support:
conda create -y -n sky python=3.10
conda activate sky
pip install "skypilot[kubernetes]"
- CKS requires SkyPilot version 0.10.1 or later.
- SkyPilot requires Python 3.7 to 3.13.
Once you see output stating that Kubernetes is enabled infra, you’ve successfully installed SkyPilot. For example, you might see output similar to the following:
Enabled infra
Kubernetes [compute]
Debugging steps: Follow any instructions for installing missing dependencies.
Install SkyPilot with uv
To install SkyPilot with uv, you might need to download and install uv first. Follow the instruction on uv GitHub repository.
After you install uv, the following commands create a virtual environment with Python 3.10 and install SkyPilot with Kubernetes support:
uv venv --seed --python 3.10
uv pip install "skypilot[kubernetes]"
You should see output similar to the following:
Enabled infra
Kubernetes [compute]
Run SkyPilot
With SkyPilot installed, verify that it can reach your CKS cluster before launching workloads. The sky check command verifies SkyPilot’s connectivity to your Kubernetes infrastructure:
Optional: Run a nccl test on Nodes with InfiniBand enabled
If you have Nodes with InfiniBand enabled, you can confirm that GPU-to-GPU networking is healthy across Nodes before running SGLang. Run a nccl test by completing the following steps:
-
Copy the following file and name it
nccl-network-tier.yaml. Replace accelerators with your Node type (which corresponds to the gpu.nvidia.com/class property):
name: nccl-network-tier
resources:
infra: k8s
# Replace accelerators with your Node type that has InfiniBand enabled.
accelerators: H100_NVLINK_80GB:8
image_id: ghcr.io/coreweave/nccl-tests:12.8.1-devel-ubuntu22.04-nccl2.26.2-1-0708d2e
network_tier: best # Automatically requests rdma/ib: 1 resource and sets env vars
num_nodes: 2
run: |
if [ "${SKYPILOT_NODE_RANK}" == "0" ]; then
echo "Head node"
# Total number of processes, NP should be the total number of GPUs in the cluster
NP=$(($SKYPILOT_NUM_GPUS_PER_NODE * $SKYPILOT_NUM_NODES))
# Append :${SKYPILOT_NUM_GPUS_PER_NODE} to each IP as slots
nodes=""
for ip in $SKYPILOT_NODE_IPS; do
nodes="${nodes}${ip}:${SKYPILOT_NUM_GPUS_PER_NODE},"
done
nodes=${nodes::-1}
echo "All nodes: ${nodes}"
mpirun \
--allow-run-as-root \
--tag-output \
-H $nodes \
-np $NP \
-N $SKYPILOT_NUM_GPUS_PER_NODE \
--bind-to none \
-x PATH \
-x LD_LIBRARY_PATH \
-x NCCL_DEBUG=INFO \
-x NCCL_SOCKET_IFNAME=eth0 \
-x NCCL_IB_HCA \
-x UCX_NET_DEVICES \
-x SHARP_COLL_ENABLE_PCI_RELAXED_ORDERING=1 \
-x NCCL_COLLNET_ENABLE=0 \
/opt/nccl-tests/build/all_reduce_perf \
-b 512M \
-e 8G \
-f 2 \
-g 1 \
-c 1 \
-w 5 \
-n 10
else
echo "Worker nodes"
fi
-
Launch the
nccl test cluster with SkyPilot:
sky launch -c nccl-test-sky nccl-network-tier.yaml
Run SGLang
With SkyPilot installed and verified, you can now launch SGLang on your cluster to serve an LLM. To run SGLang on CKS, you can use the following example script.
You need to complete the steps to install and run SkyPilot as described in the preceding sections.
The following example launches the meta-llama/Llama-3.1-8B-Instruct model behind an SGLang server on port 30000. To run SGLang, complete the following steps:
-
Copy the following YAML file and name it
sglang.yaml. Be sure to replace the accelerators field with your GPU type:
envs:
HF_TOKEN: null
resources:
image_id: docker:lmsysorg/sglang:latest
# Replace with your GPU type.
accelerators: H100_NVLINK_80GB
ports: 30000
run: |
conda deactivate
python3 -m sglang.launch_server \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--host 0.0.0.0 \
--port 30000
-
Launch the SGLang server with SkyPilot, replacing
[HUGGING-FACE-TOKEN] with your Hugging Face token:
HF_TOKEN=[HUGGING-FACE-TOKEN] sky launch -c sglang sglang.yaml --env HF_TOKEN
Last modified on June 10, 2026