Deploy Spegel, a stateless peer-to-peer OCI registry mirror, on CoreWeave Kubernetes Service (CKS). Speed up container image pulls, reduce external registry dependencies, and improve cluster reliability by sharing images across nodes.
P2P distributed architecture: Spegel uses a Kademlia-based Distributed Hash Table (DHT) to enable peer-to-peer image sharing across cluster nodes. Each node advertises its locally cached images, allowing other nodes to pull layers directly from cluster peers instead of external registries. This stateless design requires no persistent storage because Spegel leverages containerd’s existing image cache on each node.
In this tutorial, you will:
  1. Verify containerd configuration to ensure registry mirroring is enabled
  2. Deploy Spegel using Helm as a DaemonSet across all nodes
  3. Verify P2P image sharing by pulling images and observing peer-to-peer transfers

What you'll need

Before you start, you must have:
You’ll need the following tools on your local machine:

What you'll use

You’ll use these tools and technologies:
  • Spegel: Peer-to-peer OCI registry mirror for the cluster
  • Helm: Package manager used to install the Spegel chart from its OCI registry
  • kubectl: Kubernetes CLI for cluster access, verification, and optional port-forwarding to the Spegel debug UI
  • kubectl node-shell: Plugin for checking containerd settings on nodes (optional if you use another method)

Verify cluster access

Verify that you can access your cluster with kubectl:
kubectl cluster-info
You should see something similar to:
Kubernetes control plane is running at...
CoreDNS is running at...
node-local-dns is running at...
Verify your cluster has at least two nodes:
kubectl get nodes
You should see at least two nodes:
NAME      STATUS   ROLES    AGE   VERSION
g8fb8e0   Ready    <none>   76d   v1.34.3
g8fd342   Ready    <none>   76d   v1.34.3
g8ff980   Ready    <none>   76d   v1.34.3
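The node-count check can also be scripted. The following is a minimal sketch that assumes the default column layout of kubectl get nodes shown above; the function name count_ready_nodes is hypothetical and not part of kubectl.

```shell
# count_ready_nodes (hypothetical helper): reads `kubectl get nodes --no-headers`
# output on stdin and prints how many nodes report a STATUS of exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are deliberately not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (requires kubectl access):
#   ready=$(kubectl get nodes --no-headers | count_ready_nodes)
#   [ "$ready" -ge 2 ] || echo "Spegel needs at least two Ready nodes" >&2
```

Counting only schedulable Ready nodes is a design choice: a cordoned node can still serve cached layers, so relax the match if that distinction does not matter for your cluster.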

Verify containerd configuration

Spegel requires specific containerd settings to function properly, as documented in the Spegel compatibility requirements. CoreWeave CKS clusters are pre-configured with these settings, but you can verify them using kubectl node-shell. Get a node name and check the containerd configuration:
# Get a node name
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')

# Shell into the node and check containerd config
kubectl node-shell ${NODE_NAME} -- grep -E 'config_path|discard_unpacked_layers' /etc/containerd/config.toml
Verify the output contains these required settings:
config_path = "/etc/containerd/certs.d"
discard_unpacked_layers = false
If both settings are present, you can proceed to deploying Spegel.
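To turn this check into a pass/fail test, for example when looping over several nodes, a small helper can inspect the configuration text. This is a sketch only: check_containerd_settings is a hypothetical name, and it assumes each setting sits on its own line as in the output above.

```shell
# check_containerd_settings (hypothetical helper): reads the contents of
# /etc/containerd/config.toml on stdin, prints "ok" when both settings that
# Spegel needs are present, and otherwise lists what is missing.
check_containerd_settings() {
  cfg=$(cat)
  rc=0
  if ! printf '%s\n' "$cfg" |
    grep -Eq '^[[:space:]]*config_path[[:space:]]*=[[:space:]]*"/etc/containerd/certs\.d"'; then
    echo 'missing: config_path = "/etc/containerd/certs.d"'
    rc=1
  fi
  if ! printf '%s\n' "$cfg" |
    grep -Eq '^[[:space:]]*discard_unpacked_layers[[:space:]]*=[[:space:]]*false'; then
    echo 'missing: discard_unpacked_layers = false'
    rc=1
  fi
  if [ "$rc" -eq 0 ]; then
    echo "ok"
  fi
  return "$rc"
}

# Usage against a live node (requires the kubectl node-shell plugin):
#   kubectl node-shell "${NODE_NAME}" -- cat /etc/containerd/config.toml \
#     | check_containerd_settings
```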

Deploy Spegel

Install Spegel using the Helm chart from the OCI registry:
helm upgrade --create-namespace --namespace spegel --install spegel \
  oci://ghcr.io/spegel-org/helm-charts/spegel
This command:
  • Creates the spegel namespace if it doesn’t exist
  • Installs Spegel as a DaemonSet (one pod per node)
  • Uses default configuration optimized for most clusters

Deploy on SUNK GPU nodes

To run Spegel on SUNK GPU nodes, configure memory resources to use the Burstable QoS class instead of the upstream defaults where requests equal limits. Setting requests equal to limits creates a Guaranteed QoS pod, which can interfere with Slurm’s thread counter on SUNK nodes:
helm upgrade --create-namespace --namespace spegel --install spegel \
  oci://ghcr.io/spegel-org/helm-charts/spegel \
  --set resources.requests.memory=128Mi \
  --set resources.limits.memory=256Mi
Alternatively, create a values.yaml file:
values.yaml
resources:
  requests:
    memory: 128Mi
  limits:
    memory: 256Mi
Then install with:
helm upgrade --create-namespace --namespace spegel --install spegel \
  oci://ghcr.io/spegel-org/helm-charts/spegel \
  -f values.yaml
Why Burstable QoS? Kubernetes assigns the Guaranteed QoS class when a pod’s resource requests equal its limits. On SUNK nodes, Guaranteed QoS pods can interfere with Slurm’s thread counter, which tracks available CPU threads for job scheduling. By setting the memory limit higher than the request (256Mi vs 128Mi), the pod receives the Burstable QoS class instead, avoiding this conflict while still providing resource constraints.
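After installing with these values, you can confirm which QoS class Kubernetes actually assigned by reading each pod's status.qosClass field. The sketch below filters that output for pods that did not end up Burstable; non_burstable_pods is a hypothetical helper name, and it assumes the two-column output requested in the usage comment.

```shell
# non_burstable_pods (hypothetical helper): reads "NAME QOS" pairs on stdin
# and prints the name of every pod whose QoS class is not "Burstable".
# Empty output means every listed pod is Burstable.
non_burstable_pods() {
  awk 'NF >= 2 && $2 != "Burstable" { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -n spegel --no-headers \
#     -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass \
#     | non_burstable_pods
```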
Tolerations: The upstream Spegel chart includes tolerations for all NoExecute and NoSchedule taints by default. This covers the SUNK node lock taint (sunk.coreweave.com/lock:NoExecute), so no additional toleration configuration is needed.
The Spegel Helm chart automatically configures containerd registry mirrors on each node. No additional containerd configuration is required after installation.
Verify the DaemonSet is running:
kubectl get daemonset -n spegel
You should see the DaemonSet with the desired number matching your node count:
NAME     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
spegel   3         3         3       3            3           kubernetes.io/os=linux   7s
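For scripted installs, the same comparison of DESIRED and READY can be automated. The helper below is a sketch that assumes the column order shown above; daemonset_ready is a hypothetical name. The built-in kubectl rollout status command, shown in the usage comment, is an alternative that blocks until the rollout completes.

```shell
# daemonset_ready (hypothetical helper): reads `kubectl get daemonset --no-headers`
# output on stdin and succeeds only if the DaemonSet named in $1 is listed
# and its READY count (column 4) equals its DESIRED count (column 2).
daemonset_ready() {
  awk -v name="$1" '
    $1 == name { found = 1; ok = ($2 == $4) }
    END { exit ((found && ok) ? 0 : 1) }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get daemonset -n spegel --no-headers | daemonset_ready spegel \
#     && echo "Spegel is ready on every node"
# Alternative that waits for the rollout to finish:
#   kubectl rollout status daemonset/spegel -n spegel --timeout=120s
```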

Verify Spegel installation

Verify all Spegel pods are running:
kubectl get pods -n spegel -o wide
You should see one pod per node:
NAME           READY   STATUS    RESTARTS   AGE   IP           NODE      NOMINATED NODE   READINESS GATES
spegel-4sppw   1/1     Running   0          16s   10.0.0.112   g8fb8e0   <none>           <none>
spegel-h2fl2   1/1     Running   0          16s   10.0.1.126   g8fd342   <none>           <none>
spegel-qf5gw   1/1     Running   0          16s   10.0.0.246   g8ff980   <none>           <none>

Verify Spegel works

Follow the official Spegel verification guide to confirm P2P image sharing is functioning. This involves pulling an image on one node, then pulling the same image on a different node and verifying it was served from the first node via Spegel.

Test P2P image distribution

The official Spegel documentation recommends running these commands to test P2P functionality. The commands select random Spegel pods on different nodes, create test pods that pull an image, then clean up.
The shuf command is part of GNU coreutils. On macOS, install it with brew install coreutils.
# Select a random upstream node and pull an image there
UPSTREAM_POD_NAME=$(kubectl --namespace spegel -l app.kubernetes.io/name=spegel \
  get pods -o custom-columns=:metadata.name --no-headers | shuf -n 1)
UPSTREAM_NODE_NAME=$(kubectl --namespace spegel get pod ${UPSTREAM_POD_NAME} \
  -o jsonpath="{.spec.nodeName}")
kubectl --namespace default run upstream --image=ubuntu:25.04 --restart=Never \
  --overrides="{\"spec\":{\"nodeName\":\"${UPSTREAM_NODE_NAME}\",\"containers\":[{\"name\":\"ubuntu\",\"image\":\"ubuntu:25.04\",\"imagePullPolicy\":\"Always\",\"command\":[\"true\"]}]}}"

# Select a different node and pull the same image (should come from Spegel)
MIRROR_POD_NAME=$(kubectl --namespace spegel -l app.kubernetes.io/name=spegel \
  get pods -o custom-columns=:metadata.name --no-headers | grep -v "^${UPSTREAM_POD_NAME}$" | shuf -n 1)
MIRROR_NODE_NAME=$(kubectl --namespace spegel get pod ${MIRROR_POD_NAME} \
  -o jsonpath="{.spec.nodeName}")
kubectl --namespace default run mirror --image=ubuntu:25.04 --restart=Never \
  --overrides="{\"spec\":{\"nodeName\":\"${MIRROR_NODE_NAME}\",\"containers\":[{\"name\":\"ubuntu\",\"image\":\"ubuntu:25.04\",\"imagePullPolicy\":\"Always\",\"command\":[\"true\"]}]}}"

# Clean up test pods
kubectl --namespace default delete pod upstream mirror
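The escaped JSON passed to --overrides is easy to mistype. As an optional convenience, a helper can build the same JSON from a node name and an image. This is a sketch, not part of the official verification guide: pull_overrides is a hypothetical name, and it assumes neither argument contains characters that need JSON escaping.

```shell
# pull_overrides (hypothetical helper): prints the pod spec override used in
# the test commands, pinning the pod to node $1 and forcing a pull of image $2.
pull_overrides() {
  printf '{"spec":{"nodeName":"%s","containers":[{"name":"ubuntu","image":"%s","imagePullPolicy":"Always","command":["true"]}]}}\n' \
    "$1" "$2"
}

# Usage against a live cluster (requires kubectl access):
#   kubectl --namespace default run mirror --image=ubuntu:25.04 --restart=Never \
#     --overrides="$(pull_overrides "${MIRROR_NODE_NAME}" ubuntu:25.04)"
```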

Verify with the debug page

Spegel includes a debug web interface that shows instance-level statistics. Port-forward to a Spegel pod and check the debug page. If you did not run the test commands above, choose any Spegel pod name first:
kubectl --namespace spegel port-forward ${MIRROR_POD_NAME} 9090
In a browser, navigate to http://localhost:9090/debug/web and check the Last Mirror Success field. If Spegel has recently served an image from a peer, this field displays a duration (for example, 2m30s) indicating how long ago. On a freshly deployed cluster, this field shows Pending until the first P2P transfer occurs.

Test image pulls from the debug page

The debug page includes a Measure Image Pull feature at the bottom that lets you test P2P functionality directly. Enter an image reference (for example, docker.io/library/nginx:latest) and click Pull to see:
  • Lookup Result: Shows discovered peers that have the image and lookup latency
  • Pull Result: Shows total pull duration, image size, and per-layer breakdown
Silent fallback behavior: Spegel is designed to fall back silently to upstream registries when P2P transfer is unavailable. This means image pulls will succeed even if Spegel isn’t functioning, potentially masking configuration problems. Use the debug page to verify P2P transfers are occurring.

Benchmark P2P performance

For quantitative validation of Spegel’s performance, use the official Spegel benchmark tool. This tool measures image pull times and provides reproducible metrics comparing P2P performance against direct registry pulls. The steps in this section require Go and a working kubectl context with permissions to create benchmark workloads in your cluster (see What you’ll need).

Install the benchmark tool

Install the benchmark tool using Go:
go install github.com/spegel-org/benchmark@latest
This installs the benchmark binary to your $GOPATH/bin directory (or to $GOBIN, if set). Make sure that directory is on your PATH, then verify the installation:
benchmark --help
You should see output similar to:
Usage: benchmark <command> [<args>]

Options:
  --help, -h             display this help and exit

Commands:
  generate               Generate images for benchmarking.
  measure                Run benchmark measurement.
  suite                  Run the full suite of measurements.
  analyze                Analyze benchmark results.
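
If the shell reports that benchmark cannot be found, the Go binary directory is probably not on your PATH. The following sketch checks for it and appends it when missing; dir_on_path is a hypothetical helper name, and the usage comment assumes GOBIN is unset so that binaries land in $(go env GOPATH)/bin.

```shell
# dir_on_path (hypothetical helper): succeeds if directory $2 appears as an
# exact entry in the colon-separated list $1 (normally "$PATH").
dir_on_path() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires Go to be installed):
#   gobin="$(go env GOPATH)/bin"
#   dir_on_path "$PATH" "$gobin" || export PATH="$PATH:$gobin"
```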

Run performance measurements

Create a results directory and run the benchmark with standardized test images:
# Create results directory
mkdir -p ~/spegel-benchmark-results

# Run benchmark with 10MB test images
benchmark measure \
  --output-dir ~/spegel-benchmark-results \
  --namespace spegel-benchmark \
  --images ghcr.io/spegel-org/benchmark:v1-10MB-1 \
    ghcr.io/spegel-org/benchmark:v2-10MB-1
The benchmark will:
  1. Deploy DaemonSets forcing image pulls across all nodes
  2. Measure initial pull times
  3. Measure update pull times
For more comprehensive testing, run benchmarks with different image sizes:
# Test 100MB images
benchmark measure \
  --output-dir ~/spegel-benchmark-results/100mb \
  --namespace spegel-benchmark \
  --images ghcr.io/spegel-org/benchmark:v1-100MB-1 \
    ghcr.io/spegel-org/benchmark:v2-100MB-1

# Test 1GB images
benchmark measure \
  --output-dir ~/spegel-benchmark-results/1gb \
  --namespace spegel-benchmark \
  --images ghcr.io/spegel-org/benchmark:v1-1GB-1 \
    ghcr.io/spegel-org/benchmark:v2-1GB-1
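
The three runs differ only in the size embedded in the image tags, so they can be driven from a loop. This is a sketch under assumptions: benchmark_images is a hypothetical helper that follows the tag pattern of the commands above, and the per-size output directory names are illustrative.

```shell
# benchmark_images (hypothetical helper): prints the v1 and v2 benchmark image
# references for the size label in $1 (for example 10MB, 100MB, or 1GB).
benchmark_images() {
  printf 'ghcr.io/spegel-org/benchmark:v1-%s-1 ghcr.io/spegel-org/benchmark:v2-%s-1\n' \
    "$1" "$1"
}

# Usage (requires the benchmark tool and kubectl access):
#   for size in 10MB 100MB 1GB; do
#     mkdir -p ~/spegel-benchmark-results/"$size"
#     # The unquoted substitution is intentional: it splits into two image arguments.
#     benchmark measure \
#       --output-dir ~/spegel-benchmark-results/"$size" \
#       --namespace spegel-benchmark \
#       --images $(benchmark_images "$size")
#   done
```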
For more information, see the Spegel benchmark documentation. The benchmark results can be rendered as performance charts comparing:
  • Initial image pull times across nodes
  • Rolling update pull times (demonstrating P2P cache hits)
  • Performance improvements when Spegel serves images from local peers
Benchmark best practices: Run benchmarks on a cluster with at least 3 nodes for meaningful P2P metrics. The benchmark tool uses standardized images with known sizes and layer counts to ensure reproducible results across different environments.

How Spegel works

Understanding Spegel’s architecture helps you troubleshoot and optimize your deployment:
  1. DaemonSet deployment: Spegel runs on every node as a local registry (port 5000)
  2. Content advertisement: Each node periodically re-advertises its cached image layers to the cluster DHT. For current defaults (refresh cadence and content time-to-live), see the Spegel architecture documentation.
  3. Registry mirroring: When containerd pulls an image, it checks Spegel first (20ms timeout)
  4. Peer discovery: Spegel uses the Kademlia DHT to find which nodes have the requested layers
  5. P2P transfer: If a peer has the requested layers, they stream from that peer; otherwise, containerd falls back to the external registry
  6. Stateless operation: No persistent storage is required because Spegel uses containerd’s existing image cache
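The registry mirroring step depends on the mirror entries that containerd reads from hosts.toml files under its config_path. When troubleshooting, it can help to list which mirror endpoints a node is configured to try. The following is a sketch: list_mirror_hosts is a hypothetical helper, it relies only on containerd's standard [host."URL"] table syntax, and the docker.io path in the usage comment is an assumption about which registries are mirrored on your nodes.

```shell
# list_mirror_hosts (hypothetical helper): reads a containerd hosts.toml on
# stdin and prints the URL of each [host."URL"] table, one per line.
list_mirror_hosts() {
  sed -n 's/^[[:space:]]*\[host\."\(.*\)"\][[:space:]]*$/\1/p'
}

# Usage against a live node (requires the kubectl node-shell plugin; the
# registry directory name is an assumption):
#   kubectl node-shell "${NODE_NAME}" -- \
#     cat /etc/containerd/certs.d/docker.io/hosts.toml | list_mirror_hosts
```

An empty result for a registry you expect to be mirrored suggests that the Spegel configuration step did not run on that node.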
Timeout tuning: The mirrorResolveTimeout (default 20ms) controls how long containerd waits for Spegel before falling back to external registries. Configure this via Helm:
helm upgrade --namespace spegel spegel oci://ghcr.io/spegel-org/helm-charts/spegel \
  --set mirrorResolveTimeout=50ms
Increase this value in high-latency environments or decrease it if you prioritize pull speed over P2P cache hits. See the Helm chart values.yaml for all configurable options.

Last modified on April 20, 2026