Deploy Spegel, a stateless peer-to-peer OCI registry mirror, on CoreWeave Kubernetes Service (CKS). Speed up container image pulls, reduce external registry dependencies, and improve cluster reliability by sharing images across nodes.
P2P distributed architecture
Spegel uses a Kademlia-based Distributed Hash Table (DHT) to enable peer-to-peer image sharing across cluster nodes. Each node advertises its locally cached images, allowing other nodes to pull layers directly from cluster peers instead of external registries. This stateless design requires no persistent storage because Spegel leverages containerd’s existing image cache on each node.
This guide covers the following steps:
- Verify containerd configuration to ensure registry mirroring is enabled
- Deploy Spegel using Helm as a DaemonSet across all nodes
- Verify P2P image sharing by pulling images and observing peer-to-peer transfers
What you'll need
Before you start, you must have:
- A working CKS cluster with at least two CPU Nodes or GPU Nodes (P2P functionality requires multiple nodes)
- kubectl installed and configured for your cluster
- kvaps node-shell for verifying containerd configuration on your nodes. Alternatively, use kubectl exec for those steps
- Helm version 3.8+
What you'll use
You’ll use these tools and technologies:
- Spegel: Peer-to-peer OCI registry mirror for the cluster
- Helm: Package manager used to install the Spegel chart from its OCI registry
- kubectl: Kubernetes CLI for cluster access, verification, and optional port-forwarding to the Spegel debug UI
- kubectl node-shell: Plugin for checking containerd settings on nodes (optional if you use another method)
Verify cluster access
Verify that you can access your cluster with kubectl.
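A minimal sketch of such a check, assuming your current kubeconfig context already points at the CKS cluster. The function name verify_cluster_access is illustrative and not part of any tool:

```shell
# Hypothetical helper: confirm the API server is reachable and list nodes.
verify_cluster_access() {
  # Print the control plane endpoint for the current kubeconfig context.
  kubectl cluster-info || return 1
  # List nodes; P2P sharing needs at least two nodes in the Ready state.
  kubectl get nodes -o wide
}

# Usage: verify_cluster_access
```

If the node list shows fewer than two Ready nodes, the later P2P checks have no peer to pull from.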
Verify containerd configuration
Spegel requires specific containerd settings to function properly, as documented in the Spegel compatibility requirements. CoreWeave CKS clusters are pre-configured with these settings, so you can proceed to deploying Spegel. To verify the settings yourself, get a node name and check the containerd configuration on that node using kubectl node-shell.
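The check might look like the sketch below. It assumes that the kubectl node-shell plugin is installed, that containerd reads its configuration from /etc/containerd/config.toml on the node, and that the settings of interest are the registry config_path and discard_unpacked_layers keys named in the upstream compatibility notes. Confirm all three against your environment. The function name is illustrative:

```shell
# Hypothetical helper: print the containerd settings that Spegel depends on.
# Assumes the config file path /etc/containerd/config.toml on the node.
check_containerd_config() {
  # Use the node passed as the first argument, or fall back to the first node.
  node_name="${1:-$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')}"
  echo "Checking containerd configuration on ${node_name}"
  # Run grep in the node's namespaces through the node-shell plugin.
  kubectl node-shell "${node_name}" -- \
    grep -E 'config_path|discard_unpacked_layers' /etc/containerd/config.toml
}

# Usage: check_containerd_config            (first node)
#        check_containerd_config my-node    (specific node)
```

The output to look for is a registry config_path entry pointing at a certs.d directory and discard_unpacked_layers set to false.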
Deploy Spegel
Install Spegel using the Helm chart from the OCI registry. The installation:
- Creates the spegel namespace if it doesn’t exist
- Installs Spegel as a DaemonSet (one pod per node)
- Uses default configuration optimized for most clusters
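The installation described above can be sketched as follows. The chart reference is the location published by the upstream Spegel project. Treat it as an assumption to confirm, and consider pinning a chart version with --version. The release name spegel and the function name are illustrative:

```shell
# Hypothetical helper: install or upgrade Spegel from the upstream OCI chart.
install_spegel() {
  # --install creates the release if it does not exist yet.
  # --create-namespace creates the spegel namespace when it is missing.
  helm upgrade spegel oci://ghcr.io/spegel-org/helm-charts/spegel \
    --install \
    --namespace spegel \
    --create-namespace
}

# Usage: install_spegel
```

Using helm upgrade with --install keeps the command idempotent, so rerunning it after a values change updates the existing release.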
Deploy on SUNK GPU nodes
To run Spegel on SUNK GPU nodes, configure memory resources to use the Burstable QoS class instead of the upstream defaults, where requests equal limits. Setting requests equal to limits creates a Guaranteed QoS pod, which can interfere with Slurm’s thread counter on SUNK nodes. Set the memory request and limit in a values.yaml file.
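A sketch of that configuration, using the 128Mi request and 256Mi limit discussed in this section. The top-level resources key is an assumption about the chart layout, so check it against the chart's values.yaml. The function name and release name are illustrative:

```shell
# Write a values file that sets the memory request below the limit so the
# pod is assigned the Burstable QoS class.
# Assumption: the chart reads container resources from a top-level "resources" key.
cat > values.yaml <<'EOF'
resources:
  requests:
    memory: 128Mi
  limits:
    memory: 256Mi
EOF

# Hypothetical helper: apply the values file to the Spegel release.
install_spegel_on_sunk() {
  helm upgrade spegel oci://ghcr.io/spegel-org/helm-charts/spegel \
    --install --namespace spegel --create-namespace \
    --values values.yaml
}

# Usage: install_spegel_on_sunk
```

After the rollout, kubectl get pod with -o jsonpath='{.status.qosClass}' on a Spegel pod should report Burstable if the values were applied.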
Why Burstable QoS? Kubernetes assigns the Guaranteed QoS class when a pod’s resource requests equal its limits. On SUNK nodes, Guaranteed QoS pods can interfere with Slurm’s thread counter, which tracks available CPU threads for job scheduling. By setting the memory limit higher than the request (256Mi vs 128Mi), the pod receives the Burstable QoS class instead, avoiding this conflict while still providing resource constraints.
Tolerations: The upstream Spegel chart includes tolerations for all NoExecute and NoSchedule taints by default. This covers the SUNK node lock taint (sunk.coreweave.com/lock:NoExecute), so no additional toleration configuration is needed.
Verify Spegel installation
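To list the Spegel pods, a minimal sketch that assumes the spegel namespace used during installation. The function name is illustrative:

```shell
# Hypothetical helper: show the Spegel DaemonSet and its pods.
check_spegel_pods() {
  # Show desired versus ready pod counts for DaemonSets in the namespace.
  kubectl get daemonset -n spegel || return 1
  # List each pod together with the node it runs on.
  kubectl get pods -n spegel -o wide
}

# Usage: check_spegel_pods
```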
Confirm that every Spegel pod reports Running, with one pod per node.
Verify Spegel works
Follow the official Spegel verification guide to confirm that P2P image sharing is functioning. This involves pulling an image on one node, then pulling the same image on a different node and verifying that it was served from the first node via Spegel.
Test P2P image distribution
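As an illustration of the idea only (this is not the upstream command sequence), the sketch below pulls the same image on two different nodes so that the second pull has a peer to draw from. The function name, the TEST_IMAGE variable, and the pod names p2p-test-1 and p2p-test-2 are hypothetical. The sketch also assumes that Spegel runs in the spegel namespace:

```shell
# Hypothetical sketch, not the upstream commands: pull one image on two
# different nodes so the second pull can be served by the first node.
test_p2p_pull() {
  test_image="${TEST_IMAGE:-docker.io/library/nginx:latest}"
  # Pick two distinct nodes at random from the nodes running Spegel pods.
  nodes="$(kubectl get pods -n spegel \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort -u | shuf -n 2)"
  first_node="$(echo "${nodes}" | sed -n 1p)"
  second_node="$(echo "${nodes}" | sed -n 2p)"
  [ -n "${second_node}" ] || { echo "need Spegel pods on two nodes" >&2; return 1; }

  # Pull on the first node, wait until the pod is Ready, then repeat on the second.
  for pair in "p2p-test-1:${first_node}" "p2p-test-2:${second_node}"; do
    name="${pair%%:*}"
    node="${pair#*:}"
    kubectl run "${name}" --image="${test_image}" --restart=Never \
      --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"${node}\"}}" || return 1
    kubectl wait --for=condition=Ready "pod/${name}" --timeout=120s || return 1
  done

  # Clean up the test pods.
  kubectl delete pod p2p-test-1 p2p-test-2
}

# Usage: test_p2p_pull
```

The sketch only triggers the pulls. Whether the second pull was actually served by a peer still has to be confirmed separately, for example on the Spegel debug page. An image that is already cached on the second node will not produce a transfer.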
The official Spegel documentation recommends commands that select random Spegel pods on different nodes, create test pods that pull an image, then clean up. These commands use shuf, which is part of GNU coreutils. On macOS, install it with brew install coreutils.
Verify with the debug page
Spegel includes a debug web interface that shows instance-level statistics. Port-forward to a Spegel pod and check the debug page. If you did not run the test commands above, choose any Spegel pod name first. Then open http://localhost:9090/debug/web and check the Last Mirror Success field. If Spegel has recently served an image from a peer, this field displays a duration (for example, 2m30s) indicating how long ago. On a freshly deployed cluster, this field shows Pending until the first P2P transfer occurs.
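The port-forward step can be sketched as follows. It assumes the debug page is served on port 9090 of the pod, matching the URL above, and that Spegel runs in the spegel namespace. The function name is illustrative:

```shell
# Hypothetical helper: forward local port 9090 to a Spegel pod.
# Assumes the pod serves the debug page on port 9090.
open_spegel_debug() {
  # Use the pod passed as the first argument, or fall back to the first pod.
  spegel_pod="${1:-$(kubectl get pods -n spegel -o jsonpath='{.items[0].metadata.name}')}"
  echo "Forwarding http://localhost:9090/debug/web to ${spegel_pod}"
  # Blocks until interrupted with Ctrl+C.
  kubectl port-forward -n spegel "pod/${spegel_pod}" 9090:9090
}

# Usage: open_spegel_debug              (first pod)
#        open_spegel_debug spegel-abc   (specific pod; name is a placeholder)
```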
Test image pulls from the debug page
The debug page includes a Measure Image Pull feature at the bottom that lets you test P2P functionality directly. Enter an image reference (for example, docker.io/library/nginx:latest) and click Pull to see:
- Lookup Result: Shows discovered peers that have the image and lookup latency
- Pull Result: Shows total pull duration, image size, and per-layer breakdown
Benchmark P2P performance
For quantitative validation of Spegel’s performance, use the official Spegel benchmark tool. This tool measures image pull times and provides reproducible metrics comparing P2P performance against direct registry pulls. The steps in this section require Go and a working kubectl context with permissions to create benchmark workloads in your cluster (see Prerequisites).
Install the benchmark tool
Install the benchmark tool using Go. This installs the benchmark binary to your $GOPATH/bin directory. After installing, verify the installation.
Run performance measurements
Create a results directory and run the benchmark with standardized test images. The benchmark will:
- Deploy DaemonSets forcing image pulls across all nodes
- Measure initial pull times
- Measure update pull times
The results show:
- Initial image pull times across nodes
- Rolling update pull times (demonstrating P2P cache hits)
- Performance improvements when Spegel serves images from local peers
Benchmark best practices: Run benchmarks on a cluster with at least 3 nodes for meaningful P2P metrics. The benchmark tool uses standardized images with known sizes and layer counts to ensure reproducible results across different environments.
How Spegel works
Understanding Spegel’s architecture helps you troubleshoot and optimize your deployment:
- DaemonSet deployment: Spegel runs on every node as a local registry (port 5000)
- Content advertisement: Each node periodically re-advertises its cached image layers to the cluster DHT. For current defaults (refresh cadence and content time-to-live), see the Spegel architecture documentation.
- Registry mirroring: When containerd pulls an image, it checks Spegel first (20ms timeout)
- Peer discovery: Spegel uses the Kademlia DHT to find which nodes have the requested layers
- P2P transfer: If a peer has the requested layers, they stream from that peer; otherwise, containerd falls back to the external registry
- Stateless operation: No persistent storage is needed because Spegel uses containerd’s existing image cache
Timeout tuning: The mirrorResolveTimeout (default 20ms) controls how long containerd waits for Spegel before falling back to external registries. Increase this value in high-latency environments or decrease it if you prioritize pull speed over P2P cache hits. Configure this value through the Helm chart. See the Helm chart values.yaml for all configurable options.
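One way to change the timeout on an existing release is sketched below. The value path spegel.mirrorResolveTimeout is an assumption about the chart layout, and the 50ms default in the function is only an example value, so confirm both against the chart's values.yaml. The function name and release name are illustrative:

```shell
# Hypothetical helper: change the mirror resolve timeout on an existing release.
# Assumption: the chart exposes the setting as spegel.mirrorResolveTimeout.
set_mirror_resolve_timeout() {
  # Example value only; pass the timeout you want as the first argument.
  timeout="${1:-50ms}"
  # --reuse-values keeps the values from the previous release revision.
  helm upgrade spegel oci://ghcr.io/spegel-org/helm-charts/spegel \
    --namespace spegel \
    --reuse-values \
    --set "spegel.mirrorResolveTimeout=${timeout}"
}

# Usage: set_mirror_resolve_timeout 100ms
```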