P2P distributed architecture
Spegel uses a Kademlia-based Distributed Hash Table (DHT) to enable peer-to-peer image sharing across cluster Nodes. Each Node advertises its locally cached images so other Nodes can pull layers directly from cluster peers instead of external registries. This stateless design requires no persistent storage, because Spegel uses containerd’s existing image cache on each Node.
In this guide, you will:
- Verify containerd configuration to ensure registry mirroring is enabled.
- Deploy Spegel with Helm as a DaemonSet across all Nodes.
- Verify P2P image sharing by pulling images and observing peer-to-peer transfers.
What you'll need
Before you start, you must have a working CKS cluster with at least two CPU Nodes or GPU Nodes. P2P functionality requires multiple Nodes.
You’ll need the following tools on your local machine:
- kubectl installed and configured for your cluster.
- kvaps node-shell to verify containerd configuration on your Nodes. Alternatively, use kubectl exec for those steps.
- Helm version 3.8+.
What you'll use
You’ll use these tools and technologies:
- Spegel: Peer-to-peer OCI registry mirror for the cluster.
- Helm: Package manager that installs the Spegel chart from its OCI registry.
- kubectl: Kubernetes CLI for cluster access, verification, and optional port-forwarding to the Spegel debug UI.
- kubectl node-shell: Plugin to check containerd settings on Nodes (optional if you use another method).
Verify cluster access
Before you install Spegel, confirm that your local environment can reach the cluster and that the cluster has enough Nodes for peer-to-peer image sharing. Verify that you can access your cluster with kubectl by listing its Nodes. The output should show at least two Nodes.
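A minimal sketch of this check, assuming kubectl already points at your CKS cluster. The helper name require_nodes is hypothetical and not part of any tool. It counts the Nodes that kubectl get nodes reports and fails when there are fewer than the requested minimum.

```shell
# require_nodes MIN: succeed only if the cluster reports at least MIN Nodes.
# Hypothetical helper; it relies only on `kubectl get nodes --no-headers`.
require_nodes() {
  min="${1:-2}"
  # One output line per Node; tr strips the padding some wc versions add.
  count="$(kubectl get nodes --no-headers | wc -l | tr -d ' ')"
  echo "Nodes found: ${count}"
  [ "${count}" -ge "${min}" ]
}

# Example: require_nodes 2 && echo "Enough Nodes for P2P sharing"
```

The function returns a non-zero status when kubectl cannot reach the cluster, because an empty Node list counts as zero.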
Verify containerd configuration
Spegel requires specific containerd settings to work, as documented in the Spegel compatibility requirements. CoreWeave CKS clusters are pre-configured with these settings, so you can proceed to deploying Spegel. To verify the settings yourself, get a Node name and check the containerd configuration with kubectl node-shell.
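The optional check can be sketched as follows. This is an illustration that assumes the kvaps node-shell plugin is installed and that the containerd configuration lives at /etc/containerd/config.toml, which is the containerd default but may differ on your Nodes. The helper name check_containerd is hypothetical. It prints the registry config_path and discard_unpacked_layers settings, which are the settings the Spegel compatibility requirements discuss.

```shell
# check_containerd NODE: print the containerd settings Spegel depends on.
# Assumes the config file path /etc/containerd/config.toml on the Node.
check_containerd() {
  node="$1"
  kubectl node-shell "${node}" -- \
    grep -E 'config_path|discard_unpacked_layers' /etc/containerd/config.toml
}

# Example: pick the first Node and inspect it.
# NODE="$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')"
# check_containerd "${NODE}"
```

Compare the printed values with the Spegel compatibility requirements rather than with fixed strings, because the expected layout depends on the containerd version.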
Deploy Spegel
With containerd verified, you can install Spegel as a DaemonSet so that every Node runs a local mirror. Install Spegel with the Helm chart from the OCI registry. The installation:
- Creates the spegel namespace if it doesn’t exist.
- Installs Spegel as a DaemonSet (one Pod per Node).
- Uses default configuration suitable for most clusters.
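The installation described above can be sketched as a small wrapper around helm upgrade --install. The chart reference in SPEGEL_CHART is an assumption about where the upstream chart is published, so confirm it against the Spegel documentation before you run it. The function name install_spegel is hypothetical.

```shell
# Assumed chart location; override SPEGEL_CHART if the Spegel docs list another.
SPEGEL_CHART="${SPEGEL_CHART:-oci://ghcr.io/spegel-org/helm-charts/spegel}"

# install_spegel [extra helm args...]: install or upgrade the `spegel` release.
install_spegel() {
  helm upgrade spegel "${SPEGEL_CHART}" \
    --install \
    --namespace spegel \
    --create-namespace \
    "$@"
}

# Example: install_spegel
```

Using upgrade with --install makes the command safe to re-run: it installs the release the first time and upgrades it afterwards.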
Deploy on SUNK GPU Nodes
To run Spegel on SUNK GPU Nodes, configure memory resources to use the Burstable QoS class instead of the upstream defaults where requests equal limits. Setting requests equal to limits creates a Guaranteed QoS Pod, which can interfere with Slurm’s thread counter on SUNK Nodes. To override the defaults, create a values.yaml file that sets the memory request to 128Mi and the memory limit to 256Mi, then pass the file to Helm when you install the chart.
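A minimal sketch of such a values.yaml file, written from the shell. It assumes the chart reads container resources from a top-level resources key; check the chart's own values.yaml for the exact structure before relying on it.

```shell
# Write a values.yaml that keeps the memory request below the memory limit.
# Assumption: the Spegel chart exposes a top-level `resources` key.
cat > values.yaml <<'EOF'
resources:
  requests:
    memory: 128Mi
  limits:
    memory: 256Mi
EOF

# Pass the file to Helm when installing, for example:
#   helm upgrade spegel <chart-reference> --install \
#     --namespace spegel --create-namespace -f values.yaml
```

Here <chart-reference> is a placeholder for the OCI chart reference you use for the standard installation.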
Why Burstable QoS? Kubernetes assigns the Guaranteed QoS class when a Pod’s resource requests equal its limits. On SUNK Nodes, Guaranteed QoS Pods can interfere with Slurm’s thread counter, which tracks available CPU threads for job scheduling. When you set the memory limit higher than the request (256Mi versus 128Mi), the Pod receives the Burstable QoS class instead. This avoids the conflict while still providing resource constraints.
Tolerations: The upstream Spegel chart includes tolerations for all NoExecute and NoSchedule taints by default. This covers the SUNK Node lock taint (sunk.coreweave.com/lock:NoExecute), so no additional toleration configuration is needed.
Verify Spegel installation
Verify that all Spegel Pods are running, with one Pod per Node.
Verify Spegel works
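The Pod check for the installation can be sketched as a comparison of desired and ready Pod counts. This is an illustration that assumes the release lives in the spegel namespace. The helper name spegel_rollout_ok is hypothetical.

```shell
# spegel_rollout_ok: succeed when every DaemonSet in the spegel namespace
# has as many ready Pods as it wants scheduled.
spegel_rollout_ok() {
  kubectl get daemonset --namespace spegel --no-headers \
    -o custom-columns='DESIRED:.status.desiredNumberScheduled,READY:.status.numberReady' |
    awk '{ print "desired=" $1 " ready=" $2; if ($1 != $2) bad = 1 }
         END { exit (NR == 0 || bad) }'
}

# Example: spegel_rollout_ok && echo "All Spegel Pods are ready"
```

The function also fails when no DaemonSet is found, which usually means the release was installed into a different namespace.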
A running DaemonSet confirms that the Pods are healthy, but it doesn’t prove that P2P transfers are happening. The next steps exercise Spegel end-to-end so you can confirm that one Node serves image layers to another. Follow the official Spegel verification guide to confirm P2P image sharing is functioning: pull an image on one Node, pull the same image on a different Node, then verify that the second pull was served from the first Node through Spegel.
Test P2P image distribution
The official Spegel documentation provides commands to test P2P functionality. The commands select random Spegel Pods on different Nodes, create test Pods that pull an image, then clean up.
The commands rely on shuf, which is part of GNU coreutils. On macOS, install it with brew install coreutils.
Verify with the debug page
Spegel includes a debug web interface that shows instance-level statistics. Port-forward local port 9090 to a Spegel Pod. If you did not run the preceding test commands, choose any Spegel Pod name first. Then open http://localhost:9090/debug/web and check the Last Mirror Success field. If Spegel recently served an image from a peer, this field displays a duration (for example, 2m30s) indicating how long ago. On a freshly deployed cluster, this field shows Pending until the first P2P transfer occurs.
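The port-forward step can be sketched as follows. It assumes Spegel runs in the spegel namespace and serves the debug page on container port 9090, matching the URL used in this guide. The helper name forward_debug_ui is hypothetical.

```shell
# forward_debug_ui [POD]: forward local port 9090 to a Spegel Pod.
forward_debug_ui() {
  pod="${1:-}"
  if [ -z "${pod}" ]; then
    # No Pod given: take the first one listed (for example pod/spegel-abcde).
    pod="$(kubectl get pods --namespace spegel -o name | head -n 1)"
  fi
  kubectl port-forward --namespace spegel "${pod}" 9090:9090
}

# Example: forward_debug_ui
```

The command keeps running in the foreground until you press Ctrl+C, so open the debug page from a browser or a second terminal.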
Test image pulls from the debug page
The debug page includes a Measure Image Pull feature at the bottom that lets you test P2P functionality directly. Enter an image reference (for example, docker.io/library/nginx:latest) and click Pull to see:
- Lookup Result: Shows discovered peers that have the image and lookup latency.
- Pull Result: Shows total pull duration, image size, and per-layer breakdown.
Optional: Benchmark P2P performance
Use this section when you need quantitative evidence that Spegel improves pull times in your cluster, for example, before standardizing on it across an environment. To measure Spegel’s performance, use the official Spegel benchmark tool. This tool measures image pull times and provides reproducible metrics that compare P2P performance against direct registry pulls. The steps in this section require Go and a working kubectl context with permissions to create benchmark workloads in your cluster (see Prerequisites).
Install the benchmark tool
Install the benchmark tool with Go. The installation places the benchmark binary in your $GOPATH/bin directory. Verify the installation by confirming that the binary is on your PATH.
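One way to confirm that the binary is reachable is sketched below. It assumes only that the installed binary is named benchmark, as stated above. The helper name verify_benchmark_installed is hypothetical, and the fallback path uses Go's default GOPATH of $HOME/go.

```shell
# verify_benchmark_installed: report where the benchmark binary resolves,
# or print a hint when it is not on PATH.
verify_benchmark_installed() {
  if command -v benchmark >/dev/null 2>&1; then
    echo "benchmark found at $(command -v benchmark)"
  else
    echo "benchmark not found; add ${GOPATH:-$HOME/go}/bin to PATH" >&2
    return 1
  fi
}

# Example: verify_benchmark_installed
```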
Run performance measurements
Create a results directory and run the benchmark with standardized test images. The benchmark:
- Deploys DaemonSets that force image pulls across all Nodes.
- Measures initial pull times.
- Measures update pull times.
The results show:
- Initial image pull times across Nodes.
- Rolling update pull times (which demonstrate P2P cache hits).
- Performance improvements when Spegel serves images from local peers.
Benchmark best practices: Run benchmarks on a cluster with at least 3 Nodes for meaningful P2P metrics. The benchmark tool uses standardized images with known sizes and layer counts to ensure reproducible results across different environments.
How Spegel works
Understanding Spegel’s architecture helps you troubleshoot and tune your deployment:
- DaemonSet deployment: Spegel runs on every Node as a local registry (port 5000).
- Content advertisement: Each Node periodically re-advertises its cached image layers to the cluster DHT. For current defaults (refresh cadence and content time-to-live), see the Spegel architecture documentation.
- Registry mirroring: When containerd pulls an image, it checks Spegel first (20ms timeout).
- Peer discovery: Spegel uses the Kademlia DHT to find which Nodes have the requested layers.
- P2P transfer: If a peer has the requested layers, they stream from that peer Node. Otherwise, the pull falls back to the external registry.
- Stateless operation: No persistent storage. Spegel uses containerd’s existing image cache.
Timeout tuning: The mirrorResolveTimeout setting (default 20ms) controls how long containerd waits for Spegel before falling back to external registries. You configure this value through the Helm chart. Increase it in high-latency environments, or decrease it if you prioritize pull speed over P2P cache hits. See the Helm chart values.yaml for all configurable options.
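Changing the timeout on an existing release might look like the sketch below. It rests on three assumptions that you should confirm in the chart's values.yaml and the Spegel documentation: the release is named spegel in the spegel namespace, the chart exposes the setting as spegel.mirrorResolveTimeout, and the chart is published at the reference held in SPEGEL_CHART. The helper name set_mirror_timeout is hypothetical.

```shell
# Assumed chart location; override SPEGEL_CHART if the Spegel docs list another.
SPEGEL_CHART="${SPEGEL_CHART:-oci://ghcr.io/spegel-org/helm-charts/spegel}"

# set_mirror_timeout DURATION: update mirrorResolveTimeout, keeping other values.
set_mirror_timeout() {
  helm upgrade spegel "${SPEGEL_CHART}" \
    --namespace spegel \
    --reuse-values \
    --set "spegel.mirrorResolveTimeout=$1"
}

# Example: set_mirror_timeout 50ms
```

The --reuse-values flag keeps overrides from earlier installs, such as the SUNK memory settings, so that only the timeout changes.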