Warp is an open-source S3-compatible benchmarking tool that provides detailed per-request latency percentiles and throughput metrics. It is fully compatible with CoreWeave AI Object Storage. Warp supports a variety of benchmark types, including GET, PUT, DELETE, LIST, STAT, and mixed workloads.
Prerequisites
This guide assumes that you already have the following prerequisites in place:
- A CKS cluster
- A Node Pool with a Node to run the benchmarks on
- kubectl installed and configured to access your cluster
- A CoreWeave AI Object Storage access key and secret key
- The necessary permissions to create an AI Object Storage bucket
Create a dedicated benchmark bucket
Create a dedicated bucket for benchmarking using the Cloud Console. For the purposes of this guide, name the bucket warp-benchmark-bucket and use the US-EAST-04A availability zone. Alternatively, create the bucket from the command line; this works only if you've already created a CoreWeave-specific configuration.
The rest of this guide uses the following bucket configuration values:
- Bucket name: warp-benchmark-bucket
- Availability zone: US-EAST-04A
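As a sketch of the command-line route, the bucket can be created through the S3-compatible API with the AWS CLI. The profile name and the use of LocationConstraint to select the availability zone are assumptions; adjust them to match your own configuration:

```shell
# Create the benchmark bucket via the S3-compatible API.
# "coreweave" is an assumed profile name holding your access/secret keys.
aws s3api create-bucket \
  --bucket warp-benchmark-bucket \
  --endpoint-url https://cwobject.com \
  --profile coreweave \
  --create-bucket-configuration LocationConstraint=US-EAST-04A
```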
Deploy Warp on a Node
Warp must run from inside your CKS cluster to produce meaningful results. Benchmarking from an external network measures network latency rather than storage performance, and the LOTA endpoint (http://cwlota.com) is only accessible from a CoreWeave cluster.
Deploy Warp as a Pod using the official Warp container image. Warp does not use GPU resources, so no GPU resource request or node affinity is needed. LOTA is available on all Nodes in the cluster. For details on targeting specific Node types, see Target specific GPUs or CPUs.
- Create a Pod manifest named warp-benchmark.yaml.
- Apply the manifest.
- Wait for the Pod to be running.
- Open a shell inside the Pod.
- Verify that Warp is available.
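The steps above can be sketched as follows. The manifest is a minimal example, assuming the upstream minio/warp image; if your image lacks a shell or a sleep binary, substitute a general-purpose image with the warp binary installed:

```yaml
# warp-benchmark.yaml -- minimal Pod for running Warp interactively.
apiVersion: v1
kind: Pod
metadata:
  name: warp-benchmark
spec:
  restartPolicy: Never
  containers:
    - name: warp
      image: minio/warp:latest        # official Warp image (assumed tag)
      command: ["sleep", "infinity"]  # keep the Pod alive for interactive use
```

With the manifest in place, the remaining steps might look like `kubectl apply -f warp-benchmark.yaml`, `kubectl wait --for=condition=Ready pod/warp-benchmark`, `kubectl exec -it warp-benchmark -- sh`, and finally `warp --version` inside the Pod.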
Set environment variables
In your Pod, set your CoreWeave AI Object Storage credentials, endpoint, and region as environment variables for convenience.
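For example, as a sketch with placeholder credentials (the variable names are this example's convention, not something Warp requires):

```shell
# Illustrative environment variables for the rest of this guide.
# Replace the placeholder credentials with your own keys.
export WARP_HOST="cwlota.com"             # LOTA endpoint host (HTTP inside the cluster)
export WARP_ACCESS_KEY="YOUR_ACCESS_KEY"  # CoreWeave AI Object Storage access key
export WARP_SECRET_KEY="YOUR_SECRET_KEY"  # CoreWeave AI Object Storage secret key
export WARP_REGION="US-EAST-04A"          # availability zone used as the region
```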
Measure read throughput with GET
Read throughput is the most relevant benchmark for data loading. The following configuration, saved as warp-test-lota-get.yaml, runs a get benchmark against the LOTA endpoint with recommended starting parameters.
- Create the benchmark configuration file.
- Run the benchmark from inside your Pod.
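A sketch of such a configuration is shown below. The key names are assumptions pieced together from the parameters this guide recommends; consult your installed Warp version's documentation for the authoritative schema:

```yaml
# warp-test-lota-get.yaml -- illustrative GET benchmark config (key names assumed).
benchmark: get
remote:
  host: cwlota.com        # LOTA endpoint, reachable only inside the cluster
  access-key: YOUR_ACCESS_KEY
  secret-key: YOUR_SECRET_KEY
  tls: false              # LOTA uses plain HTTP from within the cluster
  region: US-EAST-04A
params:
  bucket: warp-benchmark-bucket
  concurrent: 300         # parallel download operations
  obj.size: 15MiB         # stays above the metadata-overhead threshold
  objects: 2500           # pool large enough for random selection
  duration: 5m
  range.size: 1MB         # ranged reads; avoids small-read overhead
  autoterm: true          # stop once throughput stabilizes
```

It could then be run from inside the Pod with something like `warp run warp-test-lota-get.yaml`, assuming a Warp release that accepts YAML configs via `warp run`.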
Recommended benchmark parameters
The following are the recommended benchmark parameters for read throughput. You can adjust them as needed to optimize performance for your workload.

| Parameter | Value | Description |
|---|---|---|
| Concurrency Level | 300 | The number of parallel download operations to run. Start with 300 and adjust upward or downward. If you see throughput still climbing at 300, try increasing to 500 or more. If you see declining or unexpectedly low throughput, try decreasing the concurrency. |
| Object Size | 15 MiB | The size of the objects to use. Start with 15 MiB to stay above the threshold where metadata overhead becomes significant. |
| Number of Objects | 2,500 | The number of objects to use. Start with 2,500 to provide a large enough pool for random selection. For most workloads, this is a good starting point. |
| Duration | 5 minutes | The duration of the benchmark. Start with 5 minutes to capture stable, representative throughput numbers. For most workloads, this is a good starting point. |
| Range Size | 1 MB | The size of the range to read. Start with 1 MB to avoid small-read overhead. Note that small reads are still served from the cache as long as the object size is greater than 4 MB. Objects smaller than 4 MB are not cached. |
| Autoterm | true | Automatically stop the benchmark when throughput stabilizes, preventing noisy results from incomplete warm-up periods. |
Measure write throughput with PUT
The following configuration, saved as warp-test-lota-put.yaml, runs a put benchmark with recommended starting parameters.
- Create the benchmark configuration file.
- Run the benchmark from inside your Pod.
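A sketch of such a configuration, with the same caveat that the key names are assumptions rather than a confirmed schema:

```yaml
# warp-test-lota-put.yaml -- illustrative PUT benchmark config (key names assumed).
benchmark: put
remote:
  host: cwlota.com
  access-key: YOUR_ACCESS_KEY
  secret-key: YOUR_SECRET_KEY
  tls: false              # LOTA uses plain HTTP from within the cluster
  region: US-EAST-04A
params:
  bucket: warp-benchmark-bucket
  concurrent: 300         # parallel upload operations
  obj.size: 15MiB
  duration: 5m
  autoterm: true
```

As with the GET benchmark, it could be run with `warp run warp-test-lota-put.yaml`, assuming a Warp release that accepts YAML configs.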
Measure mixed workload throughput
The mixed command simulates a realistic workload with a configurable mix of GET, PUT, STAT, and DELETE operations. The following configuration, saved as warp-test-lota-mixed.yaml, runs a mixed benchmark with the default operation distribution.
- Create the benchmark configuration file.
- Run the benchmark from inside your Pod.
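A sketch of such a configuration, again with assumed key names:

```yaml
# warp-test-lota-mixed.yaml -- illustrative mixed benchmark config (key names assumed).
# With no distribution specified, Warp's default mix applies
# (45% GET, 15% PUT, 30% STAT, 10% DELETE).
benchmark: mixed
remote:
  host: cwlota.com
  access-key: YOUR_ACCESS_KEY
  secret-key: YOUR_SECRET_KEY
  tls: false
  region: US-EAST-04A
params:
  bucket: warp-benchmark-bucket
  concurrent: 300
  obj.size: 15MiB
  objects: 2500
  duration: 5m
  autoterm: true
```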
Configure the operation distribution
By default, the mixed benchmark uses the following operation distribution: 45% GET, 15% PUT, 30% STAT, and 10% DELETE. To customize the distribution, set the distribution key in your YAML configuration file.
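A partial sketch showing where the distribution key could sit; the placement and value format are assumptions, though the key itself is what this guide references:

```yaml
# Partial config: customize the mixed-benchmark operation mix.
# Percentages should sum to 100.
benchmark: mixed
params:
  distribution:
    get: 45
    put: 15
    stat: 30
    delete: 10
```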
Measure multipart upload throughput
The following configuration, saved as warp-test-lota-multipart-put.yaml, runs a multipart-put benchmark with recommended starting parameters.
- Create the benchmark configuration file.
- Run the benchmark from inside your Pod.
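A sketch of such a configuration. The key names, object size, and part size below are all assumptions chosen only to illustrate the shape of a multipart benchmark; tune them for your workload:

```yaml
# warp-test-lota-multipart-put.yaml -- illustrative multipart upload config
# (key names and sizes assumed).
benchmark: multipart-put
remote:
  host: cwlota.com
  access-key: YOUR_ACCESS_KEY
  secret-key: YOUR_SECRET_KEY
  tls: false
  region: US-EAST-04A
params:
  bucket: warp-benchmark-bucket
  concurrent: 300
  obj.size: 256MiB        # large objects so multipart upload matters (assumed)
  part.size: 16MiB        # per-part upload size (assumed)
  duration: 5m
  autoterm: true
```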
Interpret Warp output
Warp reports include several important metrics:
- Average and median throughput (GiB/s): Your sustained read or write rate. The median is more representative than the average when there are outliers.
- p50 / p90 / p99 request latency: These percentiles tell you how consistent performance is. A large gap between p50 and p99 may indicate contention or cache misses.
- Slowest 1-second window: Represents worst-case throughput. If this is significantly lower than the median, investigate potential sources of variability.
Compare LOTA and primary endpoint performance
Run the same benchmark against both cwlota.com and cwobject.com to quantify the performance benefit of LOTA caching. When benchmarking against cwobject.com, set tls: true in the remote section of your YAML configuration, because the primary endpoint requires HTTPS. The LOTA endpoint uses HTTP from within the cluster, so tls: false is correct for cwlota.com. On the first run against LOTA, results reflect cache-miss performance; run the benchmark a second time to see fully cached performance.
The following example configuration, saved as warp-test-cwobject-get.yaml, shows a GET benchmark for cwobject.com. You can modify the other benchmark configurations similarly to run against the primary endpoint.
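A sketch of the primary-endpoint variant, with the same assumed key names as the earlier configs; only the host and tls values differ from the LOTA version:

```yaml
# warp-test-cwobject-get.yaml -- illustrative GET benchmark against the
# primary endpoint (key names assumed).
benchmark: get
remote:
  host: cwobject.com
  access-key: YOUR_ACCESS_KEY
  secret-key: YOUR_SECRET_KEY
  tls: true               # the primary endpoint requires HTTPS
  region: US-EAST-04A
params:
  bucket: warp-benchmark-bucket
  concurrent: 300
  obj.size: 15MiB
  objects: 2500
  duration: 5m
  range.size: 1MB
  autoterm: true
```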
Run distributed benchmarks
Warp supports distributed benchmarking, where multiple Warp client instances run in parallel across different Nodes and a coordinator aggregates the results. This is how you scale beyond a single Node to test higher concurrency levels. The concurrent value applies per client, so three clients with concurrent: 300 produce 900 total concurrent operations.
Running distributed Warp on Kubernetes requires deploying multiple Warp Pods with network connectivity between them. Warp provides a Helm chart and Kubernetes manifests that handle this using a StatefulSet for the client Pods and a Job for the coordinator.
To run distributed benchmarks on CKS, adapt the upstream Warp Kubernetes manifests with the following changes:
- Node scheduling: Add node affinity rules and any required tolerations to schedule Warp Pods on Nodes appropriate for your workload.
- Endpoint: Set the S3 host to cwlota.com instead of a MinIO server address.
- Credentials: Use your CoreWeave AI Object Storage access key and secret key.
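With those adaptations in place, the client/coordinator flow can be sketched as follows. The Pod and Service names are illustrative, assuming a three-replica StatefulSet named warp behind a headless Service:

```shell
# On each client Pod: start the Warp client agent.
# It listens on port 7761 by default.
warp client

# On the coordinator: aggregate the three clients into one benchmark run.
# Hostnames assume Pods warp-0..warp-2 behind a headless Service "warp".
warp get \
  --warp-client=warp-0.warp:7761,warp-1.warp:7761,warp-2.warp:7761 \
  --host=cwlota.com \
  --access-key="$WARP_ACCESS_KEY" \
  --secret-key="$WARP_SECRET_KEY" \
  --bucket=warp-benchmark-bucket \
  --concurrent=300 --obj.size=15MiB --duration=5m
```

Because concurrency applies per client, this run drives 900 total concurrent operations.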