Benchmarking with Warp - CoreWeave Docs

Warp is an open-source S3-compatible benchmarking tool that provides detailed per-request latency percentiles and throughput metrics. It is fully compatible with CoreWeave AI Object Storage. Warp supports a variety of benchmark types, including GET, PUT, DELETE, LIST, STAT, and mixed workloads.

Prerequisites

This guide assumes that you already have the following prerequisites in place:

A CKS cluster
A Node Pool with a Node to run the benchmarks on
kubectl installed and configured to access your cluster
A CoreWeave AI Object Storage access key and secret key
The necessary permissions to create an AI Object Storage bucket

Create a dedicated benchmark bucket

Warp completely wipes the benchmark bucket before and after each run. Use a dedicated bucket and never point Warp at a bucket containing production data.

Create a dedicated bucket for benchmarking using the Cloud Console. For the purposes of this guide, name the bucket warp-benchmark-bucket, and use the US-EAST-04A availability zone.

Alternatively, you can create the bucket with an S3-compatible client. For example, the following AWS CLI command creates a bucket named warp-benchmark-bucket in the US-EAST-04A availability zone. It works only if you have already created a CoreWeave-specific AWS CLI configuration:

```
aws s3api create-bucket \
  --bucket warp-benchmark-bucket \
  --region US-EAST-04A \
  --create-bucket-configuration LocationConstraint=US-EAST-04A
```

If you use a different bucket name, choose one that is easy to identify as a benchmark bucket and prefix it with warp-. The rest of this guide uses the following bucket configuration values:

Bucket name: warp-benchmark-bucket
Availability zone: US-EAST-04A
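If you script your benchmark runs, a small guard can enforce the warp- naming convention before any run touches a bucket. The following POSIX shell function is a minimal sketch; check_bench_bucket is a hypothetical helper, not part of Warp or the AWS CLI:

```shell
# Hypothetical guard (not part of Warp): refuse any bucket whose name lacks
# the "warp-" prefix, since Warp wipes the bucket before and after each run.
check_bench_bucket() {
  case "$1" in
    warp-?*)
      return 0
      ;;
    *)
      echo "refusing to use bucket '$1': name must start with 'warp-'" >&2
      return 1
      ;;
  esac
}

# Example: passes for the bucket used throughout this guide.
check_bench_bucket warp-benchmark-bucket && echo "bucket name OK"
```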

Deploy Warp on a Node

Warp must run from inside your CKS cluster to produce meaningful results. Benchmarking from an external network measures network latency rather than storage performance, and the LOTA endpoint (http://cwlota.com) is only accessible from a CoreWeave cluster. Deploy Warp as a Pod using the official Warp container image. Warp does not use GPU resources, so no GPU resource request or node affinity is needed. LOTA is available on all Nodes in the cluster. For details on targeting specific Node types, see Target specific GPUs or CPUs.

Save the following manifest as warp-benchmark.yaml:

```
apiVersion: v1
kind: Pod
metadata:
  name: warp-benchmark
spec:
  containers:
  - name: warp
    image: quay.io/minio/aistor/warp:latest
    command: ["sleep", "infinity"]
```

Apply the manifest:
```
kubectl apply -f warp-benchmark.yaml
```
Wait for the Pod to reach the Running state, then press Ctrl+C to stop watching:
```
kubectl get pod warp-benchmark -w
```
Open a shell inside the Pod:
```
kubectl exec -it warp-benchmark -- sh
```
Verify that Warp is available:
```
/warp version
```

Set environment variables

In your Pod, set your CoreWeave AI Object Storage credentials, endpoint, and region using environment variables for convenience:

```
export WARP_HOST="cwlota.com"
export WARP_ACCESS_KEY="[ACCESS-KEY-ID]"
export WARP_SECRET_KEY="[SECRET-ACCESS-KEY]"
export WARP_REGION="US-EAST-04A"
```
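Before you generate any configuration files, you can confirm that every variable is set and no longer holds a bracketed placeholder. The following sketch uses a hypothetical helper, require_warp_env, written for POSIX sh; it is not part of Warp:

```shell
# Hypothetical helper: fail if any Warp variable is empty or still holds a
# bracketed placeholder such as [ACCESS-KEY-ID].
require_warp_env() {
  status=0
  for name in WARP_HOST WARP_ACCESS_KEY WARP_SECRET_KEY WARP_REGION; do
    # eval performs indirect expansion, which plain POSIX sh lacks.
    eval "value=\${$name:-}"
    case "$value" in
      "" | \[*\])
        echo "error: $name is unset or still a placeholder" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

Run require_warp_env after the export commands. It returns a non-zero status and names each variable that still needs a real value.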

Measure read throughput with GET

Warp completely wipes the benchmark bucket before and after each run. Use a dedicated bucket (such as warp-benchmark-bucket) and never point Warp at a bucket containing production data.

Read throughput is the most relevant benchmark for data loading. The following configuration runs a get benchmark against the LOTA endpoint with recommended starting parameters:

warp-test-lota-get.yaml

```
warp:
  api: v1
  benchmark: get
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
```

Create and run this configuration in your Pod. First, create the benchmark configuration file. Because the EOF delimiter is unquoted, the shell substitutes the WARP_* variables you set earlier into the file:

```
cat <<EOF > warp-test-lota-get.yaml
warp:
  api: v1
  benchmark: get
  remote:
    bucket: warp-benchmark-bucket
    region: $WARP_REGION
    access-key: $WARP_ACCESS_KEY
    secret-key: $WARP_SECRET_KEY
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
EOF
```

Run the benchmark using the following command in your Pod:
```
/warp run warp-test-lota-get.yaml
```

Recommended benchmark parameters

The following parameters are recommended starting points for measuring read throughput. Adjust them as needed to find peak throughput or to match your workload.

| Parameter | Value | Description |
| --- | --- | --- |
| Concurrency Level | 300 | The number of parallel download operations to run. Start with 300 and adjust upward or downward. If throughput is still climbing at 300, try increasing to 500 or more. If throughput declines or is unexpectedly low, try decreasing the concurrency. |
| Object Size | 15 MiB | The size of each object. Start with 15 MiB to stay above the threshold where metadata overhead becomes significant. |
| Number of Objects | 2,500 | The number of objects in the pool. Start with 2,500 to provide a large enough pool for random selection. This is a good starting point for most workloads. |
| Duration | 5 minutes | The length of the benchmark. Start with 5 minutes to capture stable, representative throughput numbers. This is a good starting point for most workloads. |
| Range Size | 1 MB | The size of each ranged read. Start with 1 MB to avoid small-read overhead. Small reads are still served from the cache as long as the object is larger than 4 MB; objects smaller than 4 MB are not cached. |
| Autoterm | true | Automatically stop the benchmark when throughput stabilizes, preventing noisy results from incomplete warm-up periods. |
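Because the get benchmark first uploads its pool of objects and then reads them back, the object count multiplied by the object size gives the amount of data staged in the bucket. The following sketch computes that figure in GiB for the recommended values; working_set_gib is a hypothetical helper and assumes awk is available:

```shell
# Hypothetical helper: data staged by the get benchmark, in GiB,
# computed as object count x object size (MiB) / 1024.
working_set_gib() {
  awk -v n="$1" -v mib="$2" 'BEGIN { printf "%.1f\n", n * mib / 1024 }'
}

# Recommended starting point: 2,500 objects of 15 MiB each.
working_set_gib 2500 15   # prints 36.6
```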

Measure write throughput with PUT

The following configuration runs a put benchmark with recommended starting parameters:

warp-test-lota-put.yaml

```
warp:
  api: v1
  benchmark: put
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
```

Create and run this configuration in your Pod. First, create the benchmark configuration file:

```
cat <<EOF > warp-test-lota-put.yaml
warp:
  api: v1
  benchmark: put
  remote:
    bucket: warp-benchmark-bucket
    region: $WARP_REGION
    access-key: $WARP_ACCESS_KEY
    secret-key: $WARP_SECRET_KEY
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
EOF
```

Run the benchmark using the following command in your Pod:
```
/warp run warp-test-lota-put.yaml
```
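The GET and PUT configurations differ only in the benchmark value, so you can generate them from one template instead of copying the heredoc for each benchmark. The following POSIX shell function is a hypothetical helper, not part of Warp. It reuses the recommended parameters from this guide, expects WARP_REGION, WARP_ACCESS_KEY, and WARP_SECRET_KEY to be set as described in Set environment variables, and sets tls: true for any host other than cwlota.com:

```shell
# Hypothetical helper (not part of Warp): write a config for a single-phase
# benchmark such as get or put, using the recommended parameters from this
# guide. Requires WARP_REGION, WARP_ACCESS_KEY, and WARP_SECRET_KEY.
# Usage: write_warp_config BENCHMARK [HOST]   (HOST defaults to cwlota.com)
write_warp_config() {
  benchmark="$1"
  host="${2:-cwlota.com}"
  # LOTA is plain HTTP inside the cluster; other endpoints use TLS.
  case "$host" in
    cwlota.com) tls=false; label=lota ;;
    *)          tls=true;  label="${host%%.*}" ;;
  esac
  file="warp-test-${label}-${benchmark}.yaml"
  cat <<EOF > "$file"
warp:
  api: v1
  benchmark: $benchmark
  remote:
    bucket: warp-benchmark-bucket
    region: $WARP_REGION
    access-key: $WARP_ACCESS_KEY
    secret-key: $WARP_SECRET_KEY
    host:
      - $host
    tls: $tls
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
EOF
  # Print the file name so the caller can pass it to /warp run.
  echo "$file"
}
```

For example, /warp run "$(write_warp_config put)" writes warp-test-lota-put.yaml and starts the PUT benchmark. The mixed and multipart-put benchmarks use additional parameters, so extend the template before using it for them.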

Measure mixed workload throughput

The mixed command simulates a realistic workload with a configurable mix of GET, PUT, STAT, and DELETE operations. The following configuration runs a mixed benchmark with the default operation distribution:

warp-test-lota-mixed.yaml

```
warp:
  api: v1
  benchmark: mixed
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    objects: 2500
    obj:
      size: 15MiB
    autoterm:
      enabled: true
```

Create and run this configuration in your Pod. First, create the benchmark configuration file:

```
cat <<EOF > warp-test-lota-mixed.yaml
warp:
  api: v1
  benchmark: mixed
  remote:
    bucket: warp-benchmark-bucket
    region: $WARP_REGION
    access-key: $WARP_ACCESS_KEY
    secret-key: $WARP_SECRET_KEY
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    objects: 2500
    obj:
      size: 15MiB
    autoterm:
      enabled: true
EOF
```

Run the benchmark using the following command in your Pod:
```
/warp run warp-test-lota-mixed.yaml
```

Configure the operation distribution

By default, the mixed benchmark uses the following operation distribution: 45% GET, 15% PUT, 30% STAT, 10% DELETE. To customize the distribution, set the distribution key in your YAML configuration file. The following partial example shows where to add it:

```
warp:
  api: v1
  # ...
  params:
    duration: 5m
    concurrent: 300
    objects: 2500
    obj:
      size: 15MiB
    autoterm:
      enabled: true
    distribution:
      get: 45.0
      stat: 30.0
      put: 15.0
      delete: 10.0 # Must be same or lower than 'put'.
```
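You can sanity-check a custom distribution before starting a run. The following sketch enforces the rule from the comment above (delete must not exceed put) and prints each operation's share of the total. check_distribution is a hypothetical helper, and it assumes Warp treats the four values as relative weights normalized by their total; if your values already sum to 100, the printed percentages equal the values you set:

```shell
# Hypothetical helper: validate a mixed distribution and show each
# operation's share of the total. Arguments: GET STAT PUT DELETE weights.
check_distribution() {
  awk -v g="$1" -v s="$2" -v p="$3" -v d="$4" 'BEGIN {
    g += 0; s += 0; p += 0; d += 0   # force numeric comparison
    if (d > p) {
      print "error: delete (" d ") must not exceed put (" p ")" > "/dev/stderr"
      exit 1
    }
    total = g + s + p + d
    if (total <= 0) {
      print "error: weights must sum to a positive value" > "/dev/stderr"
      exit 1
    }
    printf "get=%.1f%% stat=%.1f%% put=%.1f%% delete=%.1f%%\n",
      100 * g / total, 100 * s / total, 100 * p / total, 100 * d / total
  }'
}

check_distribution 45 30 15 10   # prints get=45.0% stat=30.0% put=15.0% delete=10.0%
```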

For a complete configuration, see the example Mixed Workload benchmark configuration file.

Measure multipart upload throughput

The following configuration runs a multipart-put benchmark with recommended starting parameters:

warp-test-lota-multipart-put.yaml

```
warp:
  api: v1
  benchmark: multipart-put
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 10
    obj:
      size: 15MiB
      parts: 100
      part-size: 50MiB
      part-concurrent: 20
    autoterm:
      enabled: true
```

Create and run this configuration in your Pod. First, create the benchmark configuration file:

```
cat <<EOF > warp-test-lota-multipart-put.yaml
warp:
  api: v1
  benchmark: multipart-put
  remote:
    bucket: warp-benchmark-bucket
    region: $WARP_REGION
    access-key: $WARP_ACCESS_KEY
    secret-key: $WARP_SECRET_KEY
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 10
    obj:
      size: 15MiB
      parts: 100
      part-size: 50MiB
      part-concurrent: 20
    autoterm:
      enabled: true
EOF
```

Run the benchmark using the following command in your Pod:
```
/warp run warp-test-lota-multipart-put.yaml
```
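The multipart parameters interact multiplicatively. Assuming each uploaded object is assembled from parts parts of part-size each, and that part-concurrent applies to each of the concurrent uploads, the multipart configuration in this section produces 5,000 MiB objects with up to 200 part uploads in flight. The following hypothetical helper performs that arithmetic so you can size other combinations:

```shell
# Hypothetical helper: effective object size and part uploads in flight,
# assuming object size = parts x part-size and that part-concurrent
# applies to each of the concurrent uploads.
multipart_summary() {
  parts="$1"; part_size_mib="$2"; concurrent="$3"; part_concurrent="$4"
  echo "object_size=$((parts * part_size_mib))MiB parts_in_flight=$((concurrent * part_concurrent))"
}

# Values from the configuration above: 100 parts of 50 MiB, 10 x 20 uploads.
multipart_summary 100 50 10 20   # prints object_size=5000MiB parts_in_flight=200
```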

Interpret Warp output

Warp reports include several important metrics, as in the following example output from a GET benchmark:

```
Operation: GET (25000). Ran 2m0s. Size: 15 MiB. Concurrency: 300.
 * Average: 4.2 GiB/s, 286.72 obj/s

Throughput, split into 120 x 1s:
 * Fastest: 4.8 GiB/s, 327.68 obj/s
 * 50% Median: 4.2 GiB/s, 286.72 obj/s
 * Slowest: 3.1 GiB/s, 211.89 obj/s

Requests:
 * Avg: 42ms, 50%: 38ms, 90%: 67ms, 99%: 125ms
 * Fastest: 8ms, Slowest: 312ms, StdDev: 24ms
```

For each run, focus on these values when evaluating performance:

Average and median throughput (GiB/s): Your sustained read or write rate. The median is more representative than the average when there are outliers.
p50 / p90 / p99 request latency: These percentiles tell you how consistent performance is. A large gap between p50 and p99 may indicate contention or cache misses.
Slowest 1-second window: Represents worst-case throughput. If this is significantly lower than the median, investigate potential sources of variability.
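The obj/s figures follow directly from throughput and object size: objects per second equals throughput in MiB/s divided by the object size in MiB. For example, 4.2 GiB/s × 1,024 ÷ 15 MiB = 286.72 obj/s, which matches the Average line in the example output. The following hypothetical helper performs the same conversion so you can cross-check reports for other object sizes:

```shell
# Hypothetical helper: convert throughput in GiB/s to objects per second
# for an object size in MiB: obj/s = GiB/s x 1024 / object size.
gib_to_objs() {
  awk -v gib="$1" -v mib="$2" 'BEGIN { printf "%.2f\n", gib * 1024 / mib }'
}

gib_to_objs 4.2 15   # prints 286.72
```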

Storage and network performance can vary. Run each benchmark configuration at least three times and report the median of the results to account for variability.
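To take the median of repeated runs, sort the results numerically and pick the middle value. The following POSIX shell sketch does this for any number of results, averaging the two middle values when the count is even. median is a hypothetical helper; LC_ALL=C keeps sort from misreading the decimal point in locales that use a comma:

```shell
# Hypothetical helper: median of benchmark results (for example, GiB/s from
# three runs). With an even count, averages the two middle values.
median() {
  printf '%s\n' "$@" | LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) { print v[(NR + 1) / 2] }
      else { m = (v[NR / 2] + v[NR / 2 + 1]) / 2; print m }
    }'
}

median 4.3 4.1 4.2   # prints 4.2
```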

Compare LOTA and primary endpoint performance

Run the same benchmark against both cwlota.com and cwobject.com to quantify the performance benefit of LOTA caching. When benchmarking against cwobject.com, set tls: true in the remote section of your YAML configuration because the primary endpoint requires HTTPS. The LOTA endpoint uses HTTP from within the cluster, so tls: false is correct for cwlota.com. On the first run against LOTA, results will reflect cache-miss performance. Run the benchmark a second time to see fully cached performance. The following example configuration shows a GET benchmark for cwobject.com. You can modify the other benchmark configurations similarly to run against the primary endpoint:

warp-test-cwobject-get.yaml

```
warp:
  api: v1
  benchmark: get
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwobject.com
    tls: true
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
```
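Because the primary-endpoint configuration differs from the LOTA one only in the host and the tls setting, you can derive it from an existing LOTA file instead of editing by hand. The following sketch is a hypothetical helper, not part of Warp; it assumes your LOTA files follow this guide's warp-test-lota-*.yaml naming:

```shell
# Hypothetical helper (not part of Warp): derive a primary-endpoint config
# from a LOTA config by switching the host to cwobject.com and enabling TLS.
# The original file is left untouched; the new file name is printed.
to_cwobject_config() {
  src="$1"
  dst=$(printf '%s\n' "$src" | sed 's/-lota-/-cwobject-/')
  if [ "$dst" = "$src" ]; then
    echo "error: expected a file name containing '-lota-', got '$src'" >&2
    return 1
  fi
  sed -e 's/cwlota\.com/cwobject.com/' -e 's/tls: false/tls: true/' "$src" > "$dst" || return 1
  echo "$dst"
}

# Example: creates warp-test-cwobject-put.yaml from warp-test-lota-put.yaml
# to_cwobject_config warp-test-lota-put.yaml
```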

Run distributed benchmarks

Warp supports distributed benchmarking, where multiple Warp client instances run in parallel across different nodes and a coordinator aggregates the results. This is how you scale beyond a single node to test higher concurrency levels. The concurrent value applies per client, so three clients with concurrent: 300 produce 900 total concurrent operations. Running distributed Warp on Kubernetes requires deploying multiple Warp Pods with network connectivity between them. Warp provides a Helm chart and Kubernetes manifests that handle this using a StatefulSet for the client Pods and a Job for the coordinator. To run distributed benchmarks on CKS, adapt the upstream Warp Kubernetes manifests with the following changes:

Node scheduling: Add node affinity rules and any required tolerations to schedule Warp Pods on Nodes appropriate for your workload.
Endpoint: Set the S3 host to cwlota.com instead of a MinIO server address.
Credentials: Use your CoreWeave AI Object Storage access key and secret key.
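Because concurrent applies per client, sizing a distributed run is simple arithmetic. The following hypothetical helpers compute the total concurrency for a given number of client Pods and, in reverse, the per-client value needed to reach a target total:

```shell
# Concurrency arithmetic for distributed runs: the concurrent value applies
# per client, so totals scale with the number of client Pods.
total_concurrency() {
  # Usage: total_concurrency CLIENTS PER_CLIENT
  echo $(( $1 * $2 ))
}

per_client_concurrency() {
  # Usage: per_client_concurrency TARGET_TOTAL CLIENTS (rounds up)
  echo $(( ($1 + $2 - 1) / $2 ))
}

total_concurrency 3 300         # prints 900
per_client_concurrency 1000 3   # prints 334
```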

Cleanup

When you are done benchmarking, delete the Pod:

```
kubectl delete pod warp-benchmark
```

Contact support

For questions or assistance with large-scale dataset caching, please contact CoreWeave support.
