

Warp is an open-source S3-compatible benchmarking tool that provides detailed per-request latency percentiles and throughput metrics. It is fully compatible with CoreWeave AI Object Storage. Warp supports a variety of benchmark types, including GET, PUT, DELETE, LIST, STAT, and mixed workloads.

Prerequisites

This guide assumes that you already have the following prerequisites in place: access to a CKS cluster with kubectl configured, and CoreWeave AI Object Storage credentials (an access key ID and secret access key).

Create a dedicated benchmark bucket

Warp completely wipes the benchmark bucket before and after each run. Use a dedicated bucket and never point Warp at a bucket containing production data.
  1. Create a dedicated bucket for benchmarking using the Cloud Console. For the purposes of this guide, name the bucket warp-benchmark-bucket, and use the US-EAST-04A availability zone.
Alternatively, you can create a dedicated bucket using any S3-compatible client. For example, the following AWS CLI command creates a bucket named warp-benchmark-bucket in the US-EAST-04A availability zone; it assumes you have already set up a CoreWeave-specific AWS CLI configuration:
aws s3api create-bucket \
  --bucket warp-benchmark-bucket \
  --region US-EAST-04A \
  --create-bucket-configuration LocationConstraint=US-EAST-04A
If you use a different bucket name, choose one that is easy to identify as a benchmark bucket and prefix it with warp-. The rest of this guide uses the following bucket configuration values:
  • Bucket name: warp-benchmark-bucket
  • Availability zone: US-EAST-04A

Deploy Warp on a Node

Warp must run from inside your CKS cluster to produce meaningful results. Benchmarking from an external network measures network latency rather than storage performance, and the LOTA endpoint (http://cwlota.com) is only accessible from a CoreWeave cluster. Deploy Warp as a Pod using the official Warp container image. Warp does not use GPU resources, so no GPU resource request or node affinity is needed. LOTA is available on all Nodes in the cluster. For details on targeting specific Node types, see Target specific GPUs or CPUs.
warp-benchmark.yaml
apiVersion: v1
kind: Pod
metadata:
  name: warp-benchmark
spec:
  containers:
  - name: warp
    image: quay.io/minio/aistor/warp:latest
    command: ["sleep", "infinity"]
  1. Apply the manifest:
    kubectl apply -f warp-benchmark.yaml
    
  2. Wait for the Pod to be running:
    kubectl get pod warp-benchmark -w
    
  3. Open a shell inside the Pod:
    kubectl exec -it warp-benchmark -- sh
    
  4. Verify that Warp is available:
    /warp version
    

Set environment variables

In your Pod, set your CoreWeave AI Object Storage credentials, endpoint, and region using environment variables for convenience:
Set environment variables
export WARP_HOST="cwlota.com"
export WARP_ACCESS_KEY="[ACCESS-KEY-ID]"
export WARP_SECRET_KEY="[SECRET-ACCESS-KEY]"
export WARP_REGION="US-EAST-04A"
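
A missing or empty variable produces confusing authentication errors later, so a quick sanity check before the first run can help. This sketch uses placeholder values in place of real credentials:

```shell
# Placeholder values stand in for the exports above; in your Pod the real
# values are already in the environment.
WARP_HOST="cwlota.com"
WARP_ACCESS_KEY="example-access-key"
WARP_SECRET_KEY="example-secret-key"
WARP_REGION="US-EAST-04A"

# Fail fast if any required variable is empty (POSIX sh, no bash-isms).
missing=0
for var in WARP_HOST WARP_ACCESS_KEY WARP_SECRET_KEY WARP_REGION; do
  eval "val=\${$var}"
  if [ -z "$val" ]; then
    echo "ERROR: $var is not set" >&2
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "All Warp environment variables are set."
```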

Measure read throughput with GET

Read throughput is the most relevant benchmark for data loading. The following configuration runs a get benchmark against the LOTA endpoint with recommended starting parameters:
warp-test-lota-get.yaml
warp:
  api: v1
  benchmark: get
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
Create and run this configuration in your Pod:
  1. Create the benchmark configuration file:
    Create GET benchmark configuration file
    cat <<EOF > warp-test-lota-get.yaml
    warp:
      api: v1
      benchmark: get
      remote:
        bucket: warp-benchmark-bucket
        region: $WARP_REGION
        access-key: $WARP_ACCESS_KEY
        secret-key: $WARP_SECRET_KEY
        host:
          - cwlota.com
        tls: false
        lookup: host
      params:
        duration: 5m
        concurrent: 300
        obj:
          size: 15MiB
        autoterm:
          enabled: true
    EOF
    
  2. Run the benchmark using the following command in your Pod:
    Run GET benchmark
    /warp run warp-test-lota-get.yaml
    
The following are the recommended benchmark parameters for read throughput. You can adjust them as needed to optimize performance for your workload.
  • Concurrency Level (300): The number of parallel download operations to run. Start with 300 and adjust from there: if throughput is still climbing at 300, try increasing to 500 or more; if throughput declines or is unexpectedly low, decrease the concurrency.
  • Object Size (15 MiB): The size of the objects to use. Start with 15 MiB to stay above the threshold where metadata overhead becomes significant.
  • Number of Objects (2,500): The number of objects to use. Start with 2,500 to provide a large enough pool for random selection. For most workloads, this is a good starting point.
  • Duration (5 minutes): The duration of the benchmark. Start with 5 minutes to capture stable, representative throughput numbers.
  • Range Size (1 MB): The size of the range to read. Start with 1 MB to avoid small-read overhead. Small reads are still served from the cache as long as the object is larger than 4 MB; objects smaller than 4 MB are not cached.
  • Autoterm (true): Automatically stop the benchmark when throughput stabilizes, preventing noisy results from incomplete warm-up periods.
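
A quick check of the total data pool these recommended values imply, using the object count and size from the table above:

```shell
# Total data pool implied by the recommended parameters above.
objects=2500
obj_size_mib=15

total_mib=$((objects * obj_size_mib))
total_gib=$(awk -v m="$total_mib" 'BEGIN { printf "%.1f", m / 1024 }')
echo "Working set: ${total_mib} MiB (~${total_gib} GiB)"
```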

Measure write throughput with PUT

The following configuration runs a put benchmark with recommended starting parameters:
warp-test-lota-put.yaml
warp:
  api: v1
  benchmark: put
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
Create and run this configuration in your Pod:
  1. Create the benchmark configuration file:
    Create PUT benchmark configuration file
    cat <<EOF > warp-test-lota-put.yaml
    warp:
      api: v1
      benchmark: put
      remote:
        bucket: warp-benchmark-bucket
        region: $WARP_REGION
        access-key: $WARP_ACCESS_KEY
        secret-key: $WARP_SECRET_KEY
        host:
          - cwlota.com
        tls: false
        lookup: host
      params:
        duration: 5m
        concurrent: 300
        obj:
          size: 15MiB
        autoterm:
          enabled: true
    EOF
    
  2. Run the benchmark using the following command in your Pod:
    Run PUT benchmark
    /warp run warp-test-lota-put.yaml
    

Measure mixed workload throughput

The mixed command simulates a realistic workload with a configurable mix of GET, PUT, STAT, and DELETE operations. The following configuration runs a mixed benchmark with the default operation distribution:
warp-test-lota-mixed.yaml
warp:
  api: v1
  benchmark: mixed
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    objects: 2500
    obj:
      size: 15MiB
    autoterm:
      enabled: true
Create and run this configuration in your Pod:
  1. Create the benchmark configuration file:
    Create mixed workload benchmark configuration file
    cat <<EOF > warp-test-lota-mixed.yaml
    warp:
      api: v1
      benchmark: mixed
      remote:
        bucket: warp-benchmark-bucket
        region: $WARP_REGION
        access-key: $WARP_ACCESS_KEY
        secret-key: $WARP_SECRET_KEY
        host:
          - cwlota.com
        tls: false
        lookup: host
      params:
        duration: 5m
        concurrent: 300
        objects: 2500
        obj:
          size: 15MiB
        autoterm:
          enabled: true
    EOF
    
  2. Run the benchmark using the following command in your Pod:
    Run mixed workload benchmark
    /warp run warp-test-lota-mixed.yaml
    

Configure the operation distribution

By default, the mixed benchmark uses the following operation distribution: 45% GET, 15% PUT, 30% STAT, 10% DELETE. To customize the distribution, set the distribution key in your YAML configuration file. The following partial example shows where to add it:
Mixed workload benchmark configuration
warp:
  api: v1
  # ...
  params:
    duration: 5m
    concurrent: 300
    objects: 2500
    obj:
      size: 15MiB
    autoterm:
      enabled: true
    distribution:
      get: 45.0
      stat: 30.0
      put: 15.0
      delete: 10.0 # Must be less than or equal to 'put'.
See the example Mixed Workload benchmark configuration file for a complete example.
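
Before running a customized distribution, it can be worth checking that the weights sum to 100 and that the delete weight does not exceed the put weight. A minimal sketch, using the default weights:

```shell
# Default operation weights from the configuration above.
get=45.0
stat=30.0
put=15.0
delete=10.0

# Validate: weights must sum to 100, and 'delete' must not exceed 'put',
# since each delete consumes an object created by a put.
result=$(awk -v g="$get" -v s="$stat" -v p="$put" -v d="$delete" 'BEGIN {
  total = g + s + p + d
  if (total != 100) { print "ERROR: weights sum to " total; exit 1 }
  if (d > p)        { print "ERROR: delete weight exceeds put weight"; exit 1 }
  print "Distribution OK (sum = " total ")"
}')
echo "$result"
```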

Measure multipart upload throughput

The following configuration runs a multipart-put benchmark with recommended starting parameters:
warp-test-lota-multipart-put.yaml
warp:
  api: v1
  benchmark: multipart-put
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwlota.com
    tls: false
    lookup: host
  params:
    duration: 5m
    concurrent: 10
    obj:
      size: 15MiB
      parts: 100
      part-size: 50MiB
      part-concurrent: 20
    autoterm:
      enabled: true
Create and run this configuration in your Pod:
  1. Create the benchmark configuration file:
    Create multipart upload benchmark configuration file
    cat <<EOF > warp-test-lota-multipart-put.yaml
    warp:
      api: v1
      benchmark: multipart-put
      remote:
        bucket: warp-benchmark-bucket
        region: $WARP_REGION
        access-key: $WARP_ACCESS_KEY
        secret-key: $WARP_SECRET_KEY
        host:
          - cwlota.com
        tls: false
        lookup: host
      params:
        duration: 5m
        concurrent: 10
        obj:
          size: 15MiB
          parts: 100
          part-size: 50MiB
          part-concurrent: 20
        autoterm:
          enabled: true
    EOF
    
  2. Run the benchmark using the following command in your Pod:
    Run multipart upload benchmark
    /warp run warp-test-lota-multipart-put.yaml
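
To see how the multipart parameters above combine: each object uploads parts × part-size of data, and up to concurrent × part-concurrent part uploads can be in flight at once.

```shell
# Multipart parameters from the configuration above.
parts=100
part_size_mib=50
concurrent=10
part_concurrent=20

per_object_mib=$((parts * part_size_mib))         # data uploaded per object
inflight_parts=$((concurrent * part_concurrent))  # max part uploads in flight
inflight_mib=$((inflight_parts * part_size_mib))  # max data in flight at once

echo "Each multipart object uploads: ${per_object_mib} MiB"
echo "Max parts in flight: ${inflight_parts} (${inflight_mib} MiB)"
```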
    

Interpret Warp output

Warp reports include several important metrics:
Warp output
Operation: GET (25000). Ran 2m0s. Size: 15 MiB. Concurrency: 300.
 * Average: 4.2 GiB/s, 286.72 obj/s

Throughput, split into 120 x 1s:
 * Fastest: 4.8 GiB/s, 327.68 obj/s
 * 50% Median: 4.2 GiB/s, 286.72 obj/s
 * Slowest: 3.1 GiB/s, 211.89 obj/s

Requests:
 * Avg: 42ms, 50%: 38ms, 90%: 67ms, 99%: 125ms
 * Fastest: 8ms, Slowest: 312ms, StdDev: 24ms
For each run, focus on these values when evaluating performance:
  • Average and median throughput (GiB/s): Your sustained read or write rate. The median is more representative than the average when there are outliers.
  • p50 / p90 / p99 request latency: These percentiles tell you how consistent performance is. A large gap between p50 and p99 may indicate contention or cache misses.
  • Slowest 1-second window: Represents worst-case throughput. If this is significantly lower than the median, investigate potential sources of variability.
Storage and network performance can vary. Run each benchmark configuration at least three times and report the median of the results to account for variability.
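
Selecting the median of three runs can be scripted; the throughput values below are placeholders for your own results:

```shell
# Placeholder median throughputs (GiB/s) from three benchmark runs.
run1=4.2
run2=3.9
run3=4.5

# Sort numerically and take the middle value.
median=$(printf '%s\n' "$run1" "$run2" "$run3" | sort -n | sed -n '2p')
echo "Median throughput: ${median} GiB/s"
```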

Compare LOTA and primary endpoint performance

Run the same benchmark against both cwlota.com and cwobject.com to quantify the performance benefit of LOTA caching.

When benchmarking against cwobject.com, set tls: true in the remote section of your YAML configuration, because the primary endpoint requires HTTPS. The LOTA endpoint uses HTTP from within the cluster, so tls: false is correct for cwlota.com.

On the first run against LOTA, results reflect cache-miss performance. Run the benchmark a second time to measure fully cached performance.

The following example configuration shows a GET benchmark for cwobject.com. You can modify the other benchmark configurations similarly to run against the primary endpoint:
warp-test-cwobject-get.yaml
warp:
  api: v1
  benchmark: get
  remote:
    bucket: warp-benchmark-bucket
    region: US-EAST-04A
    access-key: [YOUR_ACCESS_KEY_ID]
    secret-key: [YOUR_SECRET_ACCESS_KEY]
    host:
      - cwobject.com
    tls: true
    lookup: host
  params:
    duration: 5m
    concurrent: 300
    obj:
      size: 15MiB
    autoterm:
      enabled: true
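
Once you have a median throughput from each endpoint, the cache speedup is a simple ratio. The values below are placeholders, not measured results:

```shell
# Placeholder median throughputs (GiB/s); substitute your own results.
lota_gibs=4.2
primary_gibs=1.4

speedup=$(awk -v a="$lota_gibs" -v b="$primary_gibs" 'BEGIN { printf "%.1f", a / b }')
echo "LOTA speedup over primary endpoint: ${speedup}x"
```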

Run distributed benchmarks

Warp supports distributed benchmarking, where multiple Warp client instances run in parallel across different Nodes and a coordinator aggregates the results. This is how you scale beyond a single Node to test higher concurrency levels. The concurrent value applies per client, so three clients with concurrent: 300 produce 900 total concurrent operations.

Running distributed Warp on Kubernetes requires deploying multiple Warp Pods with network connectivity between them. Warp provides a Helm chart and Kubernetes manifests that handle this, using a StatefulSet for the client Pods and a Job for the coordinator. To run distributed benchmarks on CKS, adapt the upstream Warp Kubernetes manifests with the following changes:
  • Node scheduling: Add node affinity rules and any required tolerations to schedule Warp Pods on Nodes appropriate for your workload.
  • Endpoint: Set the S3 host to cwlota.com instead of a MinIO server address.
  • Credentials: Use your CoreWeave AI Object Storage access key and secret key.
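
The per-client concurrency math above can be sketched as follows. The commented command lines are illustrative of upstream Warp's distributed mode (a listener on each client, referenced by the coordinator), not a CKS-specific recipe:

```shell
# Aggregate concurrency is per-client concurrency times the client count.
clients=3
concurrent_per_client=300
total=$((clients * concurrent_per_client))
echo "Total concurrent operations: ${total}"

# Illustrative only: upstream Warp starts a listener on each client Pod
# ("warp client", default port 7761) and a coordinator that references them:
#   warp client                                    # on each client Pod
#   warp get --warp-client warp-0:7761,warp-1:7761,warp-2:7761 \
#     --host cwlota.com --access-key "$WARP_ACCESS_KEY" --secret-key "$WARP_SECRET_KEY"
```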

Cleanup

When you are done benchmarking, delete the Pod:
kubectl delete pod warp-benchmark

Contact support

For questions or assistance with large-scale dataset caching, please contact CoreWeave support.
Last modified on April 30, 2026