Best Practices for CoreWeave AI Object Storage

Use LOTA

To achieve the highest performance for CoreWeave AI Object Storage, always use LOTA when making requests within a CoreWeave cluster. Clients only need to point their requests to the LOTA endpoint instead of the primary endpoint to take advantage of LOTA's caching capabilities.

Use multipart uploads

Use multipart uploads for large objects. When an object is uploaded with s3:CreateMultipartUpload, LOTA distributes the object's parts across the GPU Nodes in the cache. Placing the parts on multiple GPU Nodes significantly improves read performance. LOTA automatically invalidates the cache when parts of the object are updated, so no stale data is served.

Multipart uploads have several other inherent advantages, too:

  • Improved throughput: Upload the parts in parallel to make the best use of available bandwidth.
  • Fast recovery: Smaller parts minimize the impact of a network error.
  • Ability to pause and resume: Multipart uploads do not expire. It's possible to pause and resume at any time, because the upload must be explicitly completed or deliberately stopped.
  • Upload as objects are created: Multipart uploads can begin before the final object size is known.

Conversely, objects uploaded via s3:PutObject are stored on a single GPU Node, which can lead to performance bottlenecks with large objects.
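The multipart flow described above can be sketched as follows. This is a minimal illustration, not CoreWeave's reference implementation: it assumes `client` is an S3-compatible client object (for example, one created with boto3 and pointed at the LOTA endpoint, which is not shown here), and the function name `multipart_upload` and its parameters are hypothetical. The three calls it makes (`create_multipart_upload`, `upload_part`, `complete_multipart_upload`) follow the standard S3 client interface.

```python
from concurrent.futures import ThreadPoolExecutor


def multipart_upload(client, bucket, key, data, part_size, max_workers=8):
    """Upload `data` (bytes) in parts and return the number of parts.

    `client` is assumed to be an S3-compatible client, e.g. a boto3 S3 client.
    """
    if not data:
        raise ValueError("multipart uploads need at least one non-empty part")

    # Split the payload into fixed-size chunks; the last chunk may be shorter.
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

    def upload_one(numbered_chunk):
        part_number, body = numbered_chunk
        response = client.upload_part(
            Bucket=bucket, Key=key, UploadId=upload_id,
            PartNumber=part_number, Body=body,
        )
        return {"PartNumber": part_number, "ETag": response["ETag"]}

    try:
        # Part numbers start at 1. map() preserves input order, so the
        # resulting list is already sorted by part number.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            parts = list(pool.map(upload_one, enumerate(chunks, start=1)))
        client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # An unfinished upload stays open until it is stopped explicitly.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return len(parts)
```

Uploading the parts through a thread pool reflects the "improved throughput" point above; a production uploader would likely also retry individual failed parts instead of aborting the whole upload.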

Choose the right part size

The part size determines how many parts a multipart upload has, and the number of parts has a significant impact on LOTA's cache performance.

When the number of parts is similar to the number of GPU Nodes in the cluster, LOTA can distribute those parts evenly across the cluster. This improves read performance and prevents "hot spotting" the cache with long I/O wait times. However, breaking small objects into too many parts can degrade read performance because of the overhead caused by excessive HTTP requests.

Use this information as a general guide. The optimum number of parts for a multipart upload depends on the size of the object and the number of GPU Nodes in the cluster.
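As a rough illustration of this guidance, the sketch below aims for about one part per GPU Node and then clamps the result. The function name `plan_parts` is hypothetical, and the two limits (a 5 MiB minimum part size and a 10,000-part maximum) are assumptions borrowed from common S3 conventions, not values confirmed for CoreWeave AI Object Storage.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # assumed S3-style minimum part size
MAX_PARTS = 10_000       # assumed S3-style maximum part count


def plan_parts(object_size, gpu_nodes):
    """Return (part_size, part_count), aiming for about one part per GPU Node."""
    if object_size <= 0 or gpu_nodes <= 0:
        raise ValueError("object_size and gpu_nodes must be positive")
    # Start from one part per node, then clamp to the assumed limits so that
    # small objects are not split into many tiny parts.
    target_parts = min(gpu_nodes, MAX_PARTS)
    part_size = max(math.ceil(object_size / target_parts), MIN_PART_SIZE)
    part_count = math.ceil(object_size / part_size)
    return part_size, part_count
```

Under these assumptions, a 64 GiB object on a 64-node cluster is planned as 64 parts of 1 GiB each, while a 20 MiB object on the same cluster is planned as 4 parts of 5 MiB, which avoids the excessive-request overhead described above.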

Important

LOTA only caches objects larger than 4 MB. Smaller objects are fetched directly from the backend, bypassing the cache.

Prepare the cache with a zero-byte Range request

LOTA fetches metadata for the entire object when it receives a Range request for bytes=0-0. This prepares the cache, making subsequent requests for the object faster.

To prepare the cache, make a request for the object with Range: bytes=0-0 in the HTTP request header. LOTA fetches the metadata for the entire object and returns only the first byte of the object. See Range in the MDN docs for more information.
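A minimal sketch of such a warm-up request, using only the Python standard library, might look like this. It assumes `object_url` is a URL that already authorizes the request (for example, a presigned URL for the object on the LOTA endpoint); the function names are hypothetical, and request signing is deliberately left out.

```python
import urllib.request


def build_warmup_request(object_url):
    """Build a GET request that asks for only the first byte of the object."""
    request = urllib.request.Request(object_url, method="GET")
    request.add_header("Range", "bytes=0-0")
    return request


def warm_cache(object_url, timeout=10):
    """Send the warm-up request and return the HTTP status code."""
    request = build_warmup_request(object_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # at most one byte is expected in the body
        return response.status
```

A server that honors the Range header normally answers with status 206 (Partial Content), so checking the value returned by `warm_cache` is a simple way to confirm the range was applied.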

Avoid sequential object keys

The object key is the name given to an object when it is created in a bucket. Do not use sequential object keys. If keys would otherwise be sequential, add a random prefix to each one to maintain high performance.
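One way to add such a prefix is sketched below. Note the substitution: instead of a truly random prefix, this example derives the prefix from a hash of the key, which spreads sequential names in a similar way while letting a client recompute the full key from the original name. The function name `prefixed_key`, the 4-character prefix length, and the "/" separator are illustrative choices, not requirements from CoreWeave.

```python
import hashlib


def prefixed_key(key, prefix_len=4):
    """Prepend a short hash-derived prefix so sequential keys spread out."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"
```

For example, `prefixed_key("checkpoint-000001")` and `prefixed_key("checkpoint-000002")` each keep their original name after the separator but typically start with unrelated hexadecimal prefixes, so the stored keys are no longer sequential.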