LOTA (Local Object Transfer Accelerator)

Accelerates CoreWeave AI Object Storage

CoreWeave's Local Object Transfer Accelerator (LOTA) is an intelligent proxy installed on every GPU Node in a CKS cluster to accelerate data transfer. LOTA achieves this by providing a highly efficient, local gateway to CoreWeave AI Object Storage on each Node in the cluster for faster data transfer rates and decreased latency.

Overview

With LOTA, software clients can easily interact with CoreWeave AI Object Storage through a new API endpoint. Clients only need to point their requests to the LOTA endpoint instead of the primary endpoint, with no other changes required to S3-compatible clients.

Info
  • The primary endpoint is https://cwobject.com
  • The LOTA endpoint is http://cwlota.com

LOTA proxies all Object Storage requests to the Object Storage Gateway and storage backend. First, LOTA authenticates each request with the gateway and verifies proper authorization. When possible, LOTA then bypasses the gateway and accesses the storage backend directly to fetch objects with the greatest possible throughput. LOTA stores the fetched objects in a distributed cache to significantly boost data transfer rates, especially for repeated data requests.

Data upload and retrieval

In Step 1 of the diagram below, training data is uploaded to the Object Storage Gateway via the LOTA endpoint for indexing. The Gateway then stores the data in the Object Repository. For data uploads, the LOTA endpoint is used the same way as the primary endpoint.

In Step 2, LOTA forwards the client's request to the Object Storage Gateway, including the local Node's location information. The Gateway processes the request and returns the data, which may include the direct path to the object. When LOTA uses the direct path to bypass the Gateway and access the object directly, data transfer rates improve significantly. By storing the data in a distributed cache, LOTA ensures that frequently accessed objects are readily available for quick retrieval.

Process flow

LOTA actively caches recently accessed objects on the local disks of GPU Nodes, significantly reducing latency and boosting read speeds for CoreWeave AI Object Storage. The following diagram illustrates the process flow when fetching an object using LOTA.

When a request is made to LOTA, it first checks if the object is available in the cache. If the object is found, it's fetched directly from the cache, ensuring minimal latency.

If the object is not in the cache, LOTA fetches it from the backend storage and forks it into two pathways:

  • Stream 1 sends the object to the client application.
  • Stream 2 stores the object in the cache, using local storage on one or more GPU Nodes.

This dual-pathway approach ensures that future requests for the same data are served quickly from the cache, enhancing overall performance. LOTA distributes the cache across all GPU Nodes in a CKS cluster, ensuring efficient data retrieval and management.
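The dual-pathway read above can be sketched in a few lines. All names here (`fetch_with_cache`, `cache`, `backend`) are hypothetical; this illustrates the fork on a cache miss, not LOTA's actual implementation.

```python
import io

def fetch_with_cache(key, cache, backend):
    """Illustrative sketch of the read path: cache hit, or forked cache miss."""
    # Cache hit: serve straight from the distributed cache.
    if key in cache:
        return cache[key]
    # Cache miss: stream the object from backend storage and fork it.
    src = io.BytesIO(backend[key])
    client_stream = io.BytesIO()
    for chunk in iter(lambda: src.read(4096), b""):
        client_stream.write(chunk)         # Stream 1: to the client application
    data = client_stream.getvalue()
    cache[key] = data                      # Stream 2: into the local cache
    return data
```

A second request for the same key then takes the cache-hit branch and never touches the backend.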

How LOTA manages the cache

When storing an object, LOTA computes which Node should hold the object in its local cache, always treating the object as a whole to avoid the network overhead of accessing multiple Nodes. This computation produces a list of one or more compatible Nodes, from which LOTA selects the optimal cache Node by balancing load across the cluster. See the Best Practices guide to learn how different upload patterns affect cache placement.

When a client application requests an object (or part of an object), LOTA again determines which Node in the distributed cache should hold the data, then makes an HTTP request to that Node. The Node checks its local cache, and if the data is found, passes it through a decryption layer before pushing it to the client.
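The placement step can be illustrated with a simple key-hashing sketch. LOTA's real computation is not documented here (and also weighs load distribution), but hashing the whole object key over the Node list shows the key property: an object always maps to a Node deterministically, as a single unit.

```python
import hashlib

def cache_node_for(key, nodes):
    """Hypothetical placement sketch: hash the whole object key, so the
    object is treated as a unit and every lookup lands on the same Node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```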

If the data is not found, the Node fetches the entire object from the storage backend and forks it into two pathways. The first path sends the object (or the requested part) to the client application, while the second path stores the entire object in the local cache of the Node that fetched it. LOTA always caches the entire object when a client application requests any part of it, so later requests for other parts of the object are served from the cache without additional GET requests to the backend.

This distributed cache is managed by LOTA using the Least Recently Used (LRU) cache algorithm to ensure that recently accessed objects are available for quick retrieval. LOTA's read-after-write consistency guarantees that objects read immediately after writing are always up-to-date without requiring cache invalidation.
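The LRU policy named above can be sketched in a few lines of Python. This is a generic illustration of the eviction behavior, not LOTA's implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: recently used entries survive, oldest are evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used
```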

Info

Currently, LOTA only accelerates HTTP GET requests. CoreWeave plans to support other requests in the future.

Control LOTA cache behavior

By default, all GET requests made to the LOTA endpoint are cached. For testing or analysis, it may be desirable to disable the cache for specific requests. To do so, set the Cache-Control header to either no-store or no-cache when performing GET operations. These two options have different effects on LOTA's behavior:

  • no-cache: LOTA queries the cache for the object, but the GET request is not cached. This simulates the worst-case performance by forcing LOTA to proxy the data, but not cache it.
  • no-store: LOTA fetches the object directly from the storage backend, bypassing the cache, and the object is not cached. This allows you to test the performance of the storage backend without caching.
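The difference between the two directives can be sketched as follows. `handle_get`, `cache`, and `backend` are hypothetical names that only illustrate the behavior described above.

```python
def handle_get(key, headers, cache, backend):
    """Illustrative sketch of the Cache-Control modes described above."""
    cc = headers.get("Cache-Control")
    if cc == "no-store":
        # Bypass the cache entirely; fetch from the backend and don't store.
        return backend[key]
    if cc == "no-cache":
        # The cache is consulted, but a backend fetch is not stored afterwards.
        return cache[key] if key in cache else backend[key]
    # Default: serve from the cache, populating it on a miss.
    if key not in cache:
        cache[key] = backend[key]
    return cache[key]
```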

Learn more in the S3 compatibility reference guide.

More information

To learn how to configure popular S3-compatible tools to use LOTA, see How To: Manage Buckets.