
Use the Boto3 SDK with CoreWeave Object Storage

The Boto3 SDK provides a simple, intuitive interface for performing common operations on S3-compatible object storage like creating buckets, uploading files, listing objects, and deleting files.

This guide is for Python developers who want to use CoreWeave Object Storage with the Boto3 SDK. It demonstrates how to set up your environment, initialize the Boto3 client, perform basic operations, generate pre-signed URLs, and use multi-part uploads for large files.

Boto3 is open-source, actively maintained by AWS, fully compatible with CoreWeave Object Storage, and a key library used by CoreWeave Tensorizer to interact with Object Storage. Tensorizer provides extremely fast model loads from HTTP/HTTPS and S3 endpoints.

Prerequisites

Before following the examples in this guide, you should have:

  • An Access Key and a Secret Key for CoreWeave Object Storage
  • Python version 3.8 or later installed on a macOS or Linux workstation
  • Basic Python development experience and familiarity with the command line

Configure the Boto3 environment

On your local workstation, create a project folder and initialize a virtual environment:

$ mkdir my-project
$ cd my-project
$ python3 -m venv venv
$ source venv/bin/activate

Install the boto3 SDK along with the python-dotenv library, which loads your credentials and configuration from a .env file.

$ pip install boto3 python-dotenv

Create a .env file in the project directory to store your object storage credentials and configuration.

$ nano .env

Paste the following, replacing the values with the token credentials and endpoint URL from the Web UI's Object Storage section. Choose a unique bucket name, which must follow the same constraints as domain names: it must be less than 63 characters, be unique across all of CoreWeave Object Storage, begin and end with a lowercase letter, and consist only of lowercase letters, numbers, and hyphens.

ACCESS_KEY_ID=<your_access_key_id>
SECRET_ACCESS_KEY=<your_secret_access_key>
ENDPOINT_URL=<your_endpoint_url>
BUCKET_NAME=<your_unique_bucket_name>

Save the file and exit.

Demonstration

Create a dummy file to upload to Object Storage.

$ touch demo-file1.txt

Create an example Python program in the project directory.

$ nano demo-boto3.py

Paste the following code into the new file:

import boto3
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize Boto3 S3 client
client = boto3.client(
    's3',
    aws_access_key_id=os.getenv("ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("SECRET_ACCESS_KEY"),
    endpoint_url=os.getenv("ENDPOINT_URL"),
    region_name="default"
)

# Create the bucket
response = client.create_bucket(Bucket=os.getenv("BUCKET_NAME"))
print('create_bucket response:')
print(response)

# Upload a file to the bucket
with open('demo-file1.txt', 'rb') as file:
    response = client.put_object(Bucket=os.getenv("BUCKET_NAME"),
                                 Key='demo-file1.txt',
                                 Body=file,
                                 ACL='private')
print('put_object response:')
print(response)

# List all files in the bucket
print('Files in bucket:')
for key in client.list_objects(Bucket=os.getenv("BUCKET_NAME"))['Contents']:
    print(key['Key'])

Save the file, then run the demonstration program.

$ python demo-boto3.py

Example output

create_bucket response:
{'ResponseMetadata': {'RequestId': 'tx000005fb5aa22a0947fe3-0064eba496-2756d2362-default', 'HostId': '', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-request-id': 'tx000005fb5aa22a0947fe3-0064eba496-2756d2362-default', 'content-length': '0', 'date': 'Sun, 27 Aug 2023 19:31:34 GMT', 'connection': 'Keep-Alive'}, 'RetryAttempts': 0}}

put_object response:
{'ResponseMetadata': {'RequestId': 'tx000008242b450c98fa1ec-0064eba496-2756d2362-default', 'HostId': '', 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '0', 'etag': '"d26a81774849c3e902f12d6dfdc6dee9"', 'accept-ranges': 'bytes', 'x-amz-request-id': 'tx000008242b450c98fa1ec-0064eba496-2756d2362-default', 'date': 'Sun, 27 Aug 2023 19:31:34 GMT', 'connection': 'Keep-Alive'}, 'RetryAttempts': 0}, 'ETag': '"d26a81774849c3e902f12d6dfdc6dee9"'}

Files in bucket:
demo-file1.txt

This example initializes the Boto3 client, loads the environment from .env, and shows three core operations:

  1. Create a bucket
  2. Upload a file
  3. List the contents of the bucket

This example is minimal and does not include any error handling; a brief sketch that adds it follows the list below. If successful, the program should meet the following criteria:

  • Both create_bucket and put_object responses have an HTTPStatusCode of 200.
  • The put_object response includes an ETag value, which is the MD5 hash of the uploaded file.
  • The list_objects response lists the file in the bucket.
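
If you want to add basic error handling and confirm the upload, one approach is to catch botocore's ClientError and compare the returned ETag with a locally computed MD5 digest. This is only a minimal sketch that reuses the client and .env setup shown above; adapt it to your own error-handling strategy.

import hashlib

from botocore.exceptions import ClientError

try:
    with open('demo-file1.txt', 'rb') as f:
        response = client.put_object(Bucket=os.getenv("BUCKET_NAME"),
                                     Key='demo-file1.txt',
                                     Body=f,
                                     ACL='private')
except ClientError as err:
    # err.response['Error']['Code'] holds the S3 error code, e.g. 'AccessDenied'
    print(f"Upload failed: {err.response['Error']['Code']}")
else:
    # For single-part uploads, the ETag is the MD5 hash of the object,
    # so it can be compared against a locally computed digest.
    with open('demo-file1.txt', 'rb') as f:
        local_md5 = hashlib.md5(f.read()).hexdigest()
    if local_md5 == response['ETag'].strip('"'):
        print('ETag matches local MD5')
    else:
        print('ETag does not match local MD5')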

More operations

The Boto3 SDK supports many other operations. Here are more examples that can be used in Python after the environment is loaded and the client initialized.

Delete a file

Delete a file from a bucket.

def delete_file(bucket_name, file_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    - file_name (str): The name of the file to delete.
    """
    client.delete_object(Bucket=bucket_name, Key=file_name)

Download a file

Download a file from a bucket to a local file.

def download_file(bucket_name, file_name, file_name_local):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    - file_name (str): The name of the file to download.
    - file_name_local (str): The name of the local file to save to.
    """
    client.download_file(bucket_name, file_name, file_name_local)

Copy a file

Copy a file from one bucket to another.

def copy_file(src_bucket, dest_bucket, file_name):
    """
    Parameters:
    - src_bucket (str): The name of the source bucket.
    - dest_bucket (str): The name of the destination bucket.
    - file_name (str): The name of the file to copy.
    """
    copy_source = {
        'Bucket': src_bucket,
        'Key': file_name
    }
    client.copy_object(Bucket=dest_bucket, CopySource=copy_source, Key=file_name)

Enable versioning

Enable versioning for a bucket to keep multiple variants of an object.

def enable_versioning(bucket_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    """
    client.put_bucket_versioning(Bucket=bucket_name, VersioningConfiguration={'Status': 'Enabled'})
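
Once versioning is enabled, the stored versions of an object can be inspected with list_object_versions. A brief sketch in the same style as the helpers above, assuming the bucket already contains versioned objects:

def list_file_versions(bucket_name, file_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    - file_name (str): The key whose versions should be listed.
    """
    response = client.list_object_versions(Bucket=bucket_name, Prefix=file_name)
    for version in response.get('Versions', []):
        print(version['Key'], version['VersionId'], version['IsLatest'])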

List bucket metadata

Check that a bucket exists and that you have permission to access it; the response headers include details such as the bucket's region.

def list_bucket_metadata(bucket_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    """
    bucket_info = client.head_bucket(Bucket=bucket_name)
    return bucket_info

Get object metadata

Retrieve metadata of an object, like file size or content type.

def get_object_metadata(bucket_name, file_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    - file_name (str): The name of the file.
    """
    metadata = client.head_object(Bucket=bucket_name, Key=file_name)
    return metadata
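
As a short usage example, the size and content type can be read from the standard ContentLength and ContentType keys of the head_object response (demo-file1.txt here is just the file uploaded earlier):

metadata = get_object_metadata(os.getenv("BUCKET_NAME"), 'demo-file1.txt')
print(f"Size: {metadata['ContentLength']} bytes")
print(f"Content type: {metadata['ContentType']}")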

Using pre-signed URLs

Pre-signed URLs grant temporary access to a specific object in a bucket without requiring the recipient to have credentials or permissions for that bucket. These URLs can be generated programmatically and expire after a set amount of time. They are useful for secure sharing or temporary-access use cases like uploading or downloading files.

Generate a pre-signed URL

Generate a pre-signed URL for temporary access to a private object.

def generate_presigned_url(bucket_name, object_name, expiration=3600):
    """
    Parameters:
    - bucket_name (str): The name of the bucket containing the object.
    - object_name (str): The key name of the object.
    - expiration (int): Time in seconds for the URL to expire.

    Returns:
    - str: A pre-signed URL allowing temporary access to the object.
    """
    presigned_url = client.generate_presigned_url('get_object',
                                                  Params={'Bucket': bucket_name,
                                                          'Key': object_name},
                                                  ExpiresIn=expiration)
    return presigned_url

Use a pre-signed URL for downloading

To download a file using a pre-signed URL, make a GET request to the URL:

import requests

download_response = requests.get(presigned_url)
if download_response.status_code == 200:
    with open("downloaded_file.txt", 'wb') as f:
        f.write(download_response.content)

Use a pre-signed URL for uploading

To generate a pre-signed URL for uploading an object, change the operation parameter to 'put_object':

presigned_url = client.generate_presigned_url('put_object',
                                              Params={'Bucket': bucket_name,
                                                      'Key': object_name},
                                              ExpiresIn=expiration)

Use any HTTP client to upload a file using this URL. Here's an example using Python's requests library:

import requests

with open("myfile.txt", 'rb') as f:
upload_response = requests.put(presigned_url, data=f)

Validate pre-signed URLs

To validate a pre-signed URL, perform a HEAD request using an HTTP client and examine the returned headers for any errors or inconsistencies.

import requests

head_response = requests.head(presigned_url)
print(head_response.status_code)

Error handling for pre-signed URLs

Make sure to handle errors that may occur during the upload or download process. For example, the pre-signed URL may expire or the client may not have access to the specified bucket or object.

if upload_response.status_code != 200:
    print(f"Failed to upload with status code {upload_response.status_code}")

if download_response.status_code != 200:
    print(f"Failed to download with status code {download_response.status_code}")

Pre-signed URLs offer a convenient way to provide short-term access permissions to objects without changing the existing policies. They are ideal for secure sharing and temporary access scenarios.

Use multi-part upload for large files

Uploading large files can be time-consuming and might fail for various reasons, such as network issues. Multi-part upload splits an object into parts that are uploaded independently, so parts can be sent in parallel and a failed part can be retried without restarting the entire upload.

Here’s a Python example that uses the .env file created earlier to upload a-large-file.zip with multi-part upload.

import os
import boto3
from math import ceil
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize Boto3 S3 client
client = boto3.client(
    's3',
    aws_access_key_id=os.getenv("ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("SECRET_ACCESS_KEY"),
    endpoint_url=os.getenv("ENDPOINT_URL"),
    region_name="default"
)

# Bucket and file details
bucket_name = os.getenv("BUCKET_NAME")
file_path = 'a-large-file.zip'
file_size = os.path.getsize(file_path)
part_size = 5 * 1024 * 1024 # 5MB

# Step 1: Initialize the multi-part upload
response = client.create_multipart_upload(Bucket=bucket_name, Key=file_path)
upload_id = response['UploadId']
print(f'Created upload ID {upload_id}')

# Step 2: Upload the parts
parts = []
num_parts = ceil(file_size / part_size)
with open(file_path, 'rb') as f:
    for i in range(1, num_parts + 1):
        part = f.read(part_size)
        response = client.upload_part(
            Bucket=bucket_name,
            Key=file_path,
            PartNumber=i,
            UploadId=upload_id,
            Body=part
        )
        parts.append({'PartNumber': i, 'ETag': response['ETag']})
        print(f'Uploaded part {i} of {num_parts}')

# Step 3: Complete the multi-part upload
client.complete_multipart_upload(
    Bucket=bucket_name,
    Key=file_path,
    UploadId=upload_id,
    MultipartUpload={'Parts': parts}
)
print('Completed upload')

When using multi-part upload, each part except the last must be at least 5 MB in size, and a single upload can have at most 10,000 parts.
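
For very large files, a fixed 5 MB part size could exceed the 10,000-part limit. A small sketch of one way to choose a compliant part size, using only the two limits noted above (the helper name is illustrative):

from math import ceil

MIN_PART_SIZE = 5 * 1024 * 1024   # 5 MB minimum for every part except the last
MAX_PARTS = 10000                 # maximum number of parts per upload

def choose_part_size(file_size):
    """Return a part size that keeps the part count at or below MAX_PARTS."""
    return max(MIN_PART_SIZE, ceil(file_size / MAX_PARTS))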

About Object Storage endpoints

Most Boto3 operations at CoreWeave should use one of these endpoints:

  • New York - LGA1: https://object.lga1.coreweave.com/
  • Chicago - ORD1: https://object.ord1.coreweave.com/
  • Las Vegas - LAS1: https://object.las1.coreweave.com/

For read-only fetching, accelerated endpoints are available. Accelerated endpoints are only accessible to clients running within CoreWeave Cloud.

  • New York - LGA1: https://accel-object.lga1.coreweave.com/
  • Chicago - ORD1: https://accel-object.ord1.coreweave.com/
  • Las Vegas - LAS1: https://accel-object.las1.coreweave.com/

Caution

Accelerated endpoints should only be used to get objects.

Do not use accelerated endpoints for any operation that lists, puts, manipulates, updates, or otherwise changes objects.
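
For example, a workload running in LGA1 could keep a second, read-only client pointed at the accelerated endpoint and use it exclusively for get_object calls. A minimal sketch, assuming the boto3 and os imports and the .env credentials from earlier:

# Read-only client for the LGA1 accelerated endpoint; use it only for get_object.
accel_client = boto3.client(
    's3',
    aws_access_key_id=os.getenv("ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("SECRET_ACCESS_KEY"),
    endpoint_url="https://accel-object.lga1.coreweave.com/",
    region_name="default"
)

response = accel_client.get_object(Bucket=os.getenv("BUCKET_NAME"), Key='demo-file1.txt')
data = response['Body'].read()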

More Information

For a deeper understanding of Boto3, refer to the following resources: