Use the Boto3 SDK with CoreWeave Object Storage
The Boto3 SDK provides a simple, intuitive interface for performing common operations on S3-compatible object storage like creating buckets, uploading files, listing objects, and deleting files.
This guide is for Python developers who want to use CoreWeave Object Storage with the Boto3 SDK. It demonstrates how to set up your environment, initialize the Boto3 client, perform basic operations, generate pre-signed URLs, and use multi-part uploads for large files.
Boto3 is open-source, actively maintained by AWS, fully compatible with CoreWeave Object Storage, and a key library used by CoreWeave Tensorizer to interact with Object Storage. Tensorizer provides extremely fast model loads from HTTP/HTTPS and S3 endpoints.
Prerequisites
Before following the examples in this guide, you should have:
- An Access Key and a Secret Key for CoreWeave Object Storage
- Python version 3.8 or later installed on a macOS or Linux workstation
- Basic Python development experience and familiarity with the command line
Configure the Boto3 environment
On your local workstation, create a project folder and initialize a virtual environment:
$mkdir my-project
$cd my-project
$python3 -m venv venv
$source venv/bin/activate
Install the boto3 SDK and the python-dotenv library, which is used to load your credentials and configuration from the environment.
$pip install boto3 python-dotenv
Create a .env file in the project directory to store your object storage credentials and configuration.
$nano .env
Paste the following, replacing the values with the token credentials and endpoint URL from the Web UI's Object Storage section. Choose a unique bucket name, which must follow the same constraints as domain names. It must be less than 63 characters, be unique across all of CoreWeave Object Storage, begin and end with a lowercase letter, and consist of only lowercase letters, numbers, and hyphens.
ACCESS_KEY_ID=<your_access_key_id>
SECRET_ACCESS_KEY=<your_secret_access_key>
ENDPOINT_URL=<your_endpoint_url>
BUCKET_NAME=<your_unique_bucket_name>
Save the file and exit.
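Before creating the bucket, you can sanity-check the name against the rules above. The helper below is a rough sketch of our own (the regular expression is not part of Boto3 or the CoreWeave API) and is not an exhaustive validator:

import re

def is_valid_bucket_name(name):
    # Must be less than 63 characters
    if len(name) >= 63:
        return False
    # Only lowercase letters, numbers, and hyphens; begin and end with a lowercase letter
    return re.fullmatch(r"[a-z](?:[a-z0-9-]*[a-z])?", name) is not None

print(is_valid_bucket_name("my-demo-bucket"))  # True
print(is_valid_bucket_name("My_Bucket"))       # False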
Demonstration
Create a dummy file to upload to Object Storage.
$touch demo-file1.txt
Create an example Python program in the project directory.
$nano demo-boto3.py
Paste the following code into the new file:
import boto3
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize Boto3 S3 client
client = boto3.client('s3',
    aws_access_key_id=os.getenv("ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("SECRET_ACCESS_KEY"),
    endpoint_url=os.getenv("ENDPOINT_URL"),
    region_name="default")

# Create the bucket
response = client.create_bucket(Bucket=os.getenv("BUCKET_NAME"))
print('create_bucket response:')
print(response)

# Upload a file to the bucket
with open('demo-file1.txt', 'rb') as file:
    response = client.put_object(
        Bucket=os.getenv("BUCKET_NAME"),
        Key='demo-file1.txt',
        Body=file,
        ACL='private')
print('put_object response:')
print(response)

# List all files in the bucket
print('Files in bucket:')
for key in client.list_objects(Bucket=os.getenv("BUCKET_NAME"))['Contents']:
    print(key['Key'])
Save the file, then run the demonstration program.
$python demo-boto3.py
Example output
create_bucket response:
{'ResponseMetadata': {'RequestId': 'tx000005fb5aa22a0947fe3-0064eba496-2756d2362-default', 'HostId': '', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-request-id': 'tx000005fb5aa22a0947fe3-0064eba496-2756d2362-default', 'content-length': '0', 'date': 'Sun, 27 Aug 2023 19:31:34 GMT', 'connection': 'Keep-Alive'}, 'RetryAttempts': 0}}
put_object response:
{'ResponseMetadata': {'RequestId': 'tx000008242b450c98fa1ec-0064eba496-2756d2362-default', 'HostId': '', 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '0', 'etag': '"d26a81774849c3e902f12d6dfdc6dee9"', 'accept-ranges': 'bytes', 'x-amz-request-id': 'tx000008242b450c98fa1ec-0064eba496-2756d2362-default', 'date': 'Sun, 27 Aug 2023 19:31:34 GMT', 'connection': 'Keep-Alive'}, 'RetryAttempts': 0}, 'ETag': '"d26a81774849c3e902f12d6dfdc6dee9"'}
list_objects:
demo-file1.txt
This example initializes the Boto3 client, loads the environment from .env, and shows three core operations:
- Create a bucket
- Upload a file
- List the contents of the bucket
This example is minimal, and does not include any error handling; a sketch of basic error handling follows the list below. If successful, the program should meet the following criteria:
- Both create_bucket and put_object responses have an HTTPStatusCode of 200.
- The put_object response includes an ETag value, which is the MD5 hash of the uploaded file.
- The list_objects response lists the file in the bucket.
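As noted above, the demonstration program omits error handling. The sketch below shows one way to wrap the same calls using botocore's ClientError exception; it re-uses the credentials from the .env file and is illustrative rather than a complete pattern:

import os
import boto3
from botocore.exceptions import ClientError
from dotenv import load_dotenv

load_dotenv()

client = boto3.client('s3',
    aws_access_key_id=os.getenv("ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("SECRET_ACCESS_KEY"),
    endpoint_url=os.getenv("ENDPOINT_URL"),
    region_name="default")

bucket = os.getenv("BUCKET_NAME")

try:
    # Fails with ClientError if, for example, the bucket name is already taken
    client.create_bucket(Bucket=bucket)
    with open('demo-file1.txt', 'rb') as f:
        client.put_object(Bucket=bucket, Key='demo-file1.txt', Body=f, ACL='private')
except ClientError as err:
    # err.response['Error'] carries the error code and message returned by the service
    print(f"Request failed: {err}")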
More operations
The Boto3 SDK supports many other operations. Here are more examples that can be used in Python after the environment is loaded and the client initialized.
Delete a file
Delete a file from a bucket.
def delete_file(bucket_name, file_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    - file_name (str): The name of the file to delete.
    """
    client.delete_object(Bucket=bucket_name, Key=file_name)
Download a file
Download a file from a bucket to a local file.
def download_file(bucket_name, file_name, file_name_local):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    - file_name (str): The name of the file to download.
    - file_name_local (str): The name of the local file to save to.
    """
    client.download_file(bucket_name, file_name, file_name_local)
Copy a file
Copy a file from one bucket to another.
def copy_file(src_bucket, dest_bucket, file_name):
    """
    Parameters:
    - src_bucket (str): The name of the source bucket.
    - dest_bucket (str): The name of the destination bucket.
    - file_name (str): The name of the file to copy.
    """
    copy_source = {
        'Bucket': src_bucket,
        'Key': file_name
    }
    client.copy_object(Bucket=dest_bucket, CopySource=copy_source, Key=file_name)
Enable versioning
Enable versioning for a bucket to keep multiple variants of an object.
def enable_versioning(bucket_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    """
    client.put_bucket_versioning(Bucket=bucket_name, VersioningConfiguration={'Status': 'Enabled'})
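To confirm that versioning took effect, you can read the configuration back with the standard get_bucket_versioning call. The helper below is a sketch that assumes the client and the enable_versioning function above:

def check_versioning(bucket_name):
    # Returns 'Enabled', 'Suspended', or None if versioning has never been configured
    response = client.get_bucket_versioning(Bucket=bucket_name)
    return response.get('Status')

enable_versioning(os.getenv("BUCKET_NAME"))
print(check_versioning(os.getenv("BUCKET_NAME")))  # expected: Enabled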
List bucket metadata
Retrieve information about a bucket, such as its region, and confirm that it exists and is accessible.
def list_bucket_metadata(bucket_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    """
    bucket_info = client.head_bucket(Bucket=bucket_name)
    return bucket_info
Get object metadata
Retrieve metadata of an object, like file size or content type.
def get_object_metadata(bucket_name, file_name):
    """
    Parameters:
    - bucket_name (str): The name of the bucket.
    - file_name (str): The name of the file.
    """
    metadata = client.head_object(Bucket=bucket_name, Key=file_name)
    return metadata
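The head_object response is a dictionary that includes standard keys such as ContentLength and ContentType. For example, using the demo file uploaded earlier (and assuming the client and function above):

metadata = get_object_metadata(os.getenv("BUCKET_NAME"), 'demo-file1.txt')
print(metadata['ContentLength'])  # size of the object in bytes
print(metadata['ContentType'])    # MIME type recorded for the object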
Using pre-signed URLs
Pre-signed URLs are a way to grant temporary access to a specific object in a bucket without requiring the recipient to have credentials or bucket permissions. These URLs can be generated programmatically and expire after a set amount of time. They are useful for secure sharing or temporary access use cases like uploading or downloading files.
Generate a pre-signed URL
Generate a pre-signed URL for temporary access to a private object.
def generate_presigned_url(bucket_name, object_name, expiration=3600):
    """
    Parameters:
    - bucket_name (str): The name of the bucket containing the object.
    - object_name (str): The key name of the object.
    - expiration (int): Time in seconds for the URL to expire.

    Returns:
    - str: A pre-signed URL allowing temporary access to the object.
    """
    presigned_url = client.generate_presigned_url('get_object',
        Params={'Bucket': bucket_name, 'Key': object_name},
        ExpiresIn=expiration)
    return presigned_url
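For example, to generate a URL for the demo file that expires after ten minutes (assuming the client and .env file from the demonstration program):

presigned_url = generate_presigned_url(os.getenv("BUCKET_NAME"), 'demo-file1.txt', expiration=600)
print(presigned_url)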
Use a pre-signed URL for downloading
To download a file using a pre-signed URL, make a GET request to the URL:
import requests

download_response = requests.get(presigned_url)
if download_response.status_code == 200:
    with open("downloaded_file.txt", 'wb') as f:
        f.write(download_response.content)
Use a pre-signed URL for uploading
To generate a pre-signed URL for uploading an object, change the operation parameter to 'put_object':
presigned_url = client.generate_presigned_url('put_object',
    Params={'Bucket': bucket_name, 'Key': object_name},
    ExpiresIn=expiration)
Use any HTTP client to upload a file using this URL. Here's an example using Python's requests library:
import requests

with open("myfile.txt", 'rb') as f:
    upload_response = requests.put(presigned_url, data=f)
Validate pre-signed URLs
To validate a pre-signed URL, perform a HEAD request using an HTTP client and examine the returned headers for any errors or inconsistencies.
import requests

head_response = requests.head(presigned_url)
print(head_response.status_code)
Error handling for pre-signed URLs
Make sure to handle errors that may occur during the upload or download process. For example, the pre-signed URL may expire or the client may not have access to the specified bucket or object.
if upload_response.status_code != 200:
    print(f"Failed to upload with status code {upload_response.status_code}")

if download_response.status_code != 200:
    print(f"Failed to download with status code {download_response.status_code}")
Pre-signed URLs offer a convenient way to provide short-term access permissions to objects without changing the existing policies. They are ideal for secure sharing and temporary access scenarios.
Use multi-part upload for large files
Uploading large files can be time-consuming and might fail for various reasons like network issues. Multi-part upload sends parts of an object in parallel to speed up the process.
Here's a Python example that uses the .env file created earlier to upload a-large-file.zip with multi-part upload.
import os
import boto3
from math import ceil
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize Boto3 S3 client
client = boto3.client('s3',
    aws_access_key_id=os.getenv("ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("SECRET_ACCESS_KEY"),
    endpoint_url=os.getenv("ENDPOINT_URL"),
    region_name="default")

# Bucket and file details
bucket_name = os.getenv("BUCKET_NAME")
file_path = 'a-large-file.zip'
file_size = os.path.getsize(file_path)
part_size = 5 * 1024 * 1024  # 5MB

# Step 1: Initialize the multi-part upload
response = client.create_multipart_upload(Bucket=bucket_name, Key=file_path)
upload_id = response['UploadId']
print(f'Created upload ID {upload_id}')

# Step 2: Upload the parts
parts = []
num_parts = ceil(file_size / part_size)

with open(file_path, 'rb') as f:
    for i in range(1, num_parts + 1):
        part = f.read(part_size)
        response = client.upload_part(
            Bucket=bucket_name,
            Key=file_path,
            PartNumber=i,
            UploadId=upload_id,
            Body=part)
        parts.append({'PartNumber': i, 'ETag': response['ETag']})
        print(f'Uploaded part {i} of {num_parts}')

# Step 3: Complete the multi-part upload
client.complete_multipart_upload(
    Bucket=bucket_name,
    Key=file_path,
    UploadId=upload_id,
    MultipartUpload={'Parts': parts})
print('Completed upload')
When using multi-part upload, each part must be at least 5MB in size, except for the last part. The total number of parts can be up to 10,000.
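Because of these limits, very large files need a part size larger than the 5MB minimum. The helper below is a sketch of our own for picking a part size that stays within 10,000 parts:

from math import ceil

MIN_PART_SIZE = 5 * 1024 * 1024  # 5MB minimum (the last part may be smaller)
MAX_PARTS = 10000

def choose_part_size(file_size):
    # Smallest part size that keeps the upload within 10,000 parts,
    # but never below the 5MB minimum
    return max(MIN_PART_SIZE, ceil(file_size / MAX_PARTS))

print(choose_part_size(50 * 1024**3))  # roughly 5.1MB parts for a 50GB file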
About Object Storage endpoints
Most Boto3 operations at CoreWeave should use one of these endpoints:
- New York - LGA1: https://object.lga1.coreweave.com/
- Chicago - ORD1: https://object.ord1.coreweave.com/
- Las Vegas - LAS1: https://object.las1.coreweave.com/
For read-only fetching, accelerated endpoints are available. Accelerated endpoints are only accessible to clients running within CoreWeave Cloud.
- New York - LGA1: https://accel-object.lga1.coreweave.com/
- Chicago - ORD1: https://accel-object.ord1.coreweave.com/
- Las Vegas - LAS1: https://accel-object.las1.coreweave.com/
Accelerated endpoints should only be used to get objects.
Do not use accelerated endpoints for any operation that lists, puts, manipulates, updates, or otherwise changes objects.
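For example, a workload running inside CoreWeave Cloud could keep two clients: one on the regular endpoint for writes and listings, and one on the accelerated endpoint strictly for reads. This sketch assumes the LGA1 region and the credentials from the .env file created earlier:

import os
import boto3
from dotenv import load_dotenv

load_dotenv()

credentials = {
    'aws_access_key_id': os.getenv("ACCESS_KEY_ID"),
    'aws_secret_access_key': os.getenv("SECRET_ACCESS_KEY"),
    'region_name': "default",
}

# Regular endpoint: create, put, list, delete, and all other operations
client = boto3.client('s3', endpoint_url="https://object.lga1.coreweave.com/", **credentials)

# Accelerated endpoint: only for getting objects, and only from within CoreWeave Cloud
accel_client = boto3.client('s3', endpoint_url="https://accel-object.lga1.coreweave.com/", **credentials)

obj = accel_client.get_object(Bucket=os.getenv("BUCKET_NAME"), Key='demo-file1.txt')
data = obj['Body'].read()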
More Information
For a deeper understanding of Boto3, refer to the following resources: