CoreWeave
Search
K

Stable Diffusion: Text to Image

Deploy Stable Diffusion for scalable, high fidelity, text-to-image generation on CoreWeave Cloud
The open source diffusion model Stable Diffusion built by Stability.AI takes a text prompt as input to generate high-quality, photorealistic images as output. Stability.AI also offers a UI and an API service via Dream Studio for the model.

Generate images using Stable Diffusion on CoreWeave

This tutorial deploys the Stable Diffusion model as an autoscaling Inference Service on CoreWeave Cloud, which provides an HTTP API used to receive text prompt inputs for image generation.
This image was generated from the prompt: Red forest, digital art, trending

Prerequisites

Tutorial source code

To follow this tutorial, first clone the tutorial source code repository from the CoreWeave GitHub:
The following tools are also required to run this tutorial:
Optionally, Docker may also be used to create custom images for a model serializer and model server using the provided Dockerfile.

Tutorial files

The tutorial's source code repository contains the following files.
  • 02-inference-service.yaml, and
  • the service directory and its contents.
Alternatively, it is possible to serialize the model yourself by leveraging the additional files provided in this tutorial's codebase.
It is also possible to create your own serializer and Inference Service images by using the provided Dockerfile. This Dockerfile utilizes CoreWeave's own PyTorch images, which ensures the final image is very lightweight.
File name
Description
Runs a job for custom-serializing models if desired.

service

Runs the Inference Service.
Configures a custom S3 endpoint and S3 secrets for a custom Object Storage bucket, if desired.
Deploys the serialization job (serializer) in order to custom-serialize a model, if desired.
Deploys the Stable Diffusion Inference Service onto CoreWeave Cloud.
The Dockerfile used to build both the model serializer image and model serving image, if it is desirable to create your own version of these images.
The model used by default for this tutorial uses a publicly available model that was pre-serialized by CoreWeave's Tensorizer, a PyTorch module, model, and tensor serializer/deserializer that makes it possible to load models extremely quickly from HTTP/HTTPS and S3 endpoints. It also enables faster network load times, as well as load times from local disk volumes.
To run the example using this public, pre-Tensorized image, simply deploy the Inference Service using kubectl:
$ kubectl apply -f 02-inference-service.yaml

Run the tutorial using custom serialization

Serializing the Stable Diffusion model yourself requires an S3 endpoint and its associated secrets. Some of the provided files also need to be adjusted as described below.

Generate an S3 key

Additional Resources
See Object Storage for more information.

Create an Object Storage bucket

Next, create a new Object Storage bucket using the s3cmd tool:
$ s3cmd mb s3://YOURBUCKET

Install the S3 secrets and endpoint hostname

To install the S3 access and secret keys created earlier, first base64-encode each of the key values.
$ echo -n "<your key>" | base64"
For example:
$ echo -n "<YOUR ACCESS KEY>" | base64
QUNDRVNTX0tFWV9IRVJF
$ echo -n "<YOUR SECRET KEY>" | base64
U0VDUkVUX0tFWV9IRVJF
Then, in the 00-optional-s3-secret.yaml file, replace the access and secret key placeholders in the .data.access_key and .data.secret_key fields with your base64-encoded keys, respectively. For example:
00-optional-s3-secret.yaml
apiVersion: v1
data:
access_key: QUNDRVNTX0tFWV9IRVJF
kind: Secret
metadata:
name: s3-access-key
type: Opaque
---
apiVersion: v1
data:
secret_key: U0VDUkVUX0tFWV9IRVJF
kind: Secret
metadata:
name: s3-secret-key
type: Opaque
The S3 endpoint URL of the new S3 bucket must also be included in the 00-optional-s3-secret.yaml file.
First, base64-encode the endpoint URL. The endpoint URL should correspond to the region in which your new bucket is hosted. In this example, the ORD1 region is used, which means the hostname of the Object Storage endpoint URL is object.ord1.coreweave.com.
$ echo -n "object.ord1.coreweave.com" | base64
b2JqZWN0Lm9yZDEuY29yZXdlYXZlLmNvbQ==
Replace the host URL placeholder (.data.url) with the base64-encoded S3 endpoint URL of the new bucket. For example:
00-optional-s3-secret.yaml
apiVersion: v1
data:
url: b2JqZWN0Lm9yZDEuY29yZXdlYXZlLmNvbQ==
kind: Secret
metadata:
name: s3-host-url
type: Opaque
Once these values are replaced, create the Secrets object by applying the 00-optional-s3-secret.yaml file using kubectl.
$ kubectl apply -f 00-optional-s3-secret.yaml

Serialize the model

01-optional-s3-serialize-job.yaml runs the serialization Job for the model when deployed.
Before deploying the Job, adjust the following command arguments in the 01-optional-s3-serialize-job.yaml file.
  • Replace the value of the command option --dest-bucket with the name of the bucket to which the model will be serialized.
  • Replace the value of the command option --hf-model-id with the ID of the model you would like to serialize. (By default, the model ID is set torunwayml/stable-diffusion-v1-5. Additional model IDs are available on Hugging Face.)
Once these values are added, deploy the Job using kubectl to run it:
$ kubectl apply -f 01-optional-s3-serialize-job.yaml

Run the Inference Service

To run the Inference Service, replace the model's URI in the 02-inference-service.yaml file with the S3 URI pointing to your custom model.
02-inference-service.yaml
containers:
- name: kfserving-container
image: ghcr.io/coreweave/ml-containers/sd-inference:amercurio-sd-overhaul-7d29c61
command:
- "python3"
- "/app/service.py"
- "--model-uri=s3://tensorized/runwayml/stable-diffusion-v1-5"
- "--precision=float16"
- "--port=80"
If you are using a custom-built Inference Service image using the Dockerfile provided in the tutorial repository, additionally replace the URL in .containers.image to point to the custom image.
02-inference-service.yaml
containers:
- name: kfserving-container
image: ghcr.io/coreweave/ml-containers/sd-inference:amercurio-sd-overhaul-7d29c61
command:
- "python3"
- "/app/service.py"
- "--model-uri=s3://tensorized/runwayml/stable-diffusion-v1-5"
- "--precision=float16"
- "--port=80"
Finally, start the Inference Service using kubectl.
$ kubectl apply -f 02-inference-service.yaml.

Test the endpoint

This example curl command can be used to test the Inference Service endpoint.
$ curl -X 'POST' \
'https://sd.tenant-example-example.knative.ord1.coreweave.cloud/generate' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt": "a lazy cat sleeping on a pillow",
"guidance_scale": 7,
"num_inference_steps": 28,
"seed": 0,
"width": 512,
"height": 512
}' --output cat.png
Here is an example output image from this query:
Generated from the input prompt, a lazy cat sleeping on a pillow.
To find the value of YOUR_KSVC, use kubectl to list all Knative services.
$ kubectl get ksvc
For example:
$ kubectl get ksvc stable-diffusion
NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE
stable-diffusion http://stable-diffusion.tenant-example-example.knative.ord1.coreweave.cloud True 100 stable-diffusion-predictor-default-00001 64m

Supported request parameters

The following parameters are supported for requests made to the endpoint:
  • guidance_scale
  • num_inference_steps
  • seed
  • width
  • height
For example:
$ curl -X 'POST' \
'https://sd.tenant-example-example.knative.ord1.coreweave.cloud/generate' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt": "a lazy cat sleeping on a pillow",
"guidance_scale": 7,
"num_inference_steps": 28,
"seed": 0,
"width": 512,
"height": 512
}' --output cat.png

Hardware and performance

This example is set to one A40 node for the production of higher resolution images. Inference times are around 4.78 seconds for a default resolution of 512x512 with 50 steps. Larger resolutions take longer - for example, a resolution of 1024x768 takes around 47 seconds.
Note
Multi-GPU Inference is not supported.
Depending on use case, GPUs with less VRAM will also work down to 8GB GPUs, such as the Quadro RTX 4000, however output resolution will be limited by memory to 512x512.

Benchmarks

The graph and table below compare recent GPU benchmark inference speeds for Stable Diffusion processing on different GPUs. For each GPU, the values represented on the graph and table below are comparisons between inference steps per second.
GPU
Inference steps per second
Quadro RTX 4000
7.67
Quadro RTX 5000
9.86
RTX A4000
9.76
RTX A5000
12.84
RTX A6000
15.22
A40
14.66
Additional Resources
Refer to the Node Types page for more information about these GPUs.

Autoscaling

Autoscaling is controlled using annotations in the manifest for the Inference Service. By default, this example set to always run one replica, regardless of number of requests.
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/minScale: "1"
autoscaling.knative.dev/maxScale: "1"
Increasing the value of autoscaling.knative.dev/maxScale will allow CoreWeave's infrastructure to automatically scale up the number of replicas when there are multiple outstanding requests to the endpoints. Replicas will then automatically be scaled down as demand decreases.

Scale-to-Zero

Setting minReplicas to 0 enables Scale-to-Zero, a cost-effective measure that allows the Inference Service to be completely scaled down when no requests have been made for a period of time, preventing additional charges while the Service is idle. No cost is incurred when the Service is scaled to zero.