Stable Diffusion: Text to Image

Deploy Stable Diffusion for scalable, high-fidelity text-to-image generation on CoreWeave Cloud

Stable Diffusion, an open source diffusion model built by Stability.AI, takes a text prompt as input and generates high-quality, photorealistic images as output. Stability.AI also offers a UI and an API service for the model via Dream Studio.

Generate images using Stable Diffusion on CoreWeave

This tutorial deploys the Stable Diffusion model as an autoscaling Inference Service on CoreWeave Cloud, which provides an HTTP API used to receive text prompt inputs for image generation.

This image was generated from the prompt: `Red forest, digital art, trending`

Prerequisites

Tutorial source code

To follow this tutorial, first clone the tutorial source code repository from the CoreWeave GitHub.

The following tools are also required to run this tutorial:

A CoreWeave Cloud account
kubectl, configured to use your CoreWeave kubeconfig

Optionally, Docker may also be used to create custom images for a model serializer and model server using the provided Dockerfile.

Tutorial files

The tutorial's source code repository contains the following files.

To deploy the tutorial using the public, pre-Tensorized model, the only required files are:

02-inference-service.yaml, and
the service directory and its contents.

Alternatively, it is possible to serialize the model yourself by leveraging the additional files provided in this tutorial's codebase.

It is also possible to create your own serializer and Inference Service images by using the provided Dockerfile. This Dockerfile utilizes CoreWeave's own PyTorch images, which ensures the final image is very lightweight.

File name	Description
`serializer`	Runs a job for custom-serializing models if desired.
`service`	Runs the Inference Service.
`00-optional-s3-secret.yaml`	Configures a custom S3 endpoint and S3 secrets for a custom Object Storage bucket, if desired.
`01-optional-s3-serialize-job.yaml`	Deploys the serialization job (`serializer`) in order to custom-serialize a model, if desired.
`02-inference-service.yaml`	Deploys the Stable Diffusion Inference Service onto CoreWeave Cloud.
`Dockerfile`	The Dockerfile used to build both the model serializer image and model serving image, if it is desirable to create your own version of these images.

(Recommended) Run the example using the pre-Tensorized model

By default, this tutorial uses a publicly available model that was pre-serialized with CoreWeave's Tensorizer, a PyTorch module, model, and tensor serializer and deserializer that makes it possible to load models extremely quickly from HTTP/HTTPS and S3 endpoints. It also speeds up loading over the network and from local disk volumes.

To run the example using this public, pre-Tensorized model, deploy the Inference Service using kubectl:

$ kubectl apply -f 02-inference-service.yaml

Then, test the Inference Service endpoint.

Run the tutorial using custom serialization

Serializing the Stable Diffusion model yourself requires an S3 endpoint and its associated secrets. Some of the provided files also need to be adjusted as described below.

Generate an S3 key

First, generate an S3 key from the Object Storage section of the CoreWeave Cloud App.

Additional Resources

See Object Storage for more information.

Create an Object Storage bucket

Next, create a new Object Storage bucket using the s3cmd tool:

$ s3cmd mb s3://YOURBUCKET

Install the S3 secrets and endpoint hostname

To install the S3 access and secret keys created earlier, first base64-encode each of the key values.

$ echo -n "<your key>" | base64

For example:

$ echo -n "<YOUR ACCESS KEY>" | base64
QUNDRVNTX0tFWV9IRVJF

$ echo -n "<YOUR SECRET KEY>" | base64
U0VDUkVUX0tFWV9IRVJF
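
The `-n` flag matters in the commands above: without it, `echo` appends a newline, and that newline would be encoded into the Secret value. As a minimal sketch, the same encoding can be reproduced with Python's standard `base64` module. The helper name `encode_secret_value` and the key strings are placeholders chosen for illustration (they happen to decode from the example output above) and are not real credentials.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a string without adding a trailing newline."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Placeholder values, not real credentials.
access_key = encode_secret_value("ACCESS_KEY_HERE")
secret_key = encode_secret_value("SECRET_KEY_HERE")
print(access_key)  # QUNDRVNTX0tFWV9IRVJF
print(secret_key)  # U0VDUkVUX0tFWV9IRVJF

# A trailing newline (what `echo` without -n would add) changes the result.
with_newline = encode_secret_value("ACCESS_KEY_HERE\n")
print(with_newline != access_key)  # True
```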

Then, in the 00-optional-s3-secret.yaml file, replace the access and secret key placeholders in the .data.access_key and .data.secret_key fields with your base64-encoded keys, respectively. For example:

00-optional-s3-secret.yaml
apiVersion: v1
data:
  access_key: QUNDRVNTX0tFWV9IRVJF
kind: Secret
metadata:
  name: s3-access-key
type: Opaque
---
apiVersion: v1
data:
  secret_key: U0VDUkVUX0tFWV9IRVJF
kind: Secret
metadata:
  name: s3-secret-key
type: Opaque

The S3 endpoint URL of the new S3 bucket must also be included in the 00-optional-s3-secret.yaml file.

First, base64-encode the endpoint URL. The endpoint URL should correspond to the region in which your new bucket is hosted. In this example, the ORD1 region is used, which means the hostname of the Object Storage endpoint URL is object.ord1.coreweave.com.

$ echo -n "object.ord1.coreweave.com" | base64
b2JqZWN0Lm9yZDEuY29yZXdlYXZlLmNvbQ==

Replace the host URL placeholder (.data.url) with the base64-encoded S3 endpoint URL of the new bucket. For example:

00-optional-s3-secret.yaml
apiVersion: v1
data:
  url: b2JqZWN0Lm9yZDEuY29yZXdlYXZlLmNvbQ==
kind: Secret
metadata:
  name: s3-host-url
type: Opaque
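
If you would rather generate these Secret documents than edit them by hand, the structure is simple enough to render from a template. The sketch below is one possible approach, not part of the tutorial repository: `render_secret` is a hypothetical helper, and it assumes the manifest layout shown in the examples above.

```python
import base64


def render_secret(name: str, key: str, value: str) -> str:
    """Return a Secret manifest with `value` base64-encoded under .data.<key>."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return (
        "apiVersion: v1\n"
        "data:\n"
        f"  {key}: {encoded}\n"
        "kind: Secret\n"
        "metadata:\n"
        f"  name: {name}\n"
        "type: Opaque\n"
    )


# Endpoint hostname for the ORD1 region, as used in the example above.
manifest = render_secret("s3-host-url", "url", "object.ord1.coreweave.com")
print(manifest)
```

Several rendered documents can be joined with a `---` line to form a single multi-document file.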

Once these values are replaced, create the Secrets object by applying the 00-optional-s3-secret.yaml file using kubectl.

$ kubectl apply -f 00-optional-s3-secret.yaml

Serialize the model

01-optional-s3-serialize-job.yaml runs the serialization Job for the model when deployed.

Before deploying the Job, adjust the following command arguments in the 01-optional-s3-serialize-job.yaml file.

Replace the value of the command option --dest-bucket with the name of the bucket to which the model will be serialized.
Replace the value of the command option --hf-model-id with the ID of the model you would like to serialize. (By default, the model ID is set to runwayml/stable-diffusion-v1-5. Additional model IDs are available on Hugging Face.)

Once these values are added, deploy the Job using kubectl to run it:

$ kubectl apply -f 01-optional-s3-serialize-job.yaml

Run the Inference Service

To run the Inference Service, replace the model's URI in the 02-inference-service.yaml file with the S3 URI pointing to your custom model.

02-inference-service.yaml
      containers:
      - name: kfserving-container
        image: ghcr.io/coreweave/ml-containers/sd-inference:amercurio-sd-overhaul-7d29c61
        command:
        - "python3"
        - "/app/service.py"
        - "--model-uri=s3://tensorized/runwayml/stable-diffusion-v1-5"
        - "--precision=float16"
        - "--port=80"

If you are using a custom Inference Service image built from the Dockerfile provided in the tutorial repository, also replace the value of the container's image field so that it points to the custom image.

02-inference-service.yaml
      containers:
      - name: kfserving-container
        image: ghcr.io/coreweave/ml-containers/sd-inference:amercurio-sd-overhaul-7d29c61
        command:
        - "python3"
        - "/app/service.py"
        - "--model-uri=s3://tensorized/runwayml/stable-diffusion-v1-5"
        - "--precision=float16"
        - "--port=80"

Finally, start the Inference Service using kubectl.

$ kubectl apply -f 02-inference-service.yaml

Test the endpoint

This example curl command can be used to test the Inference Service endpoint.

$ curl -X 'POST' \
  'https://sd.tenant-example-example.knative.ord1.coreweave.cloud/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "a lazy cat sleeping on a pillow",
  "guidance_scale": 7,
  "num_inference_steps": 28,
  "seed": 0,
  "width": 512,
  "height": 512
}' --output cat.png

Here is an example output image from this query:

Generated from the input prompt, `a lazy cat sleeping on a pillow`.
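
The same request can be sent from Python using only the standard library. This is a minimal sketch rather than an official client: the function names are hypothetical, the URL is the placeholder from the curl example, and it assumes (as the `--output cat.png` flag suggests) that the response body is the image itself.

```python
import json
import urllib.request

# Placeholder URL copied from the curl example; replace it with your own endpoint.
ENDPOINT = "https://sd.tenant-example-example.knative.ord1.coreweave.cloud/generate"


def build_request(prompt: str, **options) -> urllib.request.Request:
    """Build a POST request carrying the prompt and extra parameters as JSON."""
    body = json.dumps({"prompt": prompt, **options}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, out_path: str, **options) -> None:
    """Send the request and write the raw response body to out_path."""
    with urllib.request.urlopen(build_request(prompt, **options)) as response:
        with open(out_path, "wb") as image_file:
            image_file.write(response.read())


request = build_request(
    "a lazy cat sleeping on a pillow",
    guidance_scale=7, num_inference_steps=28, seed=0, width=512, height=512,
)
print(request.get_method(), request.full_url)
# Requires a live endpoint:
# generate("a lazy cat sleeping on a pillow", "cat.png", seed=0)
```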

Replace the hostname in the URL above with the URL of your own Knative service. To find it, use kubectl to list all Knative services.

$ kubectl get ksvc

For example:

$ kubectl get ksvc stable-diffusion

NAME               URL                                                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                        AGE
stable-diffusion   http://stable-diffusion.tenant-example-example.knative.ord1.coreweave.cloud   True           100                              stable-diffusion-predictor-default-00001   64m
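
When scripting against the endpoint, the URL can be pulled out of this listing programmatically. The sketch below parses the tabular text shown above; `ksvc_urls` is a hypothetical helper, and it assumes that NAME and URL are the first two columns and are populated on every row. The `/generate` path is taken from the curl example.

```python
def ksvc_urls(kubectl_output: str) -> dict:
    """Map each Knative service name to its URL from `kubectl get ksvc` output."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    urls = {}
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # Only the first two columns are used; later columns may be empty.
        if len(fields) >= 2:
            urls[fields[0]] = fields[1]
    return urls


sample = """\
NAME               URL                                                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                        AGE
stable-diffusion   http://stable-diffusion.tenant-example-example.knative.ord1.coreweave.cloud   True           100                              stable-diffusion-predictor-default-00001   64m
"""
url = ksvc_urls(sample)["stable-diffusion"]
print(url + "/generate")
```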

Supported request parameters

The following parameters are supported for requests made to the endpoint:

guidance_scale
num_inference_steps
seed
width
height

For example:

$ curl -X 'POST' \
  'https://sd.tenant-example-example.knative.ord1.coreweave.cloud/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "a lazy cat sleeping on a pillow",
  "guidance_scale": 7,
  "num_inference_steps": 28,
  "seed": 0,
  "width": 512,
  "height": 512
}' --output cat.png
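
A client can guard against misspelled parameter names by checking them against the list above before sending a request. This is a sketch of one such check: `build_payload` is a hypothetical helper, and the tutorial does not say how the service itself handles unknown fields, so the validation here is purely client-side.

```python
SUPPORTED_PARAMETERS = ("guidance_scale", "num_inference_steps", "seed", "width", "height")


def build_payload(prompt: str, **params) -> dict:
    """Return a request body, rejecting parameters outside the supported list."""
    unknown = sorted(set(params) - set(SUPPORTED_PARAMETERS))
    if unknown:
        raise ValueError(f"unsupported parameters: {', '.join(unknown)}")
    return {"prompt": prompt, **params}


payload = build_payload(
    "a lazy cat sleeping on a pillow",
    guidance_scale=7, num_inference_steps=28, seed=0, width=512, height=512,
)
print(payload)
```

Passing a misspelled name such as `steps=28` raises a `ValueError` instead of sending a request the service may not understand.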

Hardware and performance

This example is set to use one A40 node so that it can produce higher-resolution images. Inference takes around 4.78 seconds at the default resolution of 512x512 with 50 steps. Larger resolutions take longer; for example, a resolution of 1024x768 takes around 47 seconds.

Note: Multi-GPU inference is not supported.

Depending on the use case, GPUs with less VRAM also work, down to 8GB GPUs such as the Quadro RTX 4000. However, output resolution is then limited by memory to 512x512.

Benchmarks

The graph and table below compare recent benchmark results for Stable Diffusion inference on different GPUs. For each GPU, the value shown is the number of inference steps completed per second.

GPU	Inference steps per second
Quadro RTX 4000	7.67
Quadro RTX 5000	9.86
RTX A4000	9.76
RTX A5000	12.84
RTX A6000	15.22
A40	14.66
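
The figures in this table can be turned into a rough per-image time by dividing the number of inference steps by the steps-per-second rate. The sketch below does exactly that. It is only a back-of-the-envelope estimate: the benchmark resolution is not stated here, and the result ignores any per-request overhead, so `estimated_seconds` (a hypothetical helper) is best read as a lower bound.

```python
# Inference steps per second, copied from the benchmark table above.
STEPS_PER_SECOND = {
    "Quadro RTX 4000": 7.67,
    "Quadro RTX 5000": 9.86,
    "RTX A4000": 9.76,
    "RTX A5000": 12.84,
    "RTX A6000": 15.22,
    "A40": 14.66,
}


def estimated_seconds(gpu: str, num_inference_steps: int) -> float:
    """Estimate time spent on inference steps only, excluding other overhead."""
    return num_inference_steps / STEPS_PER_SECOND[gpu]


for gpu in STEPS_PER_SECOND:
    print(f"{gpu}: {estimated_seconds(gpu, 50):.2f} s for 50 steps")
```

For the A40 this gives roughly 3.4 seconds for 50 steps, which is lower than the 4.78 seconds quoted earlier for a full 512x512 request.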

Additional Resources

Refer to the Node Types page for more information about these GPUs.

Autoscaling

Autoscaling is controlled using annotations in the manifest for the Inference Service. By default, this example is set to always run one replica, regardless of the number of requests.

spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "1"

Increasing the value of autoscaling.knative.dev/maxScale will allow CoreWeave's infrastructure to automatically scale up the number of replicas when there are multiple outstanding requests to the endpoints. Replicas will then automatically be scaled down as demand decreases.
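
If manifests are generated by a script, the two annotations can be produced from a pair of integers. The following sketch shows one way to do so. `autoscaling_annotations` is a hypothetical helper, and it treats the maximum as a hard upper bound that must not be smaller than the minimum, which is an assumption of this example rather than a rule stated in the tutorial.

```python
def autoscaling_annotations(min_scale: int, max_scale: int) -> dict:
    """Return the autoscaling annotations for the given replica bounds."""
    if min_scale < 0 or max_scale < min_scale:
        raise ValueError("require 0 <= min_scale <= max_scale")
    # Annotation values must be strings, hence the quoted numbers in the manifest.
    return {
        "autoscaling.knative.dev/minScale": str(min_scale),
        "autoscaling.knative.dev/maxScale": str(max_scale),
    }


print(autoscaling_annotations(1, 1))  # the tutorial default: always one replica
print(autoscaling_annotations(1, 4))  # allow scaling up to four replicas
```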

Scale-to-Zero

Setting autoscaling.knative.dev/minScale to "0" enables Scale-to-Zero, a cost-saving measure that allows the Inference Service to be scaled down completely when no requests have been made for a period of time. No cost is incurred while the Service is scaled to zero.
