Fine-tune Stable Diffusion Models with CoreWeave Cloud
Fine-tune and train Stable Diffusion models using Argo Workflows
Fine-tuning and training Stable Diffusion can be computationally expensive, but CoreWeave Cloud allows you to train Stable Diffusion models with on-demand compute resources and infrastructure that scales down to zero active pods, incurring no charges, after training is complete.
This guide is a reference example of how to use an Argo Workflow to create a pipeline at CoreWeave to fine-tune and train Stable Diffusion models. It's a working demonstration to get you started, but it's not intended to be a production application.
This article covers both the DreamBooth and Textual Inversion training methods. Most of the steps are the same for both methods; where they differ, tabbed sections indicate which training method applies.
Prerequisites
This guide contains all the information required to train Stable Diffusion, but assumes that you have already followed the process to set up the CoreWeave Kubernetes environment. If you have not done so already, follow the steps in Cloud Account and Access before proceeding.
It also assumes you are familiar with the topics covered in these articles.
Resources
Hardware
This reference example uses the following container configuration, which works well for training Stable Diffusion models, but you can use any configuration you wish as long as it meets the minimum requirements. This configuration currently costs $1.52 per hour using CoreWeave's resource-based pricing model.
- 8 vCPU (AMD EPYC)
- 32GB RAM
- NVIDIA A40/A6000 GPUs (48GB VRAM)
There is an optional test Inference endpoint that can be enabled and deployed automatically when the model completes fine-tuning. This Inference container defaults to the following configuration, which currently costs $0.65 per hour with resource-based pricing.
- 4 vCPU
- 8GB RAM
- NVIDIA Quadro RTX 5000 (16GB VRAM)
GitHub repository
To follow this guide, clone the latest version of the CoreWeave kubernetes-cloud repository and navigate to the project directory for your preferred fine-tuning method:
- DreamBooth fine-tuning templates are in kubernetes-cloud/sd-dreambooth-workflow
- Textual Inversion fine-tuning templates are in kubernetes-cloud/sd-finetuner-workflow
Understanding the Argo Workflows
Each of the Argo Workflow templates used in these examples has a similar structure, consisting of three important sections:
- Workflow parameters
- Main template
- Individual step templates
Throughout the file, you will see many template tags surrounded by double braces: `{{` and `}}`. Many of these are simple variable substitutions using workflow and step parameters. Expression template tags, which start with `{{=`, contain expr code.
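As an illustration, a step's arguments can mix both tag styles. This is a hypothetical fragment; the parameter and step names are placeholders, not the exact ones used in the repository templates:

```yaml
- name: finetune
  container:
    image: "{{workflow.parameters.finetuner_image}}"              # simple substitution
    args:
      - "--run-name={{workflow.parameters.run_name}}"             # simple substitution
      - "--batch-size={{=asInt(workflow.parameters.batch_size)}}" # expr expression tag
```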
Parameters
All of the Workflow parameters and their default values are defined at the top of the workflow templates, and cover the following categories:
- Fine-tuning hyperparameters
- File paths
- Workflow step controls
- Container images
All the parameters have suitable defaults, but make sure to review them and adjust according to your needs.
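For orientation, parameter defaults are declared under `spec.arguments.parameters` at the top of a `WorkflowTemplate`. The sketch below shows one example from each category; the names and values are illustrative, not the exact ones in the repository:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: db-finetune-template
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: learning_rate      # fine-tuning hyperparameter
        value: "5e-6"
      - name: pvc                # file path: the storage volume to mount
        value: db-finetune-data
      - name: run_inference      # workflow step control
        value: "false"
      - name: finetuner_image    # container image to run
        value: example/sd-finetuner:latest
```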
Main template
The workflow is defined by a main template that lists the steps in the order they should run. Each step has its parameters defined, and some include a `when` value that tells the workflow when the step should be skipped.
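A main template of this shape might look like the following sketch, where the step names and the `run_inference` parameter are illustrative assumptions:

```yaml
templates:
  - name: main
    steps:
      - - name: finetune
          template: finetune
      - - name: inference-service
          template: inference-service
          # This step is skipped unless inference was requested:
          when: "{{workflow.parameters.run_inference}} == true"
```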
Step templates
The step templates define how the job will be run, including the container image, resource requests, command, etc.
The step that creates the inference service is different: instead of running a custom job, it applies a manifest to the cluster, and that manifest defines the inference service. For this reason, the guide asks you to create a service account with permission to create inference services. The workflow then uses this service account to apply the manifest.
The inference service step has custom-defined success and failure conditions. Without these, Argo would mark the step as succeeded as soon as the manifest is applied. With these custom conditions, Argo monitors the new inference service and only considers the step complete after the inference service starts successfully. This makes it easy to run additional steps afterward that use the inference service, such as creating batches of images.
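In Argo, this is expressed with a `resource` template. The sketch below shows the general shape; the service account name and the condition expressions are placeholders, not the exact values used in the repository templates:

```yaml
- name: inference-service
  serviceAccountName: inference   # account with permission to create inference services
  resource:
    action: apply
    # Argo polls the applied resource's status and finishes the step only when
    # one of these expressions becomes true (placeholder expressions):
    successCondition: status.conditions.0.status == True
    failureCondition: status.conditions.0.status == False
    manifest: |
      # ... the inference service manifest, with template tags filled in ...
```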
Triggering the Argo Workflows
This guide offers two ways to deploy everything needed to trigger the workflows.
The first is through the Argo Workflows UI. From there, you can see all of the deployed workflow templates. Clicking on one lets you submit a new run after editing the default parameter values.
The second is through the Argo REST API's `/api/v1/events/<namespace>/<discriminator>` endpoint. The discriminator is defined in a `WorkflowEventBinding` deployed alongside each `WorkflowTemplate`.
You can view all of the available endpoints in your Argo Workflows deployment by clicking on the API Docs button in the sidebar of the UI.
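As a hypothetical example, a workflow run could be triggered with `curl`. The host, namespace, discriminator (`db-finetune`), and payload keys below are placeholders; match them to your own deployment and `WorkflowEventBinding`:

```shell
# Placeholders -- substitute your own Argo host, namespace, and token.
ARGO_HOST="https://argo.example.com"
NAMESPACE="tenant-example"
PAYLOAD='{"run_name": "example-run", "dataset": "datasets/example"}'

# Sanity-check the JSON payload before sending it.
echo "$PAYLOAD" | python3 -m json.tool >/dev/null && echo "payload JSON is valid"

# Guarded so the request is only sent when a token is configured:
if [ -n "${ARGO_TOKEN:-}" ]; then
  curl -s -X POST \
    -H "Authorization: Bearer $ARGO_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD" \
    "$ARGO_HOST/api/v1/events/$NAMESPACE/db-finetune"
fi
```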
About the fine-tuning methods
This guide explains how to deploy an Argo Workflow to fine-tune a Stable Diffusion base model on a custom dataset, then use the fine-tuned model in an inference service.
The base model to be trained can be provided directly in a PVC (PersistentVolumeClaim), or as a Stable Diffusion model identifier from Hugging Face's model repository. The dataset to train on must be in the same PVC, in text and image format.
As described earlier, you can choose one of two different methods to train the base model, either DreamBooth or Textual Inversion. Here's a short orientation of each before getting started.
DreamBooth method
The DreamBooth method allows you to fine-tune Stable Diffusion on a small number of examples to produce images containing a specific object or person. This method for fine-tuning diffusion models was introduced in a paper published in 2022, DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. A lighter introductory text was also released along with the paper in this blog post.
To summarize, the DreamBooth method is a way to teach a diffusion model about a specific object or style using approximately three to five example images. After the model is fine-tuned on a specific object using DreamBooth, it can produce images containing that object in new settings.
The DreamBooth method uses "Prior Preservation Loss", which means class-specific loss is combined with the loss from your custom dataset. For example, when using the DreamBooth method to teach the model about a specific dog, the model is also fine-tuned against generic images of dogs. This helps prevent the model from forgetting what normal dogs look like.
In the paper, a special token, "sks", is used in the prompts for the custom dataset. It is not necessary to use a special token like "sks", but doing so lets you use the token in inference prompts to create images containing the dog from the custom dataset. The "sks" token was chosen because it appears very rarely in the data used to train the text encoder.
Textual Inversion method
The Textual Inversion training method captures new concepts from a small number of example images and associates the concepts with words from the pipeline's text encoder. The model then uses these words and concepts to create images from text prompts with fine-grained control. Textual Inversion was introduced in the 2022 paper, An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion.
The Textual Inversion examples in this guide allow you to fine-tune Stable Diffusion with your own dataset using the same technique used for pre-training.
Example templates
The GitHub repository for this guide has template files for both training methods. Refer to the tables below to learn about each file.
DreamBooth Templates
| Filename | Description |
| --- | --- |
| `db-workflow-template.yaml` | The Argo Workflow Template itself. |
| `db-workflow-event-binding.yaml` | The Event Binding used to trigger the Workflow via an API call. |
| `inference-role.yaml` | The inference role you configured earlier. |
| `db-finetune-pvc.yaml` | The model storage volume described earlier. |
| `huggingface-secret.yaml` | The Hugging Face token used to download a base model. |
| `wandb-secret.yaml` | The Weights & Biases token used for reporting during fine-tuning. |
Textual Inversion Templates
| Filename | Description |
| --- | --- |
| `sd-finetune-workflow-template.yaml` | The Argo Workflow Template itself. |
| `sd-finetune-workflow-event-binding.yaml` | The Event Binding used to trigger the Workflow via an API call. |
| `inference-role.yaml` | The inference role you configured earlier in this demo. |
| `sd-finetune-pvc.yaml` | A model storage volume, as described earlier in this demo. |
| `sd-finetuner/Dockerfile` | A Dockerfile that can be used to build your own fine-tuner image, should you modify the fine-tuner code. |
| `sd-finetuner/finetuner.py` | The entry point for the Stable Diffusion fine-tuner. |
| `sd-finetuner/datasets.py` | A script containing the functionality to handle the different dataset formats (i.e., DreamBooth vs. Textual Inversion). |
| `sd-finetuner/requirements.txt` | The Python requirements listing the fine-tuner's dependencies. |
| `huggingface-secret.yaml` | The Hugging Face token used to download a base model. |
| `wandb-secret.yaml` | The Weights & Biases token used for reporting during fine-tuning. |
Required components
The following Kubernetes-based components are required for this guide. Deploy each of them before proceeding with the setup steps below.
Argo Workflows
Deploy Argo Workflows using the Application Catalog.
From the application deployment menu, click on the Catalog tab, then search for `argo-workflows` to find and deploy the application.
PVC
Create a `ReadWriteMany` PVC storage volume from the Storage menu. By default, this workflow uses a specific PVC name depending on your fine-tuning method:

- The DreamBooth PVC name should be `db-finetune-data`
- The Textual Inversion PVC name should be `sd-finetune-data`

These names can be changed in the configuration after you are familiar with the workflow.
A volume of `1TB` to `2TB` is recommended for training Stable Diffusion models, depending on the size of the dataset and how many fine-tunes you wish to run. If you require more space later, it's easy to increase the size of the PVC as needed.
The PVC can be shared between multiple fine-tune runs. We recommend using HDD type storage, because the fine-tuner does not require high performance storage.
If you prefer, you can also deploy the PVC with the YAML snippet for your method below, then apply it with `kubectl apply -f`.
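For reference, a PVC manifest for the DreamBooth method could look like the following sketch. The storage class name is an assumption (an HDD class in CoreWeave's ORD1 region); adjust it for your region:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-finetune-data            # use sd-finetune-data for Textual Inversion
spec:
  storageClassName: shared-hdd-ord1 # assumption: HDD storage class in ORD1
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
```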