Finetuning Machine Learning Models

Introduction

Finetuning and training machine learning models can be computationally expensive. CoreWeave Cloud allows for easy on-demand compute resources to train models along with the infrastructure to support it. This guide is intended to be a reference example of how to use Argo Workflows to set up a machine learning pipeline on CoreWeave.
The reference example utilizes GPT-type transformer models with the HuggingFace Transformers library, and assumes that the model's tokenization format is BPE. It is not intended to be a production application; rather, it is a guide to using CoreWeave resources to set up a machine learning pipeline.
The base model to be trained can be provided directly in a PVC (PersistentVolumeClaim), or specified as a model identifier from HuggingFace's model repository. The dataset to train on needs to be in the same PVC, in plain text format. It is recommended that you partition your data into separate files for easy addition and removal of subsets.
Presently, the reference example uses the following container configuration to train models on:
  • 8 vCPU (AMD EPYC usually)
  • 128GB RAM
  • Nvidia A40/A6000 (48GB VRAM)
This configuration has been found to work well for training a variety of GPT models from 155M to 6B parameters on a single GPU, and is billed at $2.00/hr through CoreWeave's resource-based pricing model.
There is an optional test Inference endpoint that can be enabled and deployed automatically when the model completes finetuning. The Inference container defaults to the following configuration:
  • 4 vCPU
  • 8GB RAM
  • Nvidia RTX A5000 (24GB VRAM)
This configuration comfortably serves models up to 6B parameters and, at $0.85/hr, is less expensive than the finetuner because it requires fewer resources.
The example code is available on GitHub in the finetuner-workflow directory of the coreweave/kubernetes-cloud repository.

Setup

The following Kubernetes-based components must be set up:
  • Argo Workflows
    • You can deploy Argo Workflows from the CoreWeave Application Catalog.
  • PVC
    • You can create a ReadWriteMany PVC storage volume from the Storage page.
    • 1TB to 2TB is recommended, as the model checkpoints take up a lot of space! These PVCs can be shared between multiple finetune runs. We recommend HDD-type storage, as the finetuner does not require high random I/O performance.
    • Note that it is easy to increase the size of a PVC later if needed; see the resize sketch after the example manifest below.
    • The workflow expects a default PVC name of finetune-data. You can change this name once you are more comfortable with the workflow and want to configure it.
    • If you prefer, the below YAML file can be used to set up the PVC with kubectl apply -f:
finetune-data.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: finetune-data
spec:
  storageClassName: shared-hdd-ord1
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2000Gi
Example finetune-data PVC configuration
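Because CoreWeave PVCs can be expanded in place, growing the volume later is typically a one-line change. The following is a minimal sketch, assuming the finetune-data name above and a hypothetical new size of 3000Gi; substitute your own values.

# Expand the existing PVC by patching its storage request.
# The 3000Gi value is illustrative only.
kubectl patch pvc finetune-data \
  --patch '{"spec": {"resources": {"requests": {"storage": "3000Gi"}}}}'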
The following components are optional, but may make your interaction easier:
  • filebrowser
    • This application provides a simple web interface to share out and access your PVC, letting you upload and download files and folders.
    • You can deploy the filebrowser from the same Application Catalog that you used to deploy Argo Workflows.
    • It is recommended that the name you give this filebrowser application be very short, or you will run into SSL CNAME issues. We recommend finetune.
    • Simply select the finetune-data PVC that you created earlier. Make sure that you actually add your PVC to the filebrowser list of mounts!
    • Some people may prefer to use a Virtual Server and interact with their PVC via SSH or another mechanism. This flexibility is one of the key advantages of CoreWeave.
filebrowser application

Dataset Setup

At this point, you should have a PVC set up that you can access via filebrowser or some other mechanism. For each dataset you want to use, create a directory with a meaningful name. By default, however, the workflow reads the finetune dataset from a directory named dataset.
The data should be individual plaintext files in the precise format that you want the prompt and responses to come in.
For example, we have a western-romance dataset with novels in cleaned-up and normalized plaintext format, with all extra whitespace removed.
western-romance dataset with text files for each novel.
The dataset will automatically be tokenized by a dataset_tokenizer component written in Go as a step in the Argo Workflow. It is quite fast, and offers several options for how to partition the data.
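As a rough sketch of what the PVC might look like before tokenization, assuming you are copying files in through a shell on a Virtual Server or similar mount point (the mount path and file names here are purely illustrative):

# Illustrative layout only; use your own mount path, dataset name, and files.
mkdir -p /finetune-data/dataset
cp ~/novels/*.txt /finetune-data/dataset/
ls /finetune-data/dataset
# novel-01.txt  novel-02.txt  novel-03.txt  ...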

Permissions Setup

To automatically create an InferenceService, the Argo Workflow job you submit needs special permissions. The below code block shows an example ServiceAccount and the corresponding permissions required. Copy it into a file named inference-role.yaml.
inference-role.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: inference
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: role:inference
rules:
  - apiGroups:
      - serving.kubeflow.org
    resources:
      - inferenceservices
    verbs:
      - '*'
  - apiGroups:
      - serving.knative.dev
    resources:
      - services
      - revisions
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rolebinding:inference-inference
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: role:inference
subjects:
  - kind: ServiceAccount
    name: inference
Apply the permissions above by invoking kubectl apply -f inference-role.yaml.
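If you want to confirm the objects were created before submitting a workflow, standard kubectl queries against the names defined above should list them:

# Verify the ServiceAccount, Role, and RoleBinding from inference-role.yaml exist.
kubectl get serviceaccount inference
kubectl get role role:inference
kubectl get rolebinding rolebinding:inference-inference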

Getting and Running the Workflow

The example code is available on GitHub, and it is recommended that you clone the repository with git to pull down the latest copy of the code.
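One way to do this is to clone the repository and change into the example's directory; the path below assumes the finetuner-workflow directory of the coreweave/kubernetes-cloud repository linked above.

# Clone the examples repository and enter the finetuner example.
git clone https://github.com/coreweave/kubernetes-cloud.git
cd kubernetes-cloud/finetuner-workflow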
The finetuner-workflow directory includes the following files:
  • finetune-workflow.yaml - the Argo Workflow itself
  • inference-role.yaml - the role you set up earlier in this document
  • finetune-pvc.yaml - Model storage volume as set up earlier in this document
  • finetuner/Dockerfile - if you modify the finetuner.py code, you can use this Dockerfile to build your own finetuner image
  • finetuner/finetuner.py - the simple reference example finetune training code
  • finetuner/ds_config.json - the deepspeed configuration placed in the container. It is recommended that you not modify this.
  • finetuner/requirements.txt - the Python requirements and versions; you can create a venv, but this is mainly for the Dockerfile build
For reference, a copy of the finetune-workflow.yaml is at the bottom of this document, but the GitHub repository has the authoritative version.
Assuming that you have a copy of finetune-workflow.yaml, invoke Argo Workflows with:
argo-submit-example
$ argo submit finetune-workflow.yaml \
    -p run_name=example-gpt-j-6b \
    -p dataset=dataset \
    -p run_inference=true \
    -p model=EleutherAI/gpt-j-6B \
    --serviceaccount inference
Walking through the parameters given:
  • run_name -- the only absolutely required parameter. It is strongly recommended that it be unique, as it is used to name the InferenceService. Consequently, run_name must meet DNS naming standards.
  • dataset -- the name of the dataset directory on the PVC.
  • run_inference -- explicitly tells the Workflow that we want to run a test inference service when finetuning is done. It is not intended to be a production service, but to demonstrate the pipeline end to end and allow you to kick the tires on the finetuned model.
  • model -- this example uses a HuggingFace model identifier to pull down gpt-j-6B. On subsequent runs the model is cached on your PVC under cache.
  • --serviceaccount inference -- required for run_inference to work correctly.
NOTE: There are easier ways to parameterize your jobs than the command line, such as:
  • A parameters file (via argo submit, with -p to customize further) -- see the sketch after this list
  • Templating using Helm Charts
  • Programmatically, using the Argo Workflows API
  • Using the Argo web UI
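For example, a small parameters file keeps run configuration in version control. The sketch below assumes your Argo CLI supports the --parameter-file flag; the file name and its contents simply mirror the -p flags used earlier.

# Write a parameters file (file name and values are illustrative).
cat > finetune-params.yaml << 'EOF'
run_name: example-gpt-j-6b
dataset: dataset
run_inference: "true"
model: EleutherAI/gpt-j-6B
EOF

# Submit the workflow using the parameters file instead of repeated -p flags.
argo submit finetune-workflow.yaml \
    --parameter-file finetune-params.yaml \
    --serviceaccount inference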
When you submit your job, you should see a screen that looks very much like the following:
Name:                finetune-wtd2k
Namespace:           tenant-goosewes-1
ServiceAccount:      inference
Status:              Pending
Created:             Fri Apr 22 12:50:34 -0400 (now)
Progress:
Parameters:
  dataset:            dataset
  run_name:           example-gpt-j-6b
  run_inference:      true
  model:              EleutherAI/gpt-j-6B
  pvc:                finetune-data
  retokenize:         false
  eot_token:
  pad_token:
  boundary_token:     \n
  context:            2048
  train_ratio:        0.9
  batch_size:         -1
  force_fp16:         false
  batch_size_divisor: 1.0
  random_seed:        42
  learn_rate:         5e-5
  epochs:             1
  gradients:          5
  zero_stage:         3
  no_resume:          false
  logs:               ./logs
  wandb_key:
  project_id:         huggingface
  inference_only:     false
  region:             ORD1
  tokenizer_image:    ghcr.io/wbrown/gpt_bpe/dataset_tokenizer:ed439c6
  finetuner_image:    docker.io/gooseai/finetuner:rc50
  inference_image:    coreweave/ml-images:pytorch-huggingface-81d5ce1

Observing the Argo Workflow

Now that we have the job Name, finetune-wtd2k, we can observe the job via several mechanisms:
  • argo watch finetune-wtd2k
    • this tells Argo that we want to watch the job as it goes through the stages of:
      • model-tokenization
      • model-finetune
      • model-inference
Name:                finetune-wtd2k
Namespace:           tenant-goosewes-1
ServiceAccount:      inference
Status:              Running
Conditions:
 PodRunning          True
Created:             Fri Apr 22 11:00:15 -0400 (1 hour ago)
Started:             Fri Apr 22 11:00:15 -0400 (1 hour ago)
Duration:            1 hour 54 minutes
Progress:            1/2
ResourcesDuration:   1s*(1 cpu),1s*(100Mi memory)
Parameters:
  dataset:            dataset
  run_name:           cassandra-gpt-j-6b-fp16
  run_inference:      true
  model:              EleutherAI/gpt-j-6B
  zero_stage:         3
  gradients:          5
  force_fp16:         true
  pvc:                finetune-data
  retokenize:         false
  eot_token:
  pad_token:
  boundary_token:     \n
  context:            2048
  train_ratio:        0.9
  batch_size:         -1
  batch_size_divisor: 1.0
  random_seed:        42
  learn_rate:         5e-5
  epochs:             1
  no_resume:          false
  logs:               logs
  project_id:         huggingface
  inference_only:     false
  region:             ORD1
  tokenizer_image:    ghcr.io/wbrown/gpt_bpe/dataset_tokenizer:ed439c6
  finetuner_image:    docker.io/gooseai/finetuner:rc50
  inference_image:    coreweave/ml-images:pytorch-huggingface-81d5ce11

STEP                   TEMPLATE         PODNAME                    DURATION  MESSAGE
 ● finetune-ckl8r      main
 ├───✔ tokenizer(0)    model-tokenizer  finetune-ckl8r-2169410118  7s
 └───● finetuner       model-finetuner  finetune-ckl8r-3837635091  1h
  • argo logs -f finetune-wtd2k to watch the logs in real time. Please note that if it appears to hang on Loading the model, this is due to a bug in the terminal display code when the model is downloaded and cached for the first time. You can simply kill the pod in question, or the job, and resubmit it; it will then display progress correctly.
finetune-ckl8r-2169410118: 2022/04/22 15:00:21 Newest source `/finetune-data/dataset/Alastair Reynolds - [Revelation Space] Chasm City.txt` is older than `/finetune-data/dataset-EleutherAI_gpt_j_6B-2048.tokens`, not retokenizing. Use -retokenize to force retokenization.
finetune-ckl8r-3837635091: RUN_NAME: cassandra-gpt-j-6b-fp16
finetune-ckl8r-3837635091: HOST: finetune-ckl8r-3837635091
finetune-ckl8r-3837635091: CUDA: 11.3
finetune-ckl8r-3837635091: TORCH: 1.10.0a0+git302ee7b
finetune-ckl8r-3837635091: TRANSFORMERS: 4.17.0
finetune-ckl8r-3837635091: CPU: (maxrss: 297mb F: 811,255mb) GPU: (U: 19mb F: 51,033mb T: 51,052mb) TORCH: (R: 0mb/0mb, A: 0mb/0mb)
finetune-ckl8r-3837635091: DATASET: /finetune-data/dataset-EleutherAI_gpt_j_6B-2048.tokens
finetune-ckl8r-3837635091: DATASET SIZE: 194.27mb, 101,855,232 tokens, 49,734 contexts
finetune-ckl8r-3837635091: TRAIN_DATASET: 44,760 examples
finetune-ckl8r-3837635091: VALUE_DATASET: 4,974 examples
finetune-ckl8r-3837635091: LAST CHECKPOINT: None
finetune-ckl8r-3837635091: RANDOM SEED: 42
finetune-ckl8r-3837635091: FORCE FP16: True
finetune-ckl8r-3837635091: Loading EleutherAI/gpt-j-6B
finetune-ckl8r-3837635091: CPU: (maxrss: 48,240mb F: 761,345mb) GPU: (U: 13,117mb F: 37,935mb T: 51,052mb) TORCH: (R: 12,228mb/12,228mb, A: 12,219mb/12,219mb)
finetune-ckl8r-3837635091: CPU: (maxrss: 48,240mb F: 785,595mb) GPU: (U: 13,117mb F: 37,935mb T: 51,052mb) TORCH: (R: 12,228mb/12,228mb, A: 12,219mb/12,219mb)
finetune-ckl8r-3837635091: Setting batch size to 4
finetune-ckl8r-3837635091: [2022-04-22 15:03:12,856] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
finetune-ckl8r-3837635091: Using amp half precision backend
finetune-ckl8r-3837635091: [2022-04-22 15:03:12,863] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.1, git-hash=unknown, git-branch=unknown
...
 4% 99/2238 [1:48:23<35:43:37, 60.13s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,206mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
 4% 100/2238 [1:49:59<42:08:58, 70.97s/it]{'loss': 2.6446, 'learning_rate': 5e-05, 'epoch': 0.04}
 5% 101/2238 [1:51:00<40:18:00, 67.89s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
 5% 103/2238 [1:53:00<37:54:38, 63.92s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
 5% 105/2238 [1:55:01<36:50:48, 62.19s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
During finetuning, the logs report the time elapsed and the expected time to completion, as well as checkpointing and loss.
You can also watch a workflow immediately upon submission by passing --watch to argo submit.
  • Access your Argo Workflows web UI via HTTPS to see all the finetuner jobs and check on their status.
Argo Workflows HTTPS Request

Workflow Options

This section covers some of the more useful parameters. It is not intended to be a complete and exhaustive reference for every exposed parameter; those are documented via comments in the workflow YAML file itself.
  • run_name -- The run name used to name artifacts and report metrics. Should be unique. Default: none (required).
  • pvc -- The PVC to use for dataset and model artifacts. Default: finetune-data.
  • region -- The region to run the Argo jobs in. This should generally be ORD1. Default: ORD1.
  • dataset -- The dataset folder relative to the pvc root. Default: dataset.
  • model -- The model to train on. It can be a path relative to the pvc root; if it can't be found there, the finetuner will attempt to download the model from HuggingFace. Default: EleutherAI/gpt-neo-2.7B.
  • context -- Training context size in tokens. Affects the tokenization process as well. Default: 2048.
  • epochs -- The number of times the finetuner should train on the dataset. Default: 1.
  • learn_rate -- How quickly the model should learn the finetune data. Too high a learn rate can be counterproductive and overwrite the base model's training. Default: 5e-5.
  • wandb_key -- Strongly recommended. Use an API key from http://wandb.ai to report finetuning metrics with nice charts. No default.
  • run_inference -- Whether to run the example InferenceService on the finetuned model. Default: false.
  • inference_only -- Do not run tokenization or finetuning. Intended to quickly run only the InferenceService on a previously trained model. Default: false.
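As an illustration of how these knobs combine, the submission below overrides a few of the documented defaults. The parameter names come from the list above, while the values and the wandb key placeholder are hypothetical; substitute your own.

# Example submission overriding several defaults (values are illustrative).
argo submit finetune-workflow.yaml \
    -p run_name=example-neo-2-7b \
    -p dataset=dataset \
    -p model=EleutherAI/gpt-neo-2.7B \
    -p epochs=2 \
    -p learn_rate=1e-5 \
    -p context=2048 \
    -p wandb_key=<your-wandb-api-key> \
    --serviceaccount inference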

Artifacts and Inference

When the model completes finetuning, you should find the model artifacts under a {{pvc}}/{{run_name}}/final directory. You can download the model at this point, or you can run the InferenceService on the model.
If you followed the directions for the Inference Service and installed the Knative client, you should be able to get a URL by invoking kn service list. Services can also be listed without the Knative client by executing kubectl get ksvc.
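Both commands are shown below for reference; the tabular output that follows is from kn service list.

# With the Knative client installed:
kn service list

# Without the Knative client:
kubectl get ksvc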
NAME                                  URL                                                                                        LATEST                                       AGE     CONDITIONS   READY   REASON
inference-western-predictor-default   http://inference-western-predictor-default.tenant-goosewes-1.knative.chi.coreweave.com    inference-western-predictor-default-00007   2d21h   3 OK / 3     True
We can run curl to do a test query (note that this assumes you have jq installed):
curl http://inference-western-predictor-default.tenant-goosewes-1.knative.chi.coreweave.com/v1/models/final:predict \
    -H 'Content-Type: application/json; charset=utf-8' \
    --data-binary @- << EOF | jq .
{"parameters": {"min_length": 150,
                "max_length": 200},
 "instances": ["She danced with him in the honky-tonk"]}
EOF
This should yield a result similar to:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   935  100   828  100   107    147     19  0:00:05  0:00:05 --:--:--   188
{
  "predictions": [
    [
      {
        "generated_text": "She danced with him in the honky-tonk hall as if to say, \"You got me into this mess. Now I'll get you out of it. Let's split it and go our separate ways. Maybe I'll get lucky and make you my partner.\"\nHe grinned. \"You never know. Don't let anyone stop you. But if someone tries to arrest you, let them worry about that.\"\n\"I'll do that. Now, about that money?\"\n\"Money? What money?\"\n\"The loan they paid to your uncle to buy your brother out of that mine. I'm not sure why they did that.\"\nHe grinned. \"That's what I've been trying to figure out myself. They want more power over the land they're buying so they can put up cattle. But they're not taking long to figure out that I'm onto them, so keep this money safe until we figure out the best way to handle it. Don't try"
      }
    ]
  ]
}
And there we have it: a model and dataset taken through the tokenization and finetuning process, all the way to test inferences against the new model. This barely scratches the surface of finetuning, but CoreWeave hopes it helps you get started.