Get Started with Workflows
Use Argo Workflows to run jobs in parallel
Workflows on CoreWeave Cloud run on Argo Workflows, a powerful tool for orchestrating parallel execution of GPU and CPU jobs. With Workflows, retries and parallelism are managed automatically. Workflows can be submitted via the CLI, the REST API, or the Kubernetes API.
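As a sketch of the REST submission path, the snippet below builds the JSON envelope that Argo Server's workflow-creation endpoint (`POST /api/v1/workflows/{namespace}`) accepts. The namespace, image, and manifest contents are placeholder assumptions, and no request is actually sent:

```python
import json

# Sketch only: wrap a workflow manifest in the JSON envelope Argo Server's
# REST API accepts for workflow creation. The namespace ("tenant-example")
# and the manifest below are placeholder assumptions, not CoreWeave defaults.
def build_create_request(namespace, manifest):
    return {"namespace": namespace, "workflow": manifest}

manifest = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "hello-"},
    "spec": {
        "entrypoint": "main",
        "templates": [
            {"name": "main",
             "container": {"image": "busybox", "command": ["echo", "hello"]}},
        ],
    },
}

body = build_create_request("tenant-example", manifest)
print(json.dumps(body, indent=2)[:80])
```

You would POST a body like this, along with a Bearer token, to your Argo server's ingress URL.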

A view of the Argo Web UI
The Catalog link on the Cloud UI
To deploy Argo Workflows, click the Catalog link in the CoreWeave Cloud UI. A new window will open onto the Catalog, where you can browse all available applications. In the search field, type `argo-workflows`, then select the argo-workflows application once it appears:
argo-workflows application in the Catalog
In the upper right-hand corner of the following screen, select the latest version of the Helm chart under Chart Version, then click the Deploy button.
Chart version selection and deploy button
The following deployment form will prompt you to enter a name for the application. The remaining parameters will be set to CoreWeave's suggested defaults, but can be changed to suit your requirements. When ready to deploy, click the Deploy button at the bottom of the page.
Important
The `server` authentication mode does not require credentials and is strongly discouraged. We suggest using the `client` mode for better security.
The Argo Workflows configuration screen
After a few minutes, the deployment will be ready. If you selected Expose UI via public Ingress, Argo Workflows will be accessible outside the cluster. Click the ingress link to open the Argo Workflows UI in a new window.
Important
It can take up to five minutes for a TLS certificate to be issued. Prior to this, you may encounter a TLS certificate error, but once the certificate is issued, this error should disappear on its own.
To run a sample workflow:
1. Click +SUBMIT NEW WORKFLOW, then Edit using workflow options.
2. Click +CREATE. After a few minutes, once the workflow completes successfully, its status indicator will turn green.
Note
The following assumes you have already obtained CoreWeave Cloud access credentials and set up your kubeconfig file.
The Argo CLI tool can be obtained from the Argo Project's website. Then, to run an example workflow, save the following manifest into a file named `gpu-say-workflow.yaml`.

Example
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: gpu-say
spec:
  entrypoint: main
  activeDeadlineSeconds: 300 # Cancel operation if not finished in 5 minutes
  ttlStrategy:
    secondsAfterCompletion: 86400 # Clean out old workflows after a day
  # Parameters can be passed/overridden via the argo CLI.
  # To override the printed message, run `argo submit` with the -p option:
  # $ argo submit examples/arguments-parameters.yaml -p messages='["CoreWeave", "Is", "Fun"]'
  arguments:
    parameters:
      - name: messages
        value: '["Argo", "Is", "Awesome"]'
  templates:
    - name: main
      steps:
        - - name: echo
            template: gpu-echo
            arguments:
              parameters:
                - name: message
                  value: "{{item}}"
            withParam: "{{workflow.parameters.messages}}"
    - name: gpu-echo
      inputs:
        parameters:
          - name: message
      retryStrategy:
        limit: 1
      script:
        image: nvidia/cuda:11.4.1-runtime-ubuntu20.04
        command: [bash]
        source: |
          nvidia-smi
          echo "Input was: {{inputs.parameters.message}}"
        resources:
          requests:
            memory: 128Mi
            cpu: 500m # Half a core
          limits:
            nvidia.com/gpu: 1 # Allocate one GPU
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            # This will REQUIRE the Pod to run on a node with a GPU that has 8 or 16 GB of VRAM
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/vram
                    operator: In
                    values:
                      - "8"
                      - "16"
Submit the new workflow file (`gpu-say-workflow.yaml`). According to the specification above, this workflow takes a JSON array and spins up one Pod per array element, each with one GPU allocated, in parallel. The nvidia-smi output, as well as the parameter entry assigned to each Pod, is written to the log:

$ argo submit --watch gpu-say-workflow.yaml -p messages='["Argo", "Is", "Awesome"]'
Name:                gpu-sayzfwxc
Namespace:           tenant-test
ServiceAccount:      default
Status:              Running
Created:             Mon Feb 10 19:31:17 -0500 (15 seconds ago)
Started:             Mon Feb 10 19:31:17 -0500 (15 seconds ago)
Duration:            15 seconds
Parameters:
  messages:          ["Argo", "Is", "Awesome"]

STEP                          PODNAME                  DURATION  MESSAGE
 ● gpu-sayzfwxc (main)
 └-·-✔ echo(0:Argo)(0) (gpu-echo)     gpu-sayzfwxc-391007373   10s
   ├-● echo(1:Is)(0) (gpu-echo)       gpu-sayzfwxc-3501791705  15s
   └-✔ echo(2:Awesome)(0) (gpu-echo)  gpu-sayzfwxc-3986963301  12s
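The fan-out above comes from `withParam`: Argo parses the JSON array parameter and launches one echo step per element, substituting each value for `{{item}}`. A purely illustrative Python sketch of that expansion (not Argo's actual implementation):

```python
import json

# Illustrative sketch of the withParam fan-out: parse the JSON array
# parameter and produce one step invocation per element, mirroring the
# step names shown in the output above.
def fan_out(messages_param):
    items = json.loads(messages_param)
    return [f"echo({i}:{item})" for i, item in enumerate(items)]

steps = fan_out('["Argo", "Is", "Awesome"]')
print(steps)  # → ['echo(0:Argo)', 'echo(1:Is)', 'echo(2:Awesome)']
```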
To print the log output from all parallel containers, use the `logs` command:

$ argo logs -w gpu-sayrbr6z
echo(0:Argo)(0): Tue Feb 11 00:25:30 2020
echo(0:Argo)(0): +-----------------------------------------------------------------------------+
echo(0:Argo)(0): | NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
echo(0:Argo)(0): |-------------------------------+----------------------+----------------------+
echo(0:Argo)(0): | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
echo(0:Argo)(0): | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
echo(0:Argo)(0): |===============================+======================+======================|
echo(0:Argo)(0): | 0 NVIDIA Graphics... Off | 00000000:08:00.0 Off | N/A |
echo(0:Argo)(0): | 28% 51C P5 16W / 180W | 18MiB / 8119MiB | 0% Default |
echo(0:Argo)(0): +-------------------------------+----------------------+----------------------+
echo(0:Argo)(0):
echo(0:Argo)(0): +-----------------------------------------------------------------------------+
echo(0:Argo)(0): | Processes: GPU Memory |
echo(0:Argo)(0): | GPU PID Type Process name Usage |
echo(0:Argo)(0): |=============================================================================|
echo(0:Argo)(0): +-----------------------------------------------------------------------------+
echo(0:Argo)(0): Input was: Argo
echo(1:Is)(0): Tue Feb 11 00:25:30 2020
echo(1:Is)(0): +-----------------------------------------------------------------------------+
echo(1:Is)(0): | NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
echo(1:Is)(0): |-------------------------------+----------------------+----------------------+
...
Argo requires a Service Account token for authentication. The following steps reflect best practices based on Argo's access token creation documentation.
Warning

The `server` auth mode is strongly discouraged, as it opens Argo Workflows to public access and is therefore a security risk.

To tailor permissions more granularly, first create a role with minimal permissions:
$ kubectl create role argo-role --verb=list,update --resource=workflows.argoproj.io
To get a full list of available resources, issue the `api-resources` command:

$ kubectl api-resources --api-group=argoproj.io --namespaced=true -o wide

This returns the following resources and verbs:
| Name | Shortnames | Kind | Verbs |
| --- | --- | --- | --- |
| cronworkflows | cwf,cronwf | CronWorkflow | [delete deletecollection get list patch create update watch] |
| workfloweventbindings | wfeb | WorkflowEventBinding | [delete deletecollection get list patch create update watch] |
| workflows | wf | Workflow | [delete deletecollection get list patch create update watch] |
| workflowtaskresults | | WorkflowTaskResult | [delete deletecollection get list patch create update watch] |
| workflowtasksets | wfts | WorkflowTaskSet | [delete deletecollection get list patch create update watch] |
| workflowtemplates | wftmpl | WorkflowTemplate | [delete deletecollection get list patch create update watch] |
Creating a tailored role with many resources and verbs from the command line can be inefficient. It's also possible to set permissions using a YAML manifest instead:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-role
rules:
  - apiGroups:
      - argoproj.io
    resources:
      - workflows
    verbs:
      - list
      - update
Then, apply the manifest using `kubectl apply`:

$ kubectl apply -f <manifest_roles.yaml>
To create a unique Service Account, use `kubectl create sa`. For example, here we create a Service Account named `argo-sa`:

$ kubectl create sa argo-sa
Then, create a RoleBinding for the new Service Account, bound to the `argo-role` created above:
$ kubectl create rolebinding argo-role-binding --role=argo-role --serviceaccount=<namespace>:argo-sa
where `<namespace>` is the namespace in which Argo is running.

Generate the token to be used with the Service Account:
Linux
PowerShell
$ export SECRET=$(kubectl get sa argo-sa -o=jsonpath='{.secrets[0].name}') && export ARGO_TOKEN="Bearer $(kubectl get secret $SECRET -o=jsonpath='{.data.token}' | base64 --decode)" && echo $ARGO_TOKEN
Write-Host "Bearer $([Text.Encoding]::Utf8.GetString([Convert]::FromBase64String($(kubectl get secret $(kubectl get sa argo-sa -o=jsonpath='{.secrets[0].name}') -o=jsonpath='{.data.token}'))))"
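Both one-liners do the same thing: look up the Service Account's secret name, base64-decode the secret's token field (Kubernetes stores secret data base64-encoded), and prefix the result with "Bearer ". A Python sketch of the decoding step, using a made-up token value:

```python
import base64

# Stand-in for the base64-encoded `.data.token` field of the secret;
# the actual value would come from `kubectl get secret`.
encoded_token = base64.b64encode(b"example-service-account-token").decode()

# Decode and build the header value expected by the Argo UI.
token = base64.b64decode(encoded_token).decode()
header = f"Bearer {token}"
print(header)  # → Bearer example-service-account-token
```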
Then, inside the box for client authentication, copy and paste the newly generated token into the Argo UI:
%20(2).png?alt=media)
The Argo Workflow UI with a Bearer token pasted into the client authentication box
Finally, to log in, click the Login button after adding the token.
We recommend the following retry strategy on your workflow / steps.
retryStrategy:
  limit: 2
  retryPolicy: Always
  backoff:
    duration: "15s"
    factor: 2
affinity:
  nodeAntiAffinity: {}
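With limit: 2, duration: "15s", and factor: 2, the wait before each retry doubles. A quick sketch of the implied delay schedule, assuming the delay before retry n is duration × factor^(n−1):

```python
# Delays implied by the retryStrategy above, assuming the wait before
# retry n is duration * factor**(n - 1).
limit, duration_s, factor = 2, 15, 2

delays = [duration_s * factor**i for i in range(limit)]
print(delays)  # → [15, 30]  (seconds before retry 1 and retry 2)
```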
We also recommend setting an activeDeadlineSeconds on each step, but not on the entire workflow. This allows a step to retry, while preventing any single step from taking an unreasonably long time to finish.