- Create a cluster in CKS.
- Create a Node Pool.
- Interact with clusters and Pods using kubectl.
- Deploy and interact with an LLM using Open WebUI.
Before you begin
Before completing the steps in this guide, you must have the following:
- Access to the Llama-3.1-8B-Instruct model at Hugging Face. Go to meta-llama/Llama-3.1-8B-Instruct and request access. Approval for restricted models can take a few hours or longer, so request access before you start the rest of this guide.
- kubectl installed on your machine. kubectl is the command-line tool for interacting with Kubernetes clusters. If needed, see the kubectl installation instructions.
- Access to the CoreWeave Cloud Console. For more information, see Activate and sign in to your CoreWeave organization.
- A Hugging Face access token. See the Hugging Face instructions at User access tokens. Be sure to copy and store the token in a secure location. You need it later in this guide.
Create a CKS cluster and Node Pool
CKS clusters and Node Pools are the core infrastructure for running and managing workloads. To create a cluster and Node Pool, complete the following steps:
- Log in to the Cloud Console and navigate to the Clusters page.
- Click the Create Cluster button.
- In the Create a Cluster dialog, give the cluster a name, select the latest Kubernetes version, and verify the box is checked for Enable access to the Kubernetes API via the Internet. Click Next.
- Create the cluster where you have GPU quota available. Verify the box is checked for Create a default VPC, and then click Next.
- Leave the authentication boxes unchecked and click Next.
- On the deploy page, click Submit.
- On the Success! dialog box, click Create a Node Pool.
- Verify the cluster you just created is selected, and do the following:
  - Name the Node Pool.
  - Pick a GPU instance.
  - Set Target Nodes to 1.
  - Leave all other fields empty.
  - Click Submit.
When the Node Pool status shows Healthy, your cluster has GPU capacity ready to serve the model, and you can continue to the following steps.
Do not install the NVIDIA GPU Operator on CKS clusters
Generate a CoreWeave access token
Access tokens let you authenticate to your Kubernetes resources through kubectl. You must create one for the cluster you just provisioned before you can run commands against it.
To create an access token, complete the following steps:
- In the Cloud Console, navigate to the Tokens page and click the Create Token button.
- Enter a name and expiration and then click Create.
- In the Create API Token dialog, select the cluster you just created from the Select current-context dropdown menu, and then click Download.

Use kubectl with your cluster
To communicate with your cluster using kubectl, complete the following steps:
- Make a KUBECONFIG environment variable that points to the kubeconfig file you just downloaded.
- Confirm you can connect to the cluster by querying it with kubectl. You should see cluster information in the response.
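The two steps above might look like the following sketch. The kubeconfig path is a hypothetical placeholder, not a value from this guide, so substitute the location where your browser saved the download. The connectivity check uses kubectl cluster-info, which prints the control plane address of the cluster in the current context.

```shell
# Hypothetical path: replace with the location of the kubeconfig you downloaded.
export KUBECONFIG="$HOME/Downloads/kubeconfig"

# Query the cluster only when kubectl and the kubeconfig file are both present.
if command -v kubectl >/dev/null 2>&1 && [ -f "$KUBECONFIG" ]; then
  kubectl cluster-info
else
  echo "kubectl or $KUBECONFIG not found; skipping the connectivity check" >&2
fi
```

The export lasts only for the current shell session. To keep the setting, you could add the export line to your shell profile.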
Create a Hugging Face secret
For CKS to download the Llama-3.1-8B-Instruct model from Hugging Face, you must create a Kubernetes secret that holds your Hugging Face access token. The model deployment in the next section reads this secret at runtime to authenticate with Hugging Face.
Complete the following steps to create the secret:
- Use kubectl to create a Hugging Face secret from the following value:
  - [HUGGING-FACE-TOKEN]: The token Hugging Face provides you. For more information about creating a Hugging Face token, see User access tokens.
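As a rough sketch, a generic secret can be created with kubectl create secret generic. The secret name hf-token-secret and the key token below are assumptions made for illustration. They are not taken from this guide, and they must match whatever name and key the deployment YAML reads.

```shell
# Assumed names (hypothetical): they must match the secret name and key
# that the model deployment YAML references.
SECRET_NAME="hf-token-secret"
SECRET_KEY="token"

# Create the secret from a token passed as the first argument.
create_hf_secret() {
  kubectl create secret generic "$SECRET_NAME" \
    --from-literal="$SECRET_KEY=$1"
}
```

You would then call create_hf_secret "[HUGGING-FACE-TOKEN]", replacing the placeholder with your own token. Passing the token as an argument can leave it in your shell history, so consider reading it from an environment variable instead.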
Download and apply a YAML configuration file
Kubernetes uses YAML files to configure resources. The following example YAML file defines the model deployment and the Open WebUI service so you can deploy both with a single command. To deploy the Llama-3.1-8B-Instruct model using this example, complete the following steps:
- Use kubectl to apply the file. Before running the command, confirm you have access to the Llama-3.1-8B-Instruct model. Visit the meta-llama/Llama-3.1-8B-Instruct page to verify your access.
- Confirm Kubernetes deployed the resources by listing them with kubectl, for example with kubectl get pods. Verify all Pods are ready and running.
- Verify the services are working by checking the logs of the model Pod, where:
  - [LLAMA-POD-NAME]: The Pod name beginning with llama-* that kubectl get pods returns.
- In the logs, look for the following line:
  INFO: Application startup complete.
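The apply and verify steps above can be sketched as two small shell helpers. The manifest file name is a hypothetical placeholder for the YAML file you downloaded, and the helper names are illustrative only. The kubectl subcommands used (apply -f, get pods, logs) are standard.

```shell
# Hypothetical file name: use the name of the YAML file you downloaded.
MANIFEST="llama-open-webui.yaml"

# Apply the manifest, then list Pods so you can check they are ready and running.
deploy_and_check() {
  kubectl apply -f "$MANIFEST" && kubectl get pods
}

# Search one Pod's logs for the startup line.
# Pass the Pod name beginning with llama- as the first argument.
show_startup_line() {
  kubectl logs "$1" | grep "Application startup complete"
}
```

Run deploy_and_check first, then show_startup_line "[LLAMA-POD-NAME]". If the second helper prints nothing, the model server may still be starting, so wait and try again.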
Get the Open WebUI endpoint
The Open WebUI service is not exposed to the internet. To access Open WebUI from your machine, use port-forwarding:
- Use kubectl port-forward to forward local port 8080 to the Open WebUI service.
- Leave the command running and open http://localhost:8080 in your browser.
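A minimal sketch of the port-forward step follows. The Service name open-webui and the Service port 8080 are assumptions, not values from this guide. Run kubectl get services to find the actual name and port that the YAML file created.

```shell
# Assumed values (hypothetical): check `kubectl get services` for the real ones.
SERVICE_NAME="open-webui"
SERVICE_PORT="8080"
LOCAL_PORT="8080"

# Forward the local port to the Service. The command runs until you stop it.
forward_open_webui() {
  kubectl port-forward "service/$SERVICE_NAME" "$LOCAL_PORT:$SERVICE_PORT"
}
```

Calling forward_open_webui occupies the terminal while the tunnel is open. Press Ctrl+C to stop forwarding when you are done.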

Next steps
You’ve deployed an LLM on CKS.
- For more information about CKS clusters, see Introduction to clusters.
- For more information about Node Pools, see Introduction to Node Pools.