This tutorial shows you how to deploy the Red Hat AI Inference Stack for Kubernetes on CoreWeave Kubernetes Service (CKS). The stack provides GPU-based LLM inference using llm-d, KServe, Istio, and the Gateway API so you can run and serve models such as GPT-OSS on your CKS cluster. In this tutorial, you will:
- Deploy the Red Hat AI Inference Stack (cert-manager, Istio, LWS operator, and KServe) on your CKS cluster.
- Create and verify the inference gateway for routing requests to models.
- Deploy a hello-world model (GPT-OSS) and send a chat completion inference request.
What you'll need
Before you start, you must have:
- A Red Hat Registry service account or Red Hat pull secret for registry.redhat.io.
- A CoreWeave Kubernetes Service (CKS) cluster with GPU nodes.
- kubectl installed and configured to access your cluster.
- KUBECONFIG set to an absolute path (required by the deployment scripts).
What you'll use
You’ll use these tools:
- git: To clone the rhaii-on-xks repository.
- make: To deploy components and run validation.
- jq: To copy the Red Hat pull secret into namespaces (used in later steps).
Prerequisites
Before you continue, confirm that you meet the following prerequisites.
Cluster readiness
Your cluster is ready. Verify this by checking that GPU nodes are available.
Red Hat access token
You need a Red Hat pull secret so the cluster can pull images from registry.redhat.io. Get a Red Hat service account by completing the following:
- Go to: https://access.redhat.com/terms-based-registry/
- Click “New Service Account”
- Create the account and note the username (e.g., 12345678|myserviceaccount)
- Download the service account token from the OpenShift Secret tab
- Convert the service account token into JSON, replacing PULL-SECRET.yaml with your file name, and save the result as auth.json.
- Create the directory and copy auth.json to ~/.config/containers:
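The last two steps can be sketched in shell as follows. This is a minimal sketch under stated assumptions, not the tutorial's original commands: it assumes the downloaded OpenShift Secret YAML stores the credentials base64-encoded on a single .dockerconfigjson line, as Kubernetes secrets of type kubernetes.io/dockerconfigjson do. The function names pull_secret_to_auth_json and install_auth_json are hypothetical.

```shell
# Sketch: convert a Red Hat "OpenShift Secret" YAML download into auth.json.
# Assumption: credentials sit base64-encoded on one ".dockerconfigjson:" line.
# Function names are hypothetical, chosen for this example.

pull_secret_to_auth_json() {
  local secret_yaml="$1"   # e.g. PULL-SECRET.yaml
  local out_file="$2"      # e.g. auth.json
  # Keep only the value after ".dockerconfigjson:", then base64-decode it.
  sed -n 's/^[[:space:]]*\.dockerconfigjson:[[:space:]]*//p' "$secret_yaml" \
    | head -n 1 \
    | tr -d '[:space:]' \
    | base64 -d > "$out_file"
}

install_auth_json() {
  local auth_json="$1"
  local dest_dir="${2:-$HOME/.config/containers}"
  # Create the target directory if needed, then copy the file into place.
  mkdir -p "$dest_dir"
  cp "$auth_json" "$dest_dir/auth.json"
}

# Usage:
#   pull_secret_to_auth_json PULL-SECRET.yaml auth.json
#   install_auth_json auth.json
```

If the assumption holds, the decoded file is a JSON object with a top-level auths key that contains an entry for registry.redhat.io.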
KUBECONFIG
Your $KUBECONFIG is set to an absolute path:
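One way to check this is to test whether the value starts with a slash. The snippet below is a small sketch of that check. The helper name kubeconfig_is_absolute is hypothetical, and the check only looks at the form of the path, not whether the file exists.

```shell
# Sketch: succeed only when the given path is non-empty and absolute.
# The helper name is hypothetical, chosen for this example.
kubeconfig_is_absolute() {
  case "${1:-}" in
    /*) return 0 ;;   # starts with "/": absolute
    *)  return 1 ;;   # empty or relative
  esac
}

if kubeconfig_is_absolute "${KUBECONFIG:-}"; then
  echo "KUBECONFIG is absolute: ${KUBECONFIG}"
else
  echo "KUBECONFIG is unset or relative: '${KUBECONFIG:-}'" >&2
fi
```

If the check fails, exporting KUBECONFIG again with the full path to your kubeconfig file resolves it.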
Clone the repository
Clone the Red Hat AI Inference Stack repository and change into its directory.
Deploy prerequisites
Use make to deploy all stack components that llm-d depends on (cert-manager, Istio, LWS operator, and KServe):
Run make status and confirm that every component is in the Running state with readiness checks passing:
Expected output after make status
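As a complement to make status, the same Running and readiness conditions can be checked directly from a pod listing. The sketch below is an illustration, not part of the rhaii-on-xks tooling. It assumes input in the column layout of kubectl get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, and so on), and the function name report_unready_pods is hypothetical.

```shell
# Sketch: read `kubectl get pods -A --no-headers` output on stdin and print
# every pod that is neither Completed nor Running with all containers ready.
# Returns non-zero when at least one such pod is found.
report_unready_pods() {
  awk '
    {
      # READY looks like "1/1"; split it into ready and total counts.
      split($3, ready, "/")
      ok = ($4 == "Completed") || ($4 == "Running" && ready[1] == ready[2])
      if (!ok) {
        print $1 "/" $2 ": " $4 " (" $3 " ready)"
        bad = 1
      }
    }
    END { exit bad + 0 }
  '
}

# Usage:
#   kubectl get pods -A --no-headers | report_unready_pods
```

An empty result with a zero exit status means every listed pod passed both checks.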
Create the inference gateway
Deploy the inference gateway so you can route requests to your models. Then verify that inference-gateway appears in the opendatahub namespace with an ADDRESS assigned and PROGRAMMED set to True:
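The PROGRAMMED check can also be scripted. The sketch below reads the Programmed condition from the Gateway's status with a kubectl JSONPath query and polls until it reports True. It assumes the gateway is a Gateway API resource named inference-gateway in the opendatahub namespace, as described above. The function names and the retry defaults are hypothetical.

```shell
# Sketch: print the status of the Gateway's "Programmed" condition.
# The fully qualified resource name avoids ambiguity with Istio's own
# Gateway kind when both sets of CRDs are installed.
gateway_programmed_status() {
  local name="${1:-inference-gateway}"
  local ns="${2:-opendatahub}"
  kubectl get gateways.gateway.networking.k8s.io "$name" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}'
}

# Poll until the gateway reports Programmed=True, or give up.
wait_for_gateway() {
  local attempts="${1:-30}"   # hypothetical default: 30 tries
  local delay="${2:-10}"      # hypothetical default: 10 seconds apart
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if [ "$(gateway_programmed_status)" = "True" ]; then
      echo "gateway is programmed"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gateway not programmed after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_for_gateway
```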
Expected gateway output
Hello, World deployment
After the gateway is running, you can deploy a model and send inference requests. This section uses the redhat-inference example from the CoreWeave doc-examples repo.
Setup
Create a namespace for the deployment. Here we use llm-d-rhaii:
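The namespace step, together with the jq-based copy of the Red Hat pull secret mentioned under "What you'll use", might look like the following sketch. It is an illustration under assumptions, not the tutorial's original commands: the function names are hypothetical, and the name and source namespace of the pull secret are not given here, so they are left as parameters.

```shell
# Sketch only: function names are hypothetical.

# Create the namespace if it does not exist yet (safe to re-run).
ensure_namespace() {
  local ns="$1"
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
}

# Drop the fields that tie a Secret to its original namespace and object
# identity so the same manifest can be applied elsewhere.
strip_namespace_metadata() {
  jq 'del(.metadata.namespace, .metadata.uid, .metadata.resourceVersion,
          .metadata.creationTimestamp, .metadata.ownerReferences)'
}

# Copy a Secret from one namespace to another.
copy_secret() {
  local name="$1"
  local from_ns="$2"
  local to_ns="$3"
  kubectl get secret "$name" -n "$from_ns" -o json \
    | strip_namespace_metadata \
    | kubectl apply -n "$to_ns" -f -
}

# Usage (secret name and source namespace are placeholders):
#   ensure_namespace llm-d-rhaii
#   copy_secret <pull-secret-name> <source-namespace> llm-d-rhaii
```

Piping the dry-run output into kubectl apply keeps the namespace step idempotent, so re-running it on an existing namespace does not fail.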
Download model and deploy
Clone the CoreWeave doc-examples repo, then change into its redhat-inference directory:
Make inference request
In a separate terminal, port-forward the inference gateway, then send a chat completion request to the model.
Expected response
Next steps
- Deploy an open-source LLM on CKS for a full walkthrough of creating a cluster, node pool, and serving a model with Open WebUI.
- Deploy vLLM for inference to run another inference stack with monitoring, autoscaling, and Prometheus or Grafana.
- Observability overview to add monitoring, metrics, and logging for your inference workloads.
- Nodes and node pools to scale GPU capacity or adjust node pool configuration.
- Secrets to manage image pull secrets and other sensitive configuration in your cluster.