FastAI Sentiment
A custom predictor allows you to deploy your own prediction implementation, which is useful when the existing serving implementations don't fit your needs. If you are migrating from Cortex, the custom predictor works exactly the same way as the PythonPredictor does in Cortex. Most PythonPredictors can be converted to a custom predictor by copying the code and renaming some variables.
The custom predictor is packaged as a Docker container. It is recommended, but not required, to keep large model files outside of the container image itself and to load them from a storage volume. This example follows that pattern. You will need somewhere to publish your Docker image once built. This example leverages Docker Hub, where public images are hosted for free and private images are cheap. Google Container Registry and other registries can also be used.
We're deploying a sentiment analyzer built with FastAI. It uses PyTorch under the hood, but working with the FastAI API is easier, which is why we wrap it instead of using a PyTorch predictor. The same principles can be applied to any framework.
Make sure you use a GPU-enabled Docker image as a base, and that you enable GPU support when loading the model.
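As an illustration, here is a minimal sketch of what such a custom predictor can look like. It assumes the kfserving Python SDK and fastai v2; the class name, mount path, and label handling are placeholders for illustration, not the exact code shipped in the tutorial repository.

# Minimal custom predictor sketch (assumes the kfserving Python SDK and fastai v2;
# the class name and model path below are illustrative, not the tutorial's exact code).
from typing import Dict
import kfserving
from fastai.text.all import load_learner

class SentimentModel(kfserving.KFModel):
    def __init__(self, name: str):
        super().__init__(name)
        self.learner = None
        self.ready = False

    def load(self):
        # Load the exported FastAI learner from the mounted storage volume.
        # "/mnt/models/sentiment/export.pkl" is an assumed mount path;
        # cpu=False keeps the model on the GPU when one is available.
        self.learner = load_learner("/mnt/models/sentiment/export.pkl", cpu=False)
        self.ready = True

    def predict(self, request: Dict) -> Dict:
        # Request and response follow the TensorFlow V1 HTTP API:
        # {"instances": [...]} in, {"predictions": [...]} out.
        texts = request["instances"]
        predictions = [str(self.learner.predict(text)[0]) for text in texts]
        return {"predictions": predictions}

if __name__ == "__main__":
    model = SentimentModel("sentiment")
    model.load()
    kfserving.KFServer().start([model])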
Tutorial source code
To follow along with this tutorial, first clone the manifests from CoreWeave's kubernetes-cloud repository.
Get started
After installing kubectl and adding your CoreWeave Cloud access credentials, clone the repository containing the tutorial source code.
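Assuming the repository is hosted at CoreWeave's public GitHub organization, cloning looks like this:
$git clone https://github.com/coreweave/kubernetes-cloud.git
$cd kubernetes-cloud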
This guide assumes use of a private container registry requiring authentication. Recommended solutions include using the Docker Registry application directly on CoreWeave Cloud, or creating and using a Docker Hub account. Any other private container registry may also be used.
Once you have configured Docker Registry or created a Docker Hub account, create a new repository. For the rest of this guide, we'll assume that the name of the new repository is fastai-sentiment. No modifications to the tutorial files are needed to follow along.
Build the Docker image
From the cloned source code directory, enter the custom-predictor directory. From here, build and push the Docker image.
$export DOCKER_USER=coreweave  # Replace coreweave with your own registry username
$docker build -t $DOCKER_USER/fastai-sentiment:1 .
$docker push $DOCKER_USER/fastai-sentiment:1
The default Docker tag is latest. It is strongly discouraged to use this tag, as containers are cached on the nodes and in other parts of the CoreWeave stack. Once you have pushed to a tag, do not push to that tag again. Below, simple versioning is used, with tag 1 for the first iteration of the image.
Set up repository access
Using kubectl create secret, create a new Secret containing your login credentials and name it docker-hub, or something else that makes sense to you. This Secret will be used by the nodes to pull your private image.
$kubectl create secret docker-registry docker-hub --docker-server=https://index.docker.io/v1/ --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
Tell Kubernetes to use the newly created Secret by patching the ServiceAccount for your namespace to reference it.
$kubectl patch serviceaccounts default --patch "$(cat image-secrets-serviceaccount.patch.yaml)"
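The patch file is included in the tutorial source. It typically amounts to something like the following, where the Secret name must match the one created above; the file in the repository may differ slightly.
# Illustrative content for image-secrets-serviceaccount.patch.yaml
imagePullSecrets:
  - name: docker-hub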
Download the model
For performance reasons, the model is not bundled within the Docker image, so a Storage Volume is required, to which the pre-trained model will be downloaded. Storage Volumes are allocated using a Kubernetes PersistentVolumeClaim, or PVC. A simple container will also be deployed, which can be used to copy files to the newly created Volume.
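For reference, a PersistentVolumeClaim of this kind looks roughly like the sketch below. The storage class, access mode, and size are assumptions for illustration only; use the model-storage-pvc.yaml from the repository rather than this sketch.
# Illustrative only; the repository's model-storage-pvc.yaml may differ.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-storage
spec:
  accessModes:
    - ReadWriteMany                    # assumed: shared storage so multiple Pods can mount it
  storageClassName: shared-hdd-ord1    # assumed CoreWeave storage class
  resources:
    requests:
      storage: 10Gi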
From the custom-sentiment directory, use kubectl apply to deploy the PersistentVolumeClaim (model-storage-pvc.yaml).
$kubectl apply -f model-storage-pvc.yaml
persistentvolumeclaim/model-storage created
Then, deploy the manifest for the sleep container (sleep-deployment.yaml).
$kubectl apply -f sleep-deployment.yaml
deployment.apps/sleep created
The volume is mounted at /models inside the sleep container. Download the pre-trained model locally, then create a directory for it in the shared volume and copy the model into it. In the commands below, the name of the sleep Pod is assigned to an environment variable using kubectl; you can also look it up manually with kubectl get pods.
$curl -O https://bstore.coreweave.com/blobstore/models/fastai-sentiment/export.pkl
$export SLEEP_POD=$(kubectl get pod -l "app.kubernetes.io/name=sleep" -o jsonpath='{.items[0].metadata.name}') # Store the name of the sleep Pod in the $SLEEP_POD environment variable
$kubectl exec -it $SLEEP_POD -- sh -c 'mkdir /models/sentiment' # Create a new directory inside the sleep Pod at /models/sentiment
$kubectl cp ./export.pkl $SLEEP_POD:/models/sentiment/ # Copy the locally-downloaded model into that new directory
Optional – The model may also be downloaded from CoreWeave Object Storage or from Amazon S3. The Amazon S3 CLI utilities already exist in the sleep container to allow for this.
$export SLEEP_POD=$(kubectl get pod -l "app.kubernetes.io/name=sleep" -o jsonpath='{.items[0].metadata.name}')
$kubectl exec -it $SLEEP_POD -- sh
# aws configure
# mkdir /models/sentiment
# aws s3 sync s3://mybucket /models/sentiment/
Deploy the model
Modify the sentiment-inferenceservice.yaml manifest to reference your own Docker image under .spec.predictor.containers.image.
spec:
  predictor:
    maxReplicas: 10
    minReplicas: 0
    containerConcurrency: 1
    containers:
      - name: kfserving-container
        image: coreweave/fastai-sentiment:1
        env:
Apply the resources using kubectl apply.
$kubectl apply -f sentiment-inferenceservice.yaml
inferenceservice.serving.kubeflow.org/sentiment configured
List the Pods to ensure the Predictor has launched successfully. This may take a minute; wait for the READY column to show that all containers are ready and the STATUS to be Running.
$kubectl get pods
NAME                                                             READY   STATUS    RESTARTS   AGE
sentiment-predictor-default-px8xk-deployment-85bb6787d7-h42xk   2/2     Running   0          34s
If the Predictor fails to initialize, check the logs for insight into why. For example:
$kubectl logs sentiment-predictor-default-px8xk-deployment-85bb6787d7-h42xk kfserving-container
Test the model
Once all the Pods are running, the API endpoint for the model will be accessible as a URL provided in the InferenceService object. The API endpoints follow the TensorFlow V1 HTTP API.
$kubectl get inferenceservices
NAME        URL                                                                              READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
sentiment   http://sentiment.tenant-test.knative.ord1.coreweave.cloud/v1/models/sentiment   True    100                                 23h
The URL given in the output of kubectl get is the public API URL for your newly deployed model.
An HTTPS endpoint is also available; however, the HTTPS endpoint bypasses any canary deployments. Retrieve it using kubectl get ksvc.
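For example:
$kubectl get ksvc
The URL column in the output contains the endpoint for each Knative Service.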
Run a test prediction by querying the acquired URL. Remember to add the :predict postfix.
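The request body in sample.json follows the TensorFlow V1 predict format, a JSON object with an instances list. The exact text in the repository's sample file may differ, but it is shaped like this:
{"instances": ["What a wonderful film!"]}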
$curl -X POST -d @sample.json http://sentiment.tenant-test.knative.ord1.coreweave.cloud/v1/models/sentiment:predict
{"predictions": ["positive"]}
Clean up
Delete the InferenceService. This will delete all the associated resources, except for your model storage and the sleep Deployment.
$kubectl delete inferenceservices sentiment
inferenceservice.serving.kubeflow.org "sentiment" deleted
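To remove the sleep Deployment and the model storage as well, delete them separately. Note that deleting the PersistentVolumeClaim also removes the uploaded model.
$kubectl delete -f sleep-deployment.yaml
$kubectl delete -f model-storage-pvc.yaml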