This implementation is close to what a `PythonPredictor` does in Cortex. Most `PythonPredictor`s can be converted to a custom predictor by copying and pasting the code and renaming some variables.
After installing `kubectl` and adding your CoreWeave Cloud access credentials, the following steps will deploy the Inference Service. Clone this repository, change into the example's folder, and execute all commands from there; all of the files will be used.
Build and push the Docker image from the `custom-predictor` directory. No modifications to any of the files are needed to follow along. Building the image can be quite resource intensive, as it rebuilds the NVIDIA Apex library with fp16 support. The default Docker tag is `latest`. We strongly discourage using it, as containers are cached on the nodes and in other parts of the CoreWeave stack; once you have pushed to a tag, do not push to that tag again. Below, we use simple versioning, tagging the first iteration of the image as `1`.
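A minimal sketch of the build-and-push step, assuming your Docker Hub username is in `DOCKER_USER` and using `aitextgen-model` as a hypothetical image name (use whatever name the manifests in this repository expect):

```bash
# Placeholder Docker Hub username; substitute your own.
export DOCKER_USER=your-dockerhub-username

# Build from within the custom-predictor directory; tag 1 marks the first iteration.
docker build -t $DOCKER_USER/aitextgen-model:1 .

# Push the image so the CoreWeave nodes can pull it. Never push to this tag again.
docker push $DOCKER_USER/aitextgen-model:1
```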
Create a `Secret` with your Docker Hub credentials, named `docker-hub`. It will be used by the nodes to pull your private image. Refer to the Kubernetes documentation for more details.
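One way to create such a secret with `kubectl`, substituting your own credentials:

```bash
kubectl create secret docker-registry docker-hub \
  --docker-username=<your Docker Hub username> \
  --docker-password=<your Docker Hub password or access token> \
  --docker-email=<your e-mail address>
```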
Tell Kubernetes to use the newly created `Secret` by patching the `ServiceAccount` for your namespace to reference it.
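A minimal sketch of the patch, assuming your workloads run under the `default` ServiceAccount:

```bash
# Add the docker-hub Secret to the ServiceAccount's imagePullSecrets.
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "docker-hub"}]}'
```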
Modify `aitextgen-inferenceservice.yaml` to reference your Docker image.
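After editing the manifest, apply it to create (or later update) the Inference Service:

```bash
kubectl apply -f aitextgen-inferenceservice.yaml
```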
Once the Pod is up, the serving container's logs can be viewed with, for example, `kubectl logs aitextgen-predictor-default-px8xk-deployment-85bb6787d7-h42xk kfserving-container`; the Pod name in your namespace will differ.
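If you need to look up the Pod name first, list the Pods in your namespace and then tail the serving container's logs:

```bash
# Find the predictor Pod; wait for it to report Running.
kubectl get pods

# Stream the logs of the serving container (substitute the Pod name from above).
kubectl logs --follow <predictor-pod-name> kfserving-container
```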
When the Inference Service is ready, retrieve its API endpoint with `kubectl get ksvc`.
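To capture only the URL, for example in a script, you can query the Knative service's `status.url` field; the service name below is a placeholder taken from the `kubectl get ksvc` listing:

```bash
# Print only the endpoint URL (substitute the service name from the listing).
kubectl get ksvc <ksvc-name> --output jsonpath='{.status.url}'
```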
Finally, to clean up, delete the `InferenceService`. This will delete all the associated resources, except for your model storage and the sleep Deployment.
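The simplest way to do this is to delete the manifest you applied earlier:

```bash
kubectl delete -f aitextgen-inferenceservice.yaml
```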