Serverless

Deploy serverless applications on CoreWeave Cloud
CoreWeave Cloud enables clients to run their own code, manage data, and integrate applications, all without having to manage any infrastructure.
Serverless deployment is especially well suited to applications that serve HTTP or gRPC requests, whether internally or to and from the Internet.

KNative on CoreWeave

CoreWeave uses the KNative runtime to support deploying serverless applications with a single manifest, so no additional installation or configuration is necessary to deploy your applications.

Serverless benefits

🔐
Automatic public HTTPS endpoints
Never worry about managing SSL certificates for your serverless applications. With KNative and Let's Encrypt, HTTPS endpoints are automatic with every deployment.
📈
Autoscaling by default, including Scale-to-Zero
High availability is built into serverless application deployments on CoreWeave: application resources scale automatically according to traffic. An application that has scaled to zero consumes no resources and incurs no billable charges while idle.
💰
No charge for public IPs
Public IP addresses do not incur any additional costs when deploying serverless applications on CoreWeave, making public distribution of the application easy.
🧪
Advanced deployment strategies
CoreWeave's implementation of the KNative runtime supports advanced deployment strategies, including traffic splitting techniques useful for blue/green and canary deployment methods.
Serverless deployment diagram

Deployment example

The following example manifest demonstrates how to deploy a simple application onto CoreWeave Cloud.
apiVersion: serving.knative.dev/v1 # Current version of Knative
kind: Service
metadata:
  name: helloworld # The name of the app
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0" # Allow scale to zero
        autoscaling.knative.dev/maxScale: "10" # Maximum number of Pods to auto-scale to
    spec:
      # containerConcurrency defines how many active requests are sent to a single
      # backend Pod at a time. This setting is important because it affects how well requests
      # are load balanced across Pods. For a standard, non-blocking web application this can usually
      # be rather high, e.g. 100. For GPU inference, however, it should usually be set to 1:
      # GPU inference only processes one request at a time, and the queue should build up
      # centrally in the load balancer rather than in the local Pod.
      containerConcurrency: 10
      containers:
        - image: gcr.io/knative-samples/helloworld-go # The URL to the image of the app
          resources:
            limits:
              cpu: 2
              memory: 4Gi
          env:
            - name: TARGET # The environment variable printed out by the sample app
              value: "Go Sample v1"
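As a variation on the manifest above, the containerConcurrency guidance for GPU inference might look like the following sketch. The Service name, the image URL, and the nvidia.com/gpu resource name are assumptions for illustration and are not taken from this page. Any node selection or GPU class settings a real deployment may require are omitted.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: gpu-inference # Hypothetical service name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0" # Allow scale to zero, as in the example above
        autoscaling.knative.dev/maxScale: "10"
    spec:
      # One in-flight request per Pod, so that waiting requests queue
      # centrally in the load balancer rather than inside a busy Pod.
      containerConcurrency: 1
      containers:
        - image: registry.example.com/my-inference-server:latest # Hypothetical image
          resources:
            limits:
              cpu: 2
              memory: 4Gi
              nvidia.com/gpu: 1 # Assumption: standard NVIDIA device plugin resource name
```

With containerConcurrency set to 1, adding capacity means adding Pods, so the maxScale value effectively caps how many requests can be processed at once.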
Once the manifest is applied and the application is deployed, get the public URL of the service using kubectl get ksvc:
$ kubectl get ksvc
NAME         URL                                                    LATESTCREATED      LATESTREADY        READY   REASON
helloworld   https://helloworld.default.knative.chi.coreweave.com   helloworld-ngzsn   helloworld-ngzsn   True
Note
If the URL shown does not begin with https, the domain name may be too long. Please contact your CoreWeave Support Specialist for assistance.
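The traffic splitting mentioned under Advanced deployment strategies can be sketched with the traffic block of a Knative Service. The example below is a minimal sketch, not a CoreWeave-specific recipe. It assumes that helloworld-ngzsn, the revision shown in the kubectl output above, is the version currently serving traffic, and that changing TARGET to "Go Sample v2" creates a new revision to act as the canary. The 90/10 split and the canary tag are illustrative choices.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containerConcurrency: 10
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "Go Sample v2" # Changing the template creates a new revision
  traffic:
    - revisionName: helloworld-ngzsn # Existing revision keeps most of the traffic
      percent: 90
    - latestRevision: true # Newest revision receives the canary share
      percent: 10
      tag: canary # Illustrative tag; gives this revision its own addressable route
```

The percentages must add up to 100. Shifting the weights step by step toward the new revision gives a canary rollout, while switching from 100/0 to 0/100 in a single change corresponds to a blue/green cutover.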

Monitoring

Managed Grafana monitoring transparently provides insight into requests, success rates, response times, and autoscaling metrics. No metrics-specific code needs to be added to the serverless application.
Screenshot: Grafana dashboard
To access Grafana, log in to your CoreWeave Cloud account, navigate to the Account Details section in the left-hand navigation menu, and click Grafana. The link opens in a new browser window.
Screenshot: Grafana in the left-hand menu