Create a new CoreWeave inference model deployment.

curl --request POST \
  --url https://api.coreweave.com/v1alpha1/inference/deployments \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "name": "<string>",
  "gatewayIds": [
    "<string>"
  ],
  "runtime": {
    "engine": "<string>",
    "version": "<string>",
    "engineConfig": {}
  },
  "resources": {
    "instanceType": "<string>",
    "gpuCount": 123
  },
  "model": {
    "name": "<string>",
    "bucket": "<string>",
    "path": "<string>"
  },
  "autoscaling": {
    "min": 123,
    "max": 123,
    "priority": 123,
    "concurrency": 123,
    "capacityClasses": [
      "CAPACITY_CLASS_RESERVED"
    ]
  },
  "traffic": {
    "weight": 123
  },
  "id": "<string>",
  "disabled": true
}
'

{
  "deployment": {
    "spec": {
      "id": "<string>",
      "name": "<string>",
      "gatewayIds": [
        "<string>"
      ],
      "runtime": {
        "engine": "<string>",
        "version": "<string>",
        "engineConfig": {}
      },
      "resources": {
        "instanceType": "<string>",
        "gpuCount": 123
      },
      "model": {
        "name": "<string>",
        "bucket": "<string>",
        "path": "<string>"
      },
      "autoscaling": {
        "min": 123,
        "max": 123,
        "priority": 123,
        "concurrency": 123,
        "capacityClasses": [
          "CAPACITY_CLASS_RESERVED"
        ]
      },
      "traffic": {
        "weight": 123
      },
      "organizationId": "<string>",
      "disabled": true
    },
    "status": {
      "createdAt": "2023-11-07T05:31:56Z",
      "updatedAt": "2023-11-07T05:31:56Z",
      "conditions": [
        {
          "type": "<string>",
          "lastUpdateTime": "2023-11-07T05:31:56Z",
          "reason": "<string>",
          "message": "<string>",
          "zone": "<string>",
          "status": "True"
        }
      ],
      "status": "STATUS_CREATING"
    }
  }
}
The API base URL is https://api.coreweave.com. In the request below, replace {API_ACCESS_TOKEN} with your CoreWeave API access token. The request body takes name, gatewayIds, runtime, resources, model, autoscaling, and traffic. The available runtime.engine and runtime.version values are returned by GET /v1alpha1/inference/deployments/parameters.
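
To look up the accepted runtime.engine and runtime.version values before creating a deployment, the parameters endpoint named above can be queried. The following is a minimal sketch, not a documented recipe: the function name list_deployment_parameters and the API_ACCESS_TOKEN environment variable are hypothetical, and it assumes the endpoint accepts the same bearer token header as the create request. The shape of the response is not described on this page, so the sketch only prints it.

```shell
# Hypothetical helper (not part of any CoreWeave tooling).
# Assumes API_ACCESS_TOKEN is exported in the environment and that the
# parameters endpoint uses the same bearer-token header as the POST request.
list_deployment_parameters() {
  curl --silent --show-error --fail \
    --request GET \
    --url "https://api.coreweave.com/v1alpha1/inference/deployments/parameters" \
    --header "Authorization: Bearer ${API_ACCESS_TOKEN:?set API_ACCESS_TOKEN first}"
}
```

Running list_deployment_parameters with the token exported prints the raw response body; the --fail flag makes curl exit with a non-zero status on an HTTP error instead of printing the error page.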
curl -X POST https://api.coreweave.com/v1alpha1/inference/deployments \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {API_ACCESS_TOKEN}" \
-d @data.json
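
The data.json file passed to curl above is not shown on this page. As a rough sketch, it could be created as follows. The field names come from the request schema on this page; every value is a placeholder or an illustrative assumption (the deployment name, the counts, and the traffic weight in particular), and the id and disabled fields are left out on the assumption that they are optional.

```shell
# Write a hypothetical request body to data.json in the current directory.
# Replace each <...> placeholder with a real value; engine and version
# should be values reported by the parameters endpoint.
cat > data.json <<'EOF'
{
  "name": "my-deployment",
  "gatewayIds": ["<gateway-id>"],
  "runtime": {
    "engine": "<engine>",
    "version": "<version>",
    "engineConfig": {}
  },
  "resources": {
    "instanceType": "<instance-type>",
    "gpuCount": 1
  },
  "model": {
    "name": "<model-name>",
    "bucket": "<bucket>",
    "path": "<path>"
  },
  "autoscaling": {
    "min": 1,
    "max": 2,
    "priority": 1,
    "concurrency": 1,
    "capacityClasses": ["CAPACITY_CLASS_RESERVED"]
  },
  "traffic": {
    "weight": 100
  }
}
EOF
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the body, so the JSON is written exactly as typed.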
Authorization: CoreWeave API access token sent as a bearer token.

Request for CreateDeployment
name: The name of the deployment.
gatewayIds: The gateways to associate the deployment with.
runtime: Runtime selection and configuration.
resources: Resource configuration for the deployment.
model: The model configuration.
autoscaling: The autoscaling configuration.
traffic: The traffic configuration for the deployment.
id: The unique identifier of the deployment, in UUID format.
disabled: Disable the deployment.

Response for CreateDeployment (OK)
deployment: The deployment that was created.