Create and manage CoreWeave Managed Inference deployments. See the getting started walkthrough for the gateway-to-deployment flow.

Example usage

# Look up available parameters first (optional but recommended).
data "coreweave_inference_deployment_parameters" "deploy_params" {}

resource "coreweave_inference_deployment" "example" {
  name        = "my-llm"
  gateway_ids = [tolist(data.coreweave_inference_deployment_parameters.deploy_params.gateway_ids)[0]]

  runtime = {
    engine  = "vllm"
    version = "0.8.5"
    engine_config = {
      "max-model-len" = "8192"
    }
  }

  resources = {
    instance_type = "H100_80GB_SXM5"
    gpu_count     = 1
  }

  model = {
    name   = "meta-llama/Llama-3.1-8B"
    bucket = "my-model-bucket"
    path   = "models/llama-3.1-8b"
  }

  autoscaling = {
    min              = 1
    max              = 4
    priority         = 100
    capacity_classes = ["CAPACITY_CLASS_RESERVED", "CAPACITY_CLASS_ON_DEMAND"]
    concurrency      = 16
  }

  traffic = {
    weight = 100
  }
}

Schema

Required

Optional

  • disabled (Boolean) Whether the deployment is disabled.
  • traffic (Attributes) Traffic configuration. Omit to accept the API default (weight 0, which normalizes to 100% when no other deployment shares the model name). After apply, weight is populated from the API. (see below for nested schema)

Read-Only

  • conditions (Attributes List) Detailed status conditions for the deployment. (see below for nested schema)
  • created_at (String) RFC3339 timestamp of when the deployment was created.
  • id (String) The unique identifier of the deployment.
  • organization_id (String) The organization ID that owns the deployment.
  • status (String) The current status of the deployment. See the Inference API overview for status values.
  • updated_at (String) RFC3339 timestamp of when the deployment was last updated.
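
The read-only attributes above can be surfaced as outputs once the deployment has been applied. The following is a minimal sketch that assumes the `coreweave_inference_deployment.example` resource from the example usage; the output names are hypothetical.

```hcl
# Expose read-only attributes of the example deployment.
# Output names are illustrative, not part of the provider.
output "deployment_id" {
  value = coreweave_inference_deployment.example.id
}

output "deployment_status" {
  value = coreweave_inference_deployment.example.status
}

output "deployment_created_at" {
  value = coreweave_inference_deployment.example.created_at
}
```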

Nested Schema for autoscaling

Required:
  • max (Number) Maximum number of instances. Must be ≥1.
  • min (Number) Minimum number of instances. Must be ≥1.
Optional:
  • capacity_classes (List of String) Ordered preference list of capacity classes to use. Order is significant: the first satisfiable class wins. Allowed values: CAPACITY_CLASS_RESERVED, CAPACITY_CLASS_ON_DEMAND.
  • concurrency (Number) Target concurrency per instance (≥1). Controls the latency versus throughput tradeoff.
  • priority (Number) Priority for cross-deployment scaling (0-1000). Higher values win when there is contention.
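
If autoscaling values come from module inputs, the documented bounds can be checked at plan time with variable validation. This is only a sketch: the `autoscaling` variable and its shape are hypothetical, and the conditions simply restate the limits listed above.

```hcl
# Hypothetical input variable; the name and shape are illustrative only.
variable "autoscaling" {
  type = object({
    min         = number
    max         = number
    priority    = number
    concurrency = number
  })

  validation {
    condition     = var.autoscaling.min >= 1 && var.autoscaling.max >= 1
    error_message = "The min and max values must each be at least 1."
  }

  validation {
    condition     = var.autoscaling.priority >= 0 && var.autoscaling.priority <= 1000
    error_message = "The priority value must be between 0 and 1000."
  }

  validation {
    condition     = var.autoscaling.concurrency >= 1
    error_message = "The concurrency value must be at least 1."
  }
}
```

The validated values could then be referenced inside the resource's autoscaling attribute, for example `min = var.autoscaling.min`.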

Nested Schema for model

Required:
  • bucket (String) The CAIOS bucket the model is stored in. The inference service account must have bucket access.
  • name (String) The model name used in API requests (e.g. the /models endpoint). Length must be 4-63 characters.
  • path (String) The CAIOS path to the model and its configuration files.

Nested Schema for resources

Required:
  • gpu_count (Number) Number of GPUs per instance. Must be one of: 1, 2, 4, 8, 16.
  • instance_type (String) The instance type to use.

Nested Schema for runtime

Required:
  • engine (String) The inference engine to use.
Optional:
  • engine_config (Map of String) Engine-specific configuration key/value pairs.
  • version (String) The version of the engine. If not set, defaults to the latest available version. Must follow semver format (e.g. 1.2.3).

Nested Schema for traffic

Optional:
  • weight (Number) Traffic weight (0-1000). Values are normalized into percentages across deployments with the same model name.
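
To reason about how weights translate into traffic share, the normalization can be sketched with locals. This assumes each deployment's share is its weight divided by the sum of weights across deployments with the same model name, which is one reading of the description above; the deployment labels and weights are hypothetical.

```hcl
locals {
  # Hypothetical weights for two deployments that share one model name.
  traffic_weights = {
    stable = 900
    canary = 100
  }

  # Sum of all weights; this sketch assumes the sum is greater than zero.
  total_weight = sum(values(local.traffic_weights))

  # Assumed share of traffic per deployment, as a percentage.
  traffic_percent = {
    for name, weight in local.traffic_weights :
    name => weight / local.total_weight * 100
  }
}

output "traffic_percent" {
  value = local.traffic_percent
}
```

Under that assumption, these weights work out to 90 percent for `stable` and 10 percent for `canary`. The all-zero case, where the API applies its own default, is not handled by this sketch.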

Nested Schema for conditions

Read-Only:
  • last_update_time (String) RFC3339 timestamp of the last condition transition.
  • message (String) A human-readable message about the condition’s last transition.
  • reason (String) A short, machine-readable reason for the condition’s last transition.
  • status (String) The condition status (True, False, or Unknown).
  • type (String) The condition type (e.g. Ready, Progressing).
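
The conditions list can be reduced to simpler values with `for` expressions. The sketch below assumes the `coreweave_inference_deployment.example` resource from the example usage and assumes the API reports a condition of type Ready; the output names are hypothetical.

```hcl
# True only when a condition of type "Ready" is present with status "True".
# anytrue returns false for an empty list, so a missing condition yields false.
output "deployment_ready" {
  value = anytrue([
    for c in coreweave_inference_deployment.example.conditions :
    c.type == "Ready" && c.status == "True"
  ])
}

# Type and reason of every condition whose status is not "True".
output "deployment_pending_conditions" {
  value = [
    for c in coreweave_inference_deployment.example.conditions :
    "${c.type}: ${c.reason}"
    if c.status != "True"
  ]
}
```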

Import

Import is supported using the following syntax:
terraform import coreweave_inference_deployment.example {{deployment-id}}
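
On Terraform 1.5 or later, the same import can likely be expressed declaratively with an `import` block, since that mechanism works for resources that support import. This is a sketch; the id value is a placeholder to replace with the real deployment ID.

```hcl
# Declarative alternative to the terraform import command (Terraform 1.5+).
import {
  to = coreweave_inference_deployment.example
  id = "<deployment-id>" # placeholder: replace with the deployment's id
}
```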
Last modified on June 23, 2026