Example usage
Schema
Required
- autoscaling (Attributes) Autoscaling configuration. (see below for nested schema)
- gateway_ids (Set of String) The gateway IDs to associate the deployment with. At least one is required.
- model (Attributes) Model configuration. (see below for nested schema)
- name (String) The name of the deployment. Must be a valid hostname label.
- resources (Attributes) GPU resource configuration for the deployment. (see below for nested schema)
- runtime (Attributes) Runtime selection and configuration. (see below for nested schema)
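The following is a minimal client-side pre-check for the required top-level attributes, assuming the configuration has been loaded into a Python dict keyed by the attribute names above. The helpers is_valid_deployment_name and missing_or_invalid are hypothetical. "Valid hostname label" is approximated with the RFC 1123 label rules (1 to 63 characters, letters, digits, and hyphens, no leading or trailing hyphen). Restricting the name to lowercase is an assumption borrowed from DNS-1123-style labels; the API may be stricter or looser.

```python
import re

# Assumption: approximates "valid hostname label" as a lowercase RFC 1123 label.
_LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")

REQUIRED_KEYS = ("autoscaling", "gateway_ids", "model", "name", "resources", "runtime")


def is_valid_deployment_name(name: str) -> bool:
    """Return True if name is 1-63 chars of [a-z0-9-] with no edge hyphens."""
    return _LABEL_RE.fullmatch(name) is not None


def missing_or_invalid(config: dict) -> list:
    """List problems with the required top-level attributes (hypothetical helper)."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in config]
    # gateway_ids is a set: at least one distinct ID is required.
    if "gateway_ids" in config and len(set(config["gateway_ids"])) < 1:
        problems.append("gateway_ids needs at least one ID")
    if isinstance(config.get("name"), str) and not is_valid_deployment_name(config["name"]):
        problems.append("name is not a valid hostname label")
    return problems
```

The check runs before any API call, so it can only catch the constraints stated here; the API remains the final authority.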
Optional
- disabled (Boolean) Whether the deployment is disabled.
- traffic (Attributes) Traffic configuration. Omit to accept the API default (weight 0, which normalizes to 100% when no other deployment shares the model name). After apply, weight is populated from the API. (see below for nested schema)
Read-Only
- conditions (Attributes List) Detailed status conditions for the deployment. (see below for nested schema)
- created_at (String) RFC3339 timestamp of when the deployment was created.
- id (String) The unique identifier of the deployment.
- organization_id (String) The organization ID that owns the deployment.
- status (String) The current status of the deployment. See the Inference API overview for status values.
- updated_at (String) RFC3339 timestamp of when the deployment was last updated.
Nested Schema for autoscaling
Required:
- max (Number) Maximum number of instances. Must be ≥1.
- min (Number) Minimum number of instances. Must be ≥1.
Optional:
- capacity_classes (List of String) Ordered preference list of capacity classes to use. Order is significant: the first class that can be satisfied wins. Allowed values: CAPACITY_CLASS_RESERVED, CAPACITY_CLASS_ON_DEMAND.
- concurrency (Number) Target concurrency per instance (≥1). Controls the tradeoff between latency and throughput.
- priority (Number) Priority for cross-deployment scaling (0-1000). Higher values win when deployments contend for capacity.
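This sketch checks the autoscaling bounds listed above and shows what "the first satisfiable class wins" means for capacity_classes. The can_satisfy callback is a hypothetical stand-in for whatever availability check the scheduler performs. Rejecting min greater than max is an assumption; the schema only states that each must be ≥1. Priority-based resolution between deployments is not modeled.

```python
from typing import Callable, List, Optional

ALLOWED_CLASSES = ("CAPACITY_CLASS_RESERVED", "CAPACITY_CLASS_ON_DEMAND")


def validate_autoscaling(cfg: dict) -> List[str]:
    """Return a list of constraint violations for an autoscaling block."""
    problems = []
    lo, hi = cfg["min"], cfg["max"]
    if lo < 1 or hi < 1:
        problems.append("min and max must be >= 1")
    if lo > hi:  # assumption: min > max is rejected
        problems.append("min must not exceed max")
    if "concurrency" in cfg and cfg["concurrency"] < 1:
        problems.append("concurrency must be >= 1")
    if "priority" in cfg and not 0 <= cfg["priority"] <= 1000:
        problems.append("priority must be in 0-1000")
    for cls in cfg.get("capacity_classes", []):
        if cls not in ALLOWED_CLASSES:
            problems.append(f"unknown capacity class {cls}")
    return problems


def pick_capacity_class(preferences: List[str],
                        can_satisfy: Callable[[str], bool]) -> Optional[str]:
    """Walk the list in order and return the first class that can be satisfied."""
    for cls in preferences:
        if can_satisfy(cls):
            return cls
    return None  # no listed class can currently be satisfied
```

For example, with preferences ["CAPACITY_CLASS_RESERVED", "CAPACITY_CLASS_ON_DEMAND"] and no reserved capacity available, the on-demand class is chosen.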
Nested Schema for model
Required:
- bucket (String) The CAIOS bucket where the model is stored. The inference service account must have access to this bucket.
- name (String) The model name used in API requests (e.g., the /models endpoint). Must be 4-63 characters long.
- path (String) The CAIOS path to the model and its configuration files.
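A minimal sketch of the model block constraints, assuming the block is a Python dict. validate_model is a hypothetical helper. It only checks that the three required fields are present and that name is 4 to 63 characters long; it cannot verify bucket access for the inference service account.

```python
from typing import List


def validate_model(model: dict) -> List[str]:
    """Check presence of bucket/name/path and the 4-63 character name rule."""
    problems = [f"missing model.{k}" for k in ("bucket", "name", "path") if not model.get(k)]
    name = model.get("name", "")
    if name and not 4 <= len(name) <= 63:
        problems.append("model.name must be 4-63 characters")
    return problems
```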
Nested Schema for resources
Required:
- gpu_count (Number) Number of GPUs per instance. Must be one of: 1, 2, 4, 8, 16.
- instance_type (String) The instance type to use.
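Because gpu_count is per instance, the total GPU footprint of a deployment scales with the autoscaling bounds. The sketch below (gpu_footprint is a hypothetical helper) assumes every running instance receives exactly gpu_count GPUs, so the footprint ranges from gpu_count × min to gpu_count × max.

```python
from typing import Tuple

ALLOWED_GPU_COUNTS = {1, 2, 4, 8, 16}


def gpu_footprint(gpu_count: int, min_instances: int, max_instances: int) -> Tuple[int, int]:
    """Return (fewest, most) GPUs the deployment can occupy."""
    if gpu_count not in ALLOWED_GPU_COUNTS:
        raise ValueError(f"gpu_count must be one of {sorted(ALLOWED_GPU_COUNTS)}")
    # Assumption: each autoscaled instance gets gpu_count GPUs.
    return gpu_count * min_instances, gpu_count * max_instances
```

For instance, gpu_count = 8 with min = 1 and max = 4 can occupy between 8 and 32 GPUs.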
Nested Schema for runtime
Required:
- engine (String) The inference engine to use.
Optional:
- engine_config (Map of String) Engine-specific configuration key/value pairs.
- version (String) The version of the engine. Must follow semver format (e.g., 1.2.3). If not set, defaults to the latest available version.
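The sketch below illustrates the version rule. It accepts only the MAJOR.MINOR.PATCH core of Semantic Versioning 2.0.0; whether the API also accepts pre-release or build suffixes is not stated, so they are rejected here as an assumption. resolve_version and its available list are hypothetical and stand in for "the latest available version" when version is omitted. Comparing parsed integer tuples matters: as plain strings, "0.9.0" would sort after "0.10.1".

```python
import re
from typing import List, Optional, Tuple

# Core semver only (no -prerelease or +build), numeric parts without leading zeros.
_SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_engine_version(version: str) -> Tuple[int, int, int]:
    match = _SEMVER_CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def resolve_version(requested: Optional[str], available: List[str]) -> str:
    """Return the requested version, or the highest available one if unset."""
    if requested is not None:
        parse_engine_version(requested)  # validate format only
        return requested
    return max(available, key=parse_engine_version)
```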
Nested Schema for traffic
Optional:
- weight (Number) Traffic weight (0-1000). Values are normalized into percentages across deployments with the same model name.
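A sketch of how weights might normalize into percentages across deployments that share a model name: each deployment gets 100 × weight / total. The schema only confirms that a lone deployment with weight 0 normalizes to 100%. Splitting evenly when every weight is 0 is an assumption that generalizes that case. normalize_weights is a hypothetical helper keyed by deployment name.

```python
from typing import Dict


def normalize_weights(weights: Dict[str, int]) -> Dict[str, float]:
    """Map deployment name -> traffic percentage for one model name."""
    for name, w in weights.items():
        if not 0 <= w <= 1000:
            raise ValueError(f"{name}: weight must be in 0-1000")
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # Assumption: all-zero weights share traffic equally (a single
        # deployment with weight 0 therefore gets 100%, as documented).
        share = 100.0 / len(weights)
        return {name: share for name in weights}
    return {name: 100.0 * w / total for name, w in weights.items()}
```

For example, weights of 300 and 100 on two deployments of the same model yield 75% and 25%.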
Nested Schema for conditions
Read-Only:
- last_update_time (String) RFC3339 timestamp of the last condition transition.
- message (String) A human-readable message about the condition’s last transition.
- reason (String) A short, machine-readable reason for the condition’s last transition.
- status (String) The condition status (True, False, or Unknown).
- type (String) The condition type (e.g., Ready, Progressing).
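The read-only conditions list can be inspected once it has been read into Python dicts keyed by the attribute names above; how you obtain it (for example, by exposing it as a Terraform output) is up to you. is_ready and latest_condition are hypothetical helpers. Treating the deployment as ready only when a Ready condition has status "True" is an assumption based on the example condition types. Replacing a trailing "Z" keeps datetime.fromisoformat working on Python versions before 3.11.

```python
from datetime import datetime
from typing import List, Optional


def is_ready(conditions: List[dict]) -> bool:
    """True only if a Ready condition exists and its status is "True"."""
    for cond in conditions:
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported yet


def _parse_rfc3339(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_condition(conditions: List[dict]) -> Optional[dict]:
    """Return the condition with the most recent last_update_time, if any."""
    return max(conditions, key=lambda c: _parse_rfc3339(c["last_update_time"]), default=None)
```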