Node Pool autoscaling has the following limitations:
- No SUNK integration: SUNK does not support autoscaling Node Pools.
- No support for rack-based instance types: Autoscaling is not supported for rack-based instance types (GB200, GB300). CKS rejects a Node Pool that sets `autoscaling: true` with a rack-based instance type.
Autoscaling behavior
The following table summarizes scale-up and scale-down behavior and the billing implications of each:

| Behavior | Scale up | Scale down |
|---|---|---|
| Time | 5 to 15 minutes, depending on the instance type. Scale-up time is generally independent of the number of Nodes added in a single operation, but it can vary with resource availability and the size of the increase. | Scale-down completes once all jobs are removed from the Node. |
| Job scheduling | Pods schedule onto the new Nodes once they join the cluster. | |
| Billing | You are not charged for Nodes while they scale up. | You are not charged for Nodes after you remove all jobs and CoreWeave reclaims the Node. |
Configure autoscaling
The autoscaler adjusts the Node Pool’s `targetNodes` value within the minimum and maximum range (`minNodes` and `maxNodes`) that you define in the Node Pool manifest.
To enable autoscaling, set the following values:
- `autoscaling`: Set to `true`.
- `maxNodes`: Set to the maximum number of Nodes you want the Node Pool to scale up to.
- `minNodes`: Set to the minimum number of Nodes you want the Node Pool to scale down to.
For example, to let a Node Pool scale between two and four Nodes, set the following:
- `autoscaling`: Set to `true`.
- `maxNodes`: Set to `4`.
- `minNodes`: Set to `2`.
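A Node Pool manifest with these values might look like the following sketch. The `autoscaling`, `targetNodes`, `minNodes`, and `maxNodes` fields come from this page. The `apiVersion`, `computeClass`, `instanceType` value, and Node Pool name are assumptions for illustration, so check them against the Node Pool reference before applying the manifest.

```yaml
# Sketch of an autoscaled Node Pool.
# ASSUMPTIONS: apiVersion, computeClass, instanceType, and metadata.name are
# illustrative values; verify them against the Node Pool reference.
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-autoscaled-pool    # hypothetical name
spec:
  computeClass: default            # assumed field and value
  instanceType: gd-8xh100ib-i128   # assumed instance type ID
  autoscaling: true                # let the autoscaler manage targetNodes
  targetNodes: 2                   # starting size; the autoscaler adjusts this value
  minNodes: 2                      # lower bound for scale-down
  maxNodes: 4                      # upper bound for scale-up
```

Because the autoscaler moves `targetNodes` only within the `minNodes` to `maxNodes` range, it is simplest to choose a starting `targetNodes` value inside that range.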
Autoscaling behavior
Autoscaling increases or decreases the number of Nodes in a Node Pool when the following occurs:
- Scale up: When CKS cannot schedule Pods because of insufficient resources, such as CPU or memory, CKS scales up the Node Pools. For more information, see How does up-scale work? in the Kubernetes documentation.
- Scale down: When CKS determines that Nodes are underutilized for a configured period, CKS scales down the Node Pools. For more information, see How does down-scale work? in the Kubernetes documentation.
PreferIdle strategy. If you have training jobs or other workloads that cannot be disrupted, use the cautious IdleOnly strategy. See Node Pool scaling strategies for more information.
Node selectors and autoscaling
Cluster Autoscaler can scale the appropriate Node Pool when a Pod cannot be scheduled due to resource limits. CKS decides which Node Pool to scale based on the placement requirements defined in the Pod specification, for example, in the `nodeSelector` or `affinity` fields. These fields help the autoscaler choose a Node Pool that matches the Pod’s requirements. If you don’t specify Pod placement requirements, the autoscaler may scale any available Node Pool.
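For example, the following Pod sketch uses `nodeSelector` to limit which Node Pool the autoscaler considers when the Pod is pending. It assumes that Nodes carry the standard Kubernetes `node.kubernetes.io/instance-type` label. The instance type value, Pod name, and container image are placeholders.

```yaml
# Sketch: a Pod that targets one instance type through nodeSelector.
# ASSUMPTIONS: Nodes carry the node.kubernetes.io/instance-type label;
# the label value, Pod name, and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-placement-example
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: gd-8xh100ib-i128   # assumed instance type ID
  containers:
    - name: main
      image: ubuntu:22.04
      command: ["sleep", "infinity"]   # keeps the Pod running; replace with real work
      resources:
        limits:
          nvidia.com/gpu: 1            # GPU request the scheduler must satisfy
```

If no Node with that label has a free GPU, the Pod stays Pending. The autoscaler can then scale up only a Node Pool whose Nodes would match the selector.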
Autoscaling considerations
For autoscaling to work, the following criteria must be met:
- Available quota: Your organization must have available quota that meets or exceeds the number specified in the `maxNodes` field. For example, if `maxNodes` is set to 10, your organization must have quota for 10 Nodes available. To check your organization’s quota, see the `quota` reference documentation.
- Available capacity: The region where your cluster exists must have the capacity to provision the Nodes. If the region doesn’t have enough capacity, CKS can’t scale your Node Pools. To determine available capacity, see the `capacity` reference documentation.
Scale-to-zero
The scale-to-zero feature is useful when you want to minimize resource costs by letting a Node Pool drop to zero Nodes when there’s no demand. For CKS to scale a Node Pool to zero, you need at least one other Node Pool in the cluster that you don’t scale to zero. This other Node Pool runs the Konnectivity Agent for network connectivity. To ensure a Node Pool can scale to zero, do the following:
- On the Node Pool you want to scale to zero, set `minNodes` to `0` and `maxNodes` to a value greater than `0`. This setting allows the Node Pool to scale down to zero Nodes when there’s no demand.
- Create another Node Pool (for example, with a less expensive instance type) and configure its manifest so that the required Konnectivity Agents run on it.
- Otherwise, if you don’t create a dedicated Node Pool for the Konnectivity Agents, set `minNodes` to at least `2` on your main Node Pool (a Node Pool you don’t scale to zero). This ensures that Konnectivity has the required number of Nodes to run on, so it won’t affect the autoscaler’s scaling decisions.
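The following is a minimal sketch of the Node Pool that scales to zero. It starts at `targetNodes: 1` rather than `0`, which follows the troubleshooting guidance on this page: the autoscaler needs at least one Node to cache the Node Pool’s shape. The `apiVersion`, `computeClass`, `instanceType` value, and name are assumptions. You still need a separate Node Pool for the Konnectivity Agents.

```yaml
# Sketch: a Node Pool that can scale down to zero Nodes.
# ASSUMPTIONS: apiVersion, computeClass, instanceType, and metadata.name are
# illustrative values; verify them against the Node Pool reference.
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-scale-to-zero-pool   # hypothetical name
spec:
  computeClass: default              # assumed field and value
  instanceType: gd-8xh100ib-i128     # assumed instance type ID
  autoscaling: true
  minNodes: 0                        # allow the pool to drop to zero Nodes when idle
  maxNodes: 4                        # any value greater than 0
  targetNodes: 1                     # one initial Node so the autoscaler can cache its shape
```

Once the autoscaler has seen a Node from this pool, it can scale the pool down to zero and back up as demand changes.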
Scale-to-zero with multiple Node Pools
When you enable scale-to-zero on multiple autoscaled Node Pools with different Node configurations, the autoscaler doesn’t know in advance which Node Pool to scale up for a pending Pod. Because no Nodes are running, it can’t match the Pod’s requirements to an existing Node type. As a result, if more than one Node Pool is scaled to zero, the autoscaler might first scale up an incompatible Node Pool, fail to schedule the Pod, and then try other Node Pools until it finds a match. These repeated scale-up attempts can delay the Pod’s startup.
Cluster autoscaler monitoring
To monitor autoscaling without querying metrics or logs directly, use the NodePools and Cluster Autoscaler dashboard in CoreWeave Grafana. The dashboard provides prebuilt panels for Node Pool capacity and conditions, Node Pool events, and Cluster Autoscaler scaling activity, errors, and logs. Filter the panels by region, zone, organization, cluster, and Node Pool to focus on the resources you care about. If you want to query the underlying logs and metrics yourself, use the Explore view in CoreWeave Grafana as described in the following sections.
Query autoscaler logs and metrics directly
To view autoscaler logs in CoreWeave Grafana, navigate to Explore, select CoreWeave Logs, and query for `app="cluster-autoscaler"`.

To find autoscaling metrics, select CoreWeave Metrics in Explore and search for `cluster_autoscaler`. Cluster Autoscaler metric names use the `cluster_autoscaler_` prefix. For more information, see the Kubernetes Cluster Autoscaler Monitoring documentation.
Test autoscaling
You can test your autoscaling configuration with a workload that requests all eight GPUs on each of four Nodes. If you run it on a Node Pool with fewer than four Nodes available, Cluster Autoscaler adds enough Nodes to accommodate the workload. The workload uses the `nodeSelector` field to specify the required instance type. When a cluster has multiple Node Pools, the `nodeSelector` field tells the autoscaler which Node Pool to scale.
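A minimal sketch of such a workload is the following Deployment. It assumes eight-GPU Nodes that carry the standard `node.kubernetes.io/instance-type` label. The Deployment name, app label, instance type value, and image are hypothetical. Each replica requests all eight GPUs, so no two replicas can share a Node.

```yaml
# Sketch of an autoscaling test workload: four replicas, each requesting
# all eight GPUs on a Node, so each replica needs its own Node.
# ASSUMPTIONS: the name, app label, instance type ID, and image are
# hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscaling-test
spec:
  replicas: 4
  selector:
    matchLabels:
      app: autoscaling-test
  template:
    metadata:
      labels:
        app: autoscaling-test
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: gd-8xh100ib-i128   # assumed instance type ID
      containers:
        - name: gpu-sleeper
          image: ubuntu:22.04
          command: ["sleep", "infinity"]   # holds the GPUs without doing work
          resources:
            limits:
              nvidia.com/gpu: 8            # every GPU on an eight-GPU Node
```

After you apply the Deployment, replicas that don’t fit stay Pending until the autoscaler adds Nodes, up to the pool’s `maxNodes`. You can follow progress with `kubectl get pods -w`. Deleting the Deployment afterward lets the pool scale back down.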
Troubleshoot autoscaling behavior
| Problem | Potential issue | Suggested fix |
|---|---|---|
| Nodes don’t scale up. | Node Pool created with `minNodes: 0`. When you create a Node Pool with `minNodes: 0`, the pool initially has no Nodes. The autoscaler requires at least one Node (`targetNodes: 1`) so it can cache the “shape” (resource characteristics) of the Node. This cache is necessary for the autoscaler to determine whether it can schedule Pods onto the Node Pool in the future. | Manually set `targetNodes` to `1`. This triggers CKS to add a Node, which updates the autoscaler’s cache so that it can schedule new Nodes. |
| Nodes don’t scale up. | CKS removes a Node, causing the autoscaler to attempt to schedule a new Node with the wrong “shape”. Occasionally, the autoscaler may cache a Node that CoreWeave automation has “tainted” (marked unschedulable). The cached taint can cause the autoscaler to incorrectly determine that it can’t scale up the pool, even if new scheduling requests exist. | Manually set `targetNodes` to `1`. This triggers CKS to add a Node, which clears the bad cache entry so that the autoscaler can schedule new Nodes. |
| Nodes don’t scale down. | Konnectivity Agent replica scheduling. CKS expects two replicas of the Konnectivity Agent to run for network connectivity. These agent Pods can block the Node Pool from scaling down to zero, or conversely, can trigger unexpected scaling up if the autoscaled pool has resource needs. | Follow the instructions in the Scale-to-zero section for creating a Node Pool for the Konnectivity replicas to run on. |