CKS uses the Kubernetes Cluster Autoscaler to scale Node Pools in response to workload demand for GPU, CPU, or memory resources. This page explains how to configure Node Pool autoscaling, describes the underlying scale-up and scale-down behavior, and covers monitoring and troubleshooting so you can match capacity to your workload demand. Cluster Autoscaler is enabled by default in all CKS clusters running Kubernetes 1.32 or later. To upgrade your clusters to the latest version, see Upgrade Kubernetes.
Node Pool autoscaling has the following limitations:
  • No SUNK integration: SUNK does not support autoscaling Node Pools.
  • No support for rack-based instance types: Autoscaling is not supported for rack-based instance types (GB200, GB300). Setting autoscaling: true on a Node Pool with a rack-based instance type is rejected by CKS.

Autoscaling behavior

The following summarizes scale-up and scale-down behavior and the billing implications of each:
Scale up:
  • Time: 5 to 15 minutes depending on the instance type. Scale-up time is generally independent of the number of Nodes added in a single operation. Time can vary, however, based on resource availability and the size of the increase.
  • Job scheduling: Pods schedule onto the new Nodes once they join the cluster.
  • Billing: You are not charged for Nodes while they scale up.
Scale down:
  • Time: Scale-down completes once you remove all jobs from the Node.
  • Billing: You are not charged for Nodes after you remove all jobs and CoreWeave reclaims the Node.

Configure autoscaling

The autoscaler adjusts the Node Pool’s targetNodes value within the min and max range that you define in the Node Pool manifest. To enable autoscaling, set the following values:
  • autoscaling: Set autoscaling to true.
  • maxNodes: Set the maximum number of Nodes you want to scale up to.
  • minNodes: Set the minimum number of Nodes you want to scale down to.
The following example Node Pool manifest sets these values:
  • autoscaling: Set autoscaling to true.
  • maxNodes: Set to 4.
  • minNodes: Set to 2.
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-nodepool
spec:
  computeClass: default
  autoscaling: true # Set autoscaling to true
  lifecycle:
    scaleDownStrategy: PreferIdle # Scale down a cluster as quickly as possible, see "Autoscaling behavior"
  instanceType: gd-8xh100ib-i128  # Select your desired instance type
  maxNodes: 4 # Set desired maximum nodes.
  minNodes: 2 # Set desired minimum nodes
  targetNodes: 2
  nodeLabels:
    my-label/node: "true"
  nodeAnnotations:
    my-annotation/node: "true"
  nodeTaints:
    - key: node-taint
      value: "true"
      effect: NoSchedule

Autoscaling behavior

Autoscaling increases or decreases the number of Nodes in a Node Pool when the following occurs:
  • Scale up: When CKS cannot schedule Pods due to insufficient resources, like CPU or memory, CKS scales up the Node Pools. For more information, see How does up-scale work? in the Kubernetes documentation.
  • Scale down: When CKS determines that Nodes are underutilized for a configured period, CKS scales down the Node Pools. For more information, see How does down-scale work? in the Kubernetes documentation.
The selected Node Pool scaling strategy affects how quickly the cluster autoscaler can scale down your Nodes. If you want to scale down a cluster as quickly as possible, or if binpacking is desired, use the aggressive PreferIdle strategy. If you have training jobs or other workloads that cannot be disrupted, use the cautious IdleOnly strategy. See Node Pool scaling strategies for more information.
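As a minimal sketch of the cautious option described above, the following fragment sets the scale-down strategy for a Node Pool that runs workloads that cannot be disrupted. It assumes that IdleOnly is written in the same lifecycle.scaleDownStrategy field, and with the same spelling, as the PreferIdle value in the example manifest. The Node Pool name and the minNodes and maxNodes values are hypothetical.

```yaml
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: training-nodepool # Hypothetical name
spec:
  # NOTE: OTHER FIELDS NOT SHOWN
  autoscaling: true
  lifecycle:
    scaleDownStrategy: IdleOnly # Assumed value spelling; cautious strategy for workloads that cannot be disrupted
  minNodes: 2 # Illustrative value
  maxNodes: 4 # Illustrative value
```

Confirm the accepted values in the Node Pool scaling strategies documentation before applying a manifest like this.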

Node selectors and autoscaling

Cluster Autoscaler can scale the appropriate Node Pool when a Pod cannot be scheduled due to resource limits. CKS decides which Node Pool to scale based on the placement requirements defined in the Pod specification, for example, in the nodeSelector or affinity fields. These fields help the autoscaler choose a Node Pool that matches the Pod’s requirements. If you don’t specify Pod placement requirements, the autoscaler may scale any available Node Pool.
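The following Pod is a hedged illustration of placement requirements that point the autoscaler at a specific Node Pool. It reuses the my-label/node label and the node-taint taint from the example Node Pool manifest on this page. The Pod name, container image, command, and resource requests are placeholders chosen for illustration only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selector-example # Hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    my-label/node: "true" # Matches nodeLabels in the example Node Pool manifest
  tolerations:
    - key: node-taint # Tolerates the taint set by nodeTaints in the example Node Pool manifest
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: app
      image: busybox:1.36 # Placeholder image
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "1" # Illustrative request
          memory: 1Gi # Illustrative request
```

If this Pod cannot be scheduled because of insufficient resources, only a Node Pool whose Nodes carry the matching label is a candidate for scale-up.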

Autoscaling considerations

For autoscaling to work, the following criteria must be met:
  • Available quota: Your available quota must meet or exceed the number specified in the maxNodes field. For example, if you set maxNodes to 10, your organization must have quota for 10 Nodes available. To check your organization’s quota, see the quota reference documentation.
  • Available capacity: The region where your cluster exists must have the capacity to provision the Nodes. For example, if the region your cluster is in doesn’t have the capacity to provision Nodes, CKS can’t scale your Node Pools. To determine your organization’s capacity, see the capacity reference documentation.

Scale-to-zero

The scale-to-zero feature is useful when you want to minimize resource costs by letting a Node Pool drop to zero Nodes when there’s no demand. For CKS to scale a Node Pool to zero, you need at least one other Node Pool in the cluster that you don’t scale to zero. This other Node Pool runs the Konnectivity Agent for network connectivity. To ensure a Node Pool can scale to zero, do the following:
  • On the Node Pool you want to scale to zero, set minNodes to 0 and maxNodes to a value greater than 0. This setting allows the Node Pool to scale down to zero Nodes when there’s no demand.
  • Create another Node Pool (for example, with a less expensive instance type) and set its manifest so that the required Konnectivity Agents run on it. See the following sample Node Pool manifest that schedules the Konnectivity Agents to run on it:
    apiVersion: compute.coreweave.com/v1alpha1
    kind: NodePool
    metadata:
      name: konnectivity-agents
    spec:
      # NOTE: OTHER FIELDS NOT SHOWN
      nodeLabels:
        cks.coreweave.cloud/system-critical: "true" # Be sure to set nodeLabels on the Konnectivity Node to this value.
    
  • Otherwise, on your main Node Pool, set minNodes to at least 2 to ensure that Konnectivity has the required number of Nodes to run and thus won’t impact scaling decisions by the autoscaler.
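For the first step in the list above, a Node Pool that is allowed to scale to zero might look like the following sketch. The Node Pool name and the maxNodes value are hypothetical, and the instance type is reused from the example manifest earlier on this page. Other fields are omitted.

```yaml
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: scale-to-zero-nodepool # Hypothetical name
spec:
  # NOTE: OTHER FIELDS NOT SHOWN
  autoscaling: true
  instanceType: gd-8xh100ib-i128 # Instance type reused from the earlier example
  minNodes: 0 # Allows the pool to drop to zero Nodes when there is no demand
  maxNodes: 4 # Illustrative value; must be greater than 0
```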

Scale-to-zero with multiple Node Pools

When you enable scale-to-zero on multiple autoscaled Node Pools with different Node configurations, the autoscaler doesn’t know in advance which Node Pool to scale up for a pending Pod. Because no Nodes are running, it can’t match the Pod’s requirements to an existing Node type. As a result, if more than one Node Pool is scaled to zero, the autoscaler might first scale up an incompatible Node Pool, fail to schedule the Pod, and then try other Node Pools until it finds a match. These repeated scale-up attempts can delay the Pod’s startup.

Cluster autoscaler monitoring

To monitor autoscaling without querying metrics or logs directly, use the NodePools and Cluster Autoscaler dashboard in CoreWeave Grafana. The dashboard provides prebuilt panels for Node Pool capacity and conditions, Node Pool events, and Cluster Autoscaler scaling activity, errors, and logs. Filter the panels by region, zone, organization, cluster, and Node Pool to focus on the resources you care about. If you want to query the underlying logs and metrics yourself, use the Explore view in CoreWeave Grafana as described in the following sections.

Query autoscaler logs and metrics directly

To view logs in CoreWeave Grafana, navigate to Explore and use CoreWeave Logs. You can search for the string app="cluster-autoscaler".
To view metrics in CoreWeave Grafana, navigate to Explore and use CoreWeave Metrics. All Cluster Autoscaler metrics are prefixed with cluster_autoscaler_, so you can find them by searching for cluster_autoscaler. For more information, see the Kubernetes Cluster Autoscaler Monitoring documentation.

Test autoscaling

You can test your autoscaling configuration using the following workload. The Job runs four Pods in parallel, and each Pod requests eight GPUs, so the workload needs all eight GPUs on four Nodes. If you run it on a Node Pool with fewer than four Nodes available, Cluster Autoscaler adds enough Nodes to accommodate the workload. The workload uses the nodeSelector field to specify the type of Node the Pods must schedule on. When a cluster has multiple Node Pools, the nodeSelector field tells the autoscaler which Node Pool to scale.
apiVersion: batch/v1
kind: Job
metadata:
  name: nvidia-l40-gpu-job
spec:
  parallelism: 4
  completions: 4
  template:
    metadata:
      labels:
        app: nvidia-l40-gpu
        gpu.nvidia.com/class: L40
        gpu.nvidia.com/model: L40
        gpu.nvidia.com/vram: "46"
    spec:
      restartPolicy: Never
      containers:
        - name: gpu-app
          image: nvidia/cuda:12.3.0-devel-ubuntu22.04
          command: ["/bin/bash", "-c"]
          args:
            - |
              apt-get update && apt-get install -y build-essential cmake make git && \
              git clone https://github.com/NVIDIA/cuda-samples.git && \
              cd cuda-samples && mkdir build && cd build && \
              cmake .. -DCMAKE_CUDA_ARCHITECTURES=89 && \
              echo "start here" make && ls -la  && \
              cd Samples/1_Utilities/deviceQuery && \
              make && ./deviceQuery && sleep 6000
          resources:
            limits:
              nvidia.com/gpu: 8
      nodeSelector:
        gpu.nvidia.com/class: L40
        gpu.nvidia.com/model: L40
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"

Troubleshoot autoscaling behavior

Problem: Nodes don’t scale up.
  • Potential issue: Node Pool created with minNodes: 0. When you create a Node Pool with minNodes: 0, the pool initially has no Nodes. The autoscaler requires at least one Node (targetNodes: 1) so it can cache the “shape” (resource characteristics) of the Node. This cache is necessary for the autoscaler to determine if it can schedule Pods onto the Node Pool in the future.
  • Potential issue: CKS removes a Node, causing the autoscaler to attempt to schedule a new Node with the wrong “shape”. Occasionally, the autoscaler may cache a Node that CoreWeave automation has “tainted” (marked unschedulable). The cached taint can cause the autoscaler to incorrectly determine that it can’t scale up the pool, even if new scheduling requests exist.
  • Suggested fix: Manually set targetNodes to 1. This triggers CKS to add a Node, updating the cache (or clearing a bad cache entry) and causing it to schedule a new Node.
Problem: Nodes don’t scale down.
  • Potential issue: Konnectivity Agent replica scheduling. CKS expects two replicas of the Konnectivity Agent to run for network connectivity. These agent Pods can block the Node Pool from scaling down to zero, or conversely, can trigger unexpected scaling up if the autoscaled pool has resource needs.
  • Suggested fix: Follow the instructions in the Scale-to-zero section for creating a Node Pool for the Konnectivity replicas to run on.
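The suggested fix for Nodes that don’t scale up can be sketched as the following manifest fragment. The minNodes and maxNodes values are illustrative, and all other fields are omitted. This is an assumption-based example of setting targetNodes manually, not a prescribed procedure.

```yaml
spec:
  # NOTE: OTHER FIELDS NOT SHOWN
  autoscaling: true
  minNodes: 0 # Illustrative value; the pool may still scale to zero later
  maxNodes: 4 # Illustrative value
  targetNodes: 1 # Set manually so CKS adds one Node and the autoscaler can cache its shape
```

Because the autoscaler adjusts targetNodes within the minNodes and maxNodes range, the pool can still scale back down once the Node is idle.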
Last modified on June 17, 2026