This page explains how CKS uses Node Pools to manage groups of Nodes as a single entity. Read this to understand the relationship between instances, Nodes, and Node Pools before you create or scale a Node Pool, and to learn how CKS handles cordoning, scale-down, image pulls, and reboots.
Instances are specific hardware configurations defined by CoreWeave, and each GPU and CPU instance type has its own hourly price. A Node Pool in CKS represents one or more instances that share a common configuration, such as the same labels, taints, and annotations.
To configure a Node Pool, select the instance type to use and how many of that instance type you want to run for your cluster. Deploy a Node Pool either with a Kubernetes manifest or from the Cloud Console. Once a Node Pool is deployed to a cluster, CKS continuously monitors it to ensure that the number of running Nodes matches the number specified in the Node Pool’s manifest.
You can deploy multiple Node Pools within a single cluster, where each Node Pool contains any number of Nodes. This lets you run different types of workloads on different types of Nodes, or scale different parts of your application independently.
View pricing for instance types on CoreWeave’s pricing page.

Availability

Node Pools are available in all General Access Regions.

Node cordoning

CKS sometimes cordons Nodes to ensure that workloads are only scheduled to healthy Nodes. In most cases, CKS eventually removes the cordon, which makes the Node schedulable again. CoreWeave manages this kind of Node cordoning entirely, though you can also manually cordon Nodes for your own reasons.
You can add your own Node conditions to the Nodes, but don’t use CoreWeave Node conditions for automation. CoreWeave Node conditions are intended for internal use only, not for clients to use for their own custom management automation. CoreWeave can cordon Nodes for maintenance purposes or to resolve temporary issues.
A Node may be cordoned for several reasons, such as:
  • Maintenance: If a Node requires maintenance, updates, or hardware fixes, CKS cordons it to ensure no new workloads are placed on it during that time. This lets the Node Life Cycle controller make necessary changes without disrupting running tasks.
  • Node draining for removal: If a Node must be removed from the cluster, CKS cordons the Node before draining it. CKS automatically reschedules workloads onto healthy Nodes, and no new workloads are scheduled to the cordoned Node.
  • InfiniBand or Ethernet link flaps: Link flaps are intermittent, unpredictable up-down transitions in a network connection, which can result in networking or communication failures. If an InfiniBand or Ethernet link is flapping, a Node can experience inconsistent or unreliable connectivity. In this case, CKS cordons the Node to ensure no workloads are scheduled to a Node with an unreliable network connection.
  • Temporary health check failures: Kubernetes uses health checks to assess the state of a user’s system. A temporary check failure might indicate transient issues that could degrade Node performance. CKS cordons the Node until the issue is resolved.
Users shouldn’t assume that cordoned Nodes have serious or permanent issues. If a cordoned Node has a fault that can’t be resolved quickly, CKS moves it out of production and into triage.
Cordoning Nodes in these cases lets CKS prevent disruptions. If you have questions about Node cordoning, or want to manually cordon Nodes for another reason, contact Support.

Scaling strategies

The CKS cluster autoscaler has two possible scale-down strategies. Set spec.lifecycle.scaleDownStrategy to one of the following values:
Strategy | Description
IdleOnly | Default strategy. CKS only selects idle Nodes for removal. This strategy ensures that CKS only removes Nodes that aren’t actively running workloads.
PreferIdle | CKS prioritizes idle Nodes for removal, but if there aren’t enough idle Nodes to meet the desired scale, CKS can also select non-idle Nodes. This allows faster scale-down when necessary.

Idle Nodes

A Node is idle when it has no Pods in the Running or Pending state. When determining a Node’s idle status, CKS ignores Pods with any of the following conditions:
Condition | Description
qos.coreweave.com/graceful-interruptible is set to true and the Pod’s DeletionGracePeriodSeconds has elapsed | CKS ignores graceful interruptible Pods only if they’re being deleted and the deletion grace period has elapsed.
The Pod is part of a DaemonSet | CKS ignores Pods managed by a DaemonSet when determining Node idleness.
qos.coreweave.cloud/interruptable label set to true | CKS ignores interruptible Pods when determining Node idleness.
ns.coreweave.cloud/org label set to control-plane | CKS ignores Control Plane Pods when determining Node idleness.
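As an illustration of the label-based conditions above, the following sketch shows a Pod that carries the qos.coreweave.cloud/interruptable label, so CKS would not count it when deciding whether its Node is idle. The Pod name, container name, and image are hypothetical placeholders, not values taken from CKS.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker                 # hypothetical Pod name
  labels:
    # Label key as listed in the conditions above.
    # Kubernetes label values are strings, so "true" is quoted.
    qos.coreweave.cloud/interruptable: "true"
spec:
  containers:
    - name: worker                   # hypothetical container name
      image: registry.example.com/team/worker:1.4.2   # hypothetical image
```

Only label a Pod this way if the workload tolerates being interrupted, because a Node that runs nothing but ignored Pods is treated as idle and can be selected for removal.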
You can determine a Node’s idle status by inspecting the CWActive condition. CWActive = False means the Node is idle.
Once CKS selects a Node for removal, the Node Pool Operator (NPO) first cordons the Node to prevent any new Pods from being scheduled onto it. Then, NPO drains the Node to perform a graceful cleanup. Once the Node is fully drained, CKS removes it from the Node Pool.
DeletionGracePeriodSeconds defaults to 30 seconds unless the Pod’s spec.terminationGracePeriodSeconds is set to a different value.
To keep Pods stuck in a Terminating state from blocking the drain, NPO waits up to 5 minutes and then forcibly removes the draining Node.

Scale down a Node Pool

To scale down a Node Pool:
  1. Set spec.targetNodes to the desired number of Nodes. For instructions on doing this with Cloud Console or the Kubernetes CLI, see Manage Node Pools.
  2. Set spec.lifecycle.scaleDownStrategy to the preferred scaling strategy.
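Expressed as a manifest, the two steps above might look like the following sketch. The Node Pool name and the target of 2 Nodes are placeholder values chosen for illustration. The apiVersion and kind follow the example manifest on this page, and all other fields of the Node Pool are omitted.

```yaml
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-pool                 # hypothetical Node Pool name
spec:
  targetNodes: 2                     # step 1: desired number of Nodes (placeholder value)
  lifecycle:
    scaleDownStrategy: PreferIdle    # step 2: IdleOnly (default) or PreferIdle
  # ... remaining Node Pool fields unchanged
```

With IdleOnly, the pool shrinks only as Nodes become idle. PreferIdle lets CKS also select non-idle Nodes when there aren't enough idle ones to reach the target.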

Node Pool types

CKS uses the default Node Pool type to manage Reserved and On-Demand instances. The following section describes when to use the default type and how to configure it.

Default Node Pools

CKS uses the default Node Pool type (spec.computeClass: default) for Reserved and On-Demand instances. Billing for default Node Pools depends on Reservation and utilization.
You don’t need to specify default Node Pools. If you don’t specify a computeClass, the Node Pool defaults to the default type.
Example default Node Pool manifest
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: default
spec:
  computeClass: default
  # ...
For more information on Node Pool manifests, see Node Pool reference, or learn how to Create a Node Pool.

Image pull best practices

CoreWeave operates a region-level registry proxy that accelerates container image pulls and reduces exposure to public registry rate limits for your cluster’s Nodes. To ensure predictable and fast rollouts when scaling Node Pools, use immutable tags, or pin by digest, for production workloads. This ensures every Node pulls the same artifact and avoids stale results from proxy metadata caching. Avoid mutable tags like :latest. With metadata caching enabled, the proxy can continue serving a cached manifest until the cache expires, which can lead to inconsistent versions across Nodes. For more information, see the region-level image proxy documentation.
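As a sketch of this guidance, the container below pins its image by digest, with the immutable-tag alternative and the discouraged mutable tag shown as comments. The registry host, image name, version tag, and the <digest> placeholder are all hypothetical. Replace <digest> with the real SHA-256 digest of your image.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                          # hypothetical Pod name
spec:
  containers:
    - name: web                      # hypothetical container name
      # Pinned by digest: every Node resolves exactly the same image content.
      image: registry.example.com/team/web@sha256:<digest>
      # Alternative: an immutable version tag that is never re-pushed.
      # image: registry.example.com/team/web:1.4.2
      # Avoid: a mutable tag, which a cached manifest can resolve differently across Nodes.
      # image: registry.example.com/team/web:latest
```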

Reboot methods

You can manually reboot Nodes in the following two ways:
Reboot method | When to use | Typical duration
Reboot only | Maintenance, troubleshooting, or applying changes that do not require reconfiguration | Approximately 10 minutes or more
Reconfigure reboot | Applying OS images, GPU driver updates, or other Node Pool modifications that require reconfiguration | Approximately 1 hour or more

Node conditions for reboots (deprecated)

CoreWeave’s Node conditions are visible when you manage reboots. You previously set these Node conditions with the Conditioner Kubectl plugin, but you can now use the CoreWeave Intelligent CLI to reboot Nodes without Node conditions. See the Reboot Nodes and Apply Node Pool updates guides for more information.
Condition | Description | Deprecated equivalent
AdminSafePowerCycle | Marks the Node to reboot when it is idle, only after all running jobs are complete. | AdminSafeReboot
AdminImmediatePowerCycle | Marks the Node to reboot immediately, without waiting for running jobs to complete. | AdminImmediateReboot
Do not use Node conditions for automation. You can add your own Node conditions to the Nodes, but don’t use CoreWeave Node conditions for automation. CoreWeave Node conditions are intended for internal use only, not for clients to use for their own custom management automation. CoreWeave can cordon Nodes for maintenance purposes or to resolve temporary issues.

Control Plane Node Pool (deprecated)

For clusters created before July 7, 2025, CoreWeave provisioned a cpu-control-plane Node Pool for CKS-managed components. Clusters created after this date don’t have this Node Pool. The CKS Control Plane now manages these components out-of-band. See the Control Plane Node Pool release notes for more information.

Do not install the NVIDIA GPU Operator on CKS clusters

CoreWeave manages the NVIDIA GPU Operator on your behalf. Do not install the NVIDIA GPU Operator on CKS clusters. Doing so conflicts with the platform-managed deployment and is not supported.
Last modified on June 10, 2026