Node Pools - CoreWeave Docs

This page explains how CKS uses Node Pools to manage groups of Nodes as a single entity. Read this to understand the relationship between instances, Nodes, and Node Pools before you create or scale a Node Pool, and to learn how CKS handles cordoning, scale-down, image pulls, and reboots.

Instances are specific hardware configurations defined by CoreWeave, and each GPU and CPU instance type has its own hourly price. A Node Pool in CKS represents one or more instances that share a common configuration, such as the same labels, taints, and annotations.

To configure a Node Pool, select the instance type to use and how many of that instance type you want to run for your cluster. Deploy a Node Pool either with a Kubernetes manifest or from the Cloud Console. Once a Node Pool is deployed to a cluster, CKS continuously monitors it to ensure that the number of running Nodes matches the number specified in the Node Pool’s manifest.

You can deploy multiple Node Pools within a single cluster, where each Node Pool contains any number of Nodes. This lets you run different types of workloads on different types of Nodes, or scale different parts of your application independently.

View pricing for instance types on CoreWeave’s pricing page.

Availability

Node Pools are available in all General Access Regions.

Sizing your Node Pools

Because you can run multiple Node Pools in a single cluster, you decide how to divide your Nodes among them. There is no single correct layout. The right structure depends on how many distinct workloads you run, how you want to scale them, and how much management overhead you can absorb. Use the trade-offs and recommendations in this section to plan a layout, then create and autoscale your Node Pools to match.

A Node Pool maps to Nodes through its target. For most instance types, spec.targetNodes sets the number of Nodes CKS provisions and keeps running. For rack-based instance types such as GB200 and GB300, spec.targetRacks sets the number of racks, and each rack contains 18 Nodes. Every Node in a Node Pool shares that pool’s instance type, labels, taints, and annotations, so a Node Pool is the unit where you set Node configuration and capacity together.
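The mapping from a Node Pool target to a Node count can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and it assumes a spec carries either targetRacks or targetNodes, not both.

```python
from typing import Mapping

# Stated in the text above for rack-based instance types such as GB200 and GB300.
NODES_PER_RACK = 18


def expected_node_count(spec: Mapping[str, int]) -> int:
    """Return how many Nodes a Node Pool spec asks CKS to keep running.

    Assumption: a spec carries either targetRacks or targetNodes, not both.
    """
    if "targetRacks" in spec:
        return spec["targetRacks"] * NODES_PER_RACK
    return spec.get("targetNodes", 0)


standard_pool = {"targetNodes": 4}
rack_pool = {"targetRacks": 2}
print(expected_node_count(standard_pool))  # 4
print(expected_node_count(rack_pool))      # 36 (2 racks x 18 Nodes)
```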

Recommendations by workload type

Use these recommendations as a starting point, then adjust based on how your workloads schedule and scale.

A single large training job: Use one Node Pool sized to the job. Training jobs run across many Nodes of the same instance type, so a single pool matches the workload and keeps management simple. Set the scaling strategy to IdleOnly so the autoscaler doesn’t remove Nodes that are still running the job, and consider Node Pool prefill to avoid a capacity gap if a Node is replaced mid-run.
Multiple independent inference endpoints: Use a separate Node Pool per endpoint, or per group of endpoints with the same instance type and scaling needs. Separate pools let you scale each endpoint independently in response to its own demand and isolate one endpoint’s Nodes from another’s. The PreferIdle strategy scales these pools down faster when demand drops. See Autoscale Node Pools for how the autoscaler selects a pool based on a Pod’s nodeSelector and affinity.
Mixed instance types: Use one Node Pool per instance type. Because every Node in a pool shares one instance type, you need a distinct pool for each GPU or CPU type your workloads require.
Bursty or intermittent workloads: Put these workloads in their own Node Pool and configure scale-to-zero so the pool drops to zero Nodes when there is no demand. Keep at least one other Node Pool that doesn’t scale to zero to run the required cluster components. If you scale more than one Node Pool to zero, review Scale-to-zero with multiple Node Pools, because the autoscaler may try incompatible pools before it finds a match.

Node Pool sizing considerations

The following table summarizes the trade-offs between consolidating Nodes into fewer, larger Node Pools and splitting them across more, smaller Node Pools.

Consideration	Fewer, larger pools	More, smaller pools
Management	Simpler. Fewer manifests to maintain and fewer pools to monitor and scale.	More overhead. Each pool is a separate manifest with its own labels, taints, and scaling settings to maintain and monitor.
Workload matching	Harder to match heterogeneous workloads, because every Node in a pool shares the same instance type and configuration.	Easier. Each pool can target a different instance type and configuration for a specific workload.
Scaling control	Coarser. Autoscaling is available at any pool size, but a single pool scales as a unit between `minNodes` and `maxNodes`.	Finer. You can set independent `minNodes`, `maxNodes`, and scaling strategy per workload.
Isolation	Lower. Workloads share Nodes and the pool’s configuration, so a configuration change or a disruptive Node affects more workloads.	Higher. A change or a disruptive Node is contained within one pool, isolating its effect on other workloads.

In short, fewer large Node Pools reduce management overhead at the cost of increased impact scope and less flexibility, while more smaller Node Pools give you workload isolation and finer scaling control at the cost of more pools to manage.

Node cordoning

CKS sometimes cordons Nodes to ensure that workloads are only scheduled to healthy Nodes. In most cases, CKS eventually removes the cordon, which makes the Node schedulable again. CoreWeave manages this kind of Node cordoning entirely, though you can also manually cordon Nodes for your own reasons.

You can add your own Node conditions to the Nodes, but don’t use CoreWeave Node conditions for automation. CoreWeave Node conditions are intended for internal use only, not for clients to use for their own custom management automation. CoreWeave can cordon Nodes for maintenance purposes or to resolve temporary issues.

A Node may be cordoned for several reasons, such as:

Maintenance: If a Node requires maintenance, updates, or hardware fixes, CKS cordons it to ensure no new workloads are placed on it during that time. This lets the Node lifecycle controller make necessary changes without disrupting running tasks.
Node draining for removal: If a Node must be removed from the cluster, CKS cordons the Node before draining it. CKS automatically reschedules workloads onto healthy Nodes, and no new workloads are scheduled to the cordoned Node.
InfiniBand or Ethernet link flaps: Link flaps are intermittent, unpredictable up-down transitions in a network connection, which can result in networking or communication failures. If an InfiniBand or Ethernet link is flapping, a Node can experience inconsistent or unreliable connectivity. In this case, CKS cordons the Node to ensure no workloads are scheduled to a Node with an unreliable network connection.
Temporary health check failures: Kubernetes uses health checks to assess the state of a user’s system. A temporary check failure might indicate transient issues that could degrade Node performance. CKS cordons the Node until the issue is resolved.

Don’t assume that cordoned Nodes have serious or permanent issues. If a cordoned Node has a fault that can’t be resolved quickly, CKS moves it out of production and into triage.

Cordoning Nodes in these cases lets CKS prevent disruptions. If you have questions about Node cordoning, or want to manually cordon Nodes for another reason, contact Support.
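To see which Nodes are currently cordoned, you can inspect the standard Kubernetes spec.unschedulable field rather than any CoreWeave Node condition. The following is a minimal sketch that reads data in the shape produced by `kubectl get nodes -o json`; the sample data and function name are hypothetical, and the sketch is meant for inspection, not for automation.

```python
import json
from typing import Any, Dict, List


def cordoned_node_names(node_list: Dict[str, Any]) -> List[str]:
    """Return the names of Nodes whose spec.unschedulable field is true."""
    names = []
    for node in node_list.get("items", []):
        # Kubernetes omits the field on schedulable Nodes, so default to False.
        if node.get("spec", {}).get("unschedulable", False):
            names.append(node["metadata"]["name"])
    return names


# Hypothetical sample in the shape of `kubectl get nodes -o json`.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "node-a"}, "spec": {}},
  {"metadata": {"name": "node-b"}, "spec": {"unschedulable": true}}
]}
""")
print(cordoned_node_names(sample))
```

A cordoned Node in this list is not necessarily faulty, so treat the output as a scheduling snapshot only.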

Node Pool prefill

Node Pool prefill keeps a Node Pool at its target capacity while individual Nodes are replaced. When a Node is marked for triage because of maintenance or a hardware issue (for example, when the PendingPhaseState: triage condition is set), CKS provisions a replacement Node before the existing Node is drained and removed. The Node Pool stays at or above the number of Nodes specified in spec.targetNodes, so workloads do not see a capacity gap during the replacement window.

Prefill is opt-in and off by default. Enable it for each Node Pool by adding a prefill block to the Node Pool spec. See Enable Node Pool prefill.

Prefill billing: When you enable prefill, there are no billing implications.

When to use prefill

Use prefill for Node Pools whose workloads are sensitive to capacity loss during Node replacement, such as long-running training jobs or latency-sensitive inference.

Without prefill enabled: CKS replaces a Node only after the old Node is removed, so the Node Pool is below its target Node count until the replacement is delivered and ready. This leaves a capacity gap that can last 20 to 40 minutes.
With prefill enabled: CKS provisions a replacement Node before the old Node is drained and removed, so the Node Pool stays at or above its target Node count throughout the replacement. Workloads keep running on the existing Node until the replacement is ready, which avoids a capacity gap during replacement.

Prefill availability

Prefill provisions replacement Nodes from on-demand capacity, so a replacement Node is not guaranteed. If on-demand capacity is unavailable, prefill keeps retrying, and the Node stays in the prefill flow until a replacement can be provisioned. Account for this when you rely on prefill for time-sensitive replacements.

Prefill is available for all instance types except rack-scale instances (for example, gb200 and gb300), which are managed at the rack level rather than at the Node level.

How prefill works

A Node is marked unschedulable as soon as it enters triage, so no new workloads are scheduled onto it. If a replacement Node can’t be provisioned, CKS keeps retrying until one is available or the Node is no longer marked for prefill. After the replacement is ready, CKS waits up to the idle timeout (spec.prefill.timeout) for the old Node to become idle, then removes it. If the Node hasn’t become idle when the timeout expires, CKS drains and removes the Node.
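The decision sequence described above can be modeled as a small function. This is a sketch of the documented behavior, not CKS code: the function name, parameters, and returned strings are all hypothetical.

```python
from datetime import timedelta


def next_prefill_action(
    replacement_ready: bool,
    old_node_idle: bool,
    waited: timedelta,
    idle_timeout: timedelta,
) -> str:
    """Model which step the prefill flow takes next for a Node in triage."""
    if not replacement_ready:
        # CKS keeps retrying until a replacement can be provisioned.
        return "retry provisioning the replacement"
    if old_node_idle:
        return "remove the old Node"
    if waited >= idle_timeout:
        # The old Node never became idle within spec.prefill.timeout.
        return "drain the old Node"
    return "wait for the old Node to become idle"


print(next_prefill_action(True, False, timedelta(hours=2), timedelta(hours=24)))
```

The order of the checks matters: an idle old Node is removed as soon as the replacement is ready, without waiting for the timeout.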

Prefill Node conditions

When prefill is enabled, CKS sets a Prefill condition on Nodes that are in the prefill flow. The condition’s reason field indicates the current state. You can use LastTransitionTime on the condition to see when the reason last changed.

Reason	Meaning
`AwaitingReplacement`	The Node is marked for prefill and a replacement Node is being provisioned.
`AwaitingIdleTimeout`	The replacement Node is in the cluster. CKS waits up to the idle timeout (`spec.prefill.timeout`) for this Node to become idle, then removes it.
`TimeoutExceeded`	The Node did not become idle within the idle timeout (`spec.prefill.timeout`) period, so CKS drains and removes the Node. Monitor Nodes nearing their `timeout` period, so CKS doesn’t drain critical workloads.
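To monitor Nodes that are nearing their timeout, you can read the Prefill condition from a Node object and estimate how much of the idle timeout remains. The sketch below is for inspection only. It assumes that the countdown starts when the reason changes to AwaitingIdleTimeout, that the condition appears in status.conditions with the usual Kubernetes lastTransitionTime key, and that the caller supplies the timeout as a timedelta (the default shown is the 24-hour default mentioned on this page). The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

DEFAULT_IDLE_TIMEOUT = timedelta(hours=24)


def prefill_condition(node: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the Node's Prefill condition, or None if it is not set."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "Prefill":
            return condition
    return None


def time_until_drain(
    node: Dict[str, Any],
    now: datetime,
    idle_timeout: timedelta = DEFAULT_IDLE_TIMEOUT,
) -> Optional[timedelta]:
    """Estimate the time left before the idle timeout expires.

    Returns None unless the Node is in the AwaitingIdleTimeout state.
    Assumption: the timeout is measured from lastTransitionTime.
    """
    condition = prefill_condition(node)
    if condition is None or condition.get("reason") != "AwaitingIdleTimeout":
        return None
    # Kubernetes timestamps are RFC 3339; normalize the trailing "Z".
    started = datetime.fromisoformat(
        condition["lastTransitionTime"].replace("Z", "+00:00")
    )
    return max(started + idle_timeout - now, timedelta(0))


# Hypothetical Node object trimmed to the fields used here.
node = {"status": {"conditions": [{
    "type": "Prefill",
    "reason": "AwaitingIdleTimeout",
    "lastTransitionTime": "2025-01-01T00:00:00Z",
}]}}
now = datetime(2025, 1, 1, 20, 0, tzinfo=timezone.utc)
print(time_until_drain(node, now))  # 4:00:00
```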

Prefill flow

The following diagram shows how a Node moves through the prefill flow, including the two paths CKS can take depending on whether the Node becomes idle within the default 24-hour idle timeout (spec.prefill.timeout) period. For details on enabling prefill, see Enable Node Pool prefill. For the prefill reference specification, see Prefill reference.

Scaling strategies

The CKS cluster autoscaler has two possible scale-down strategies. Set spec.lifecycle.scaleDownStrategy to one of the following values:

Strategy	Description
`IdleOnly`	Default strategy. CKS only selects idle Nodes for removal. This strategy ensures that CKS only removes Nodes that aren’t actively running workloads.
`PreferIdle`	CKS prioritizes idle Nodes for removal, but if there aren’t enough idle Nodes to meet the desired scale, CKS can also select non-idle Nodes. This allows faster scale-down when necessary.
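The difference between the two strategies can be illustrated with a simplified selection function. This is a model of the table above, not the autoscaler's implementation: the function name and the per-Node dictionary shape (name, idle) are hypothetical, and the real autoscaler may order candidates differently.

```python
from typing import Any, Dict, List


def select_for_removal(
    nodes: List[Dict[str, Any]], count: int, strategy: str = "IdleOnly"
) -> List[str]:
    """Pick up to `count` Node names to remove under a scale-down strategy."""
    idle = [n["name"] for n in nodes if n["idle"]]
    busy = [n["name"] for n in nodes if not n["idle"]]
    if strategy == "IdleOnly":
        # Never touches Nodes that are running workloads.
        return idle[:count]
    if strategy == "PreferIdle":
        # Idle Nodes first, then non-idle Nodes if more are needed.
        return (idle + busy)[:count]
    raise ValueError(f"unknown scaleDownStrategy: {strategy}")


pool = [
    {"name": "node-a", "idle": True},
    {"name": "node-b", "idle": False},
    {"name": "node-c", "idle": True},
]
print(select_for_removal(pool, 3, "IdleOnly"))    # ['node-a', 'node-c']
print(select_for_removal(pool, 3, "PreferIdle"))  # ['node-a', 'node-c', 'node-b']
```

With IdleOnly, a request to remove three Nodes removes only the two idle ones, so the scale-down completes later, once another Node becomes idle.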

Idle Nodes

A Node is idle when it has no Pods in the Running or Pending state. When determining a Node’s idle status, CKS ignores Pods with any of the following conditions:

Condition	Description
`qos.coreweave.com/graceful-interruptible` is set to `true` and the Pod’s `DeletionGracePeriodSeconds` has elapsed	CKS ignores graceful interruptible Pods only if they’re being deleted and the deletion grace period has elapsed.
The Pod is part of a `DaemonSet`	CKS ignores Pods managed by a `DaemonSet` when determining Node idleness.
`qos.coreweave.cloud/interruptable` label set to `true`	CKS ignores interruptible Pods when determining Node idleness.
`ns.coreweave.cloud/org` label set to `control-plane`	CKS ignores Control Plane Pods when determining Node idleness.

You can determine a Node’s idle status by inspecting the CWActive condition. CWActive = False means the Node is idle.
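The idle rules in the table above can be approximated in a short sketch. It works on Pod objects in the shape returned by the Kubernetes API, and it makes two assumptions that are not stated on this page: that DaemonSet membership is detected through metadata.ownerReferences, and that "the deletion grace period has elapsed" means the current time has reached metadata.deletionTimestamp. The function names are hypothetical, and CWActive remains the authoritative signal.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def _parse_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2025-01-01T12:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_ignored(pod: Dict[str, Any], now: datetime) -> bool:
    """Return True if the Pod is excluded from the idle calculation."""
    meta = pod.get("metadata", {})
    labels = meta.get("labels", {})
    if any(ref.get("kind") == "DaemonSet" for ref in meta.get("ownerReferences", [])):
        return True
    if labels.get("qos.coreweave.cloud/interruptable") == "true":
        return True
    if labels.get("ns.coreweave.cloud/org") == "control-plane":
        return True
    if labels.get("qos.coreweave.com/graceful-interruptible") == "true":
        # Ignored only while being deleted and after the grace period.
        deletion = meta.get("deletionTimestamp")
        return deletion is not None and now >= _parse_time(deletion)
    return False


def node_is_idle(pods: List[Dict[str, Any]], now: datetime) -> bool:
    """A Node is idle when no counted Pod is Running or Pending."""
    for pod in pods:
        phase = pod.get("status", {}).get("phase")
        if phase in ("Running", "Pending") and not is_ignored(pod, now):
            return False
    return True


def cw_active_reports_idle(node: Dict[str, Any]) -> bool:
    """Return True if the Node's CWActive condition has status False."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "CWActive":
            return condition.get("status") == "False"
    return False
```

For example, a Node that runs only a DaemonSet Pod and a Pod labeled qos.coreweave.cloud/interruptable counts as idle under this model, while a single unlabeled Running Pod keeps it active.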

Node idle status and maintenance

CKS defers certain maintenance operations, such as moving a Node out of production after detecting elevated InfiniBand link flaps, until the Node becomes idle. A long-running Pod without an interruptible label keeps the Node active indefinitely, preventing those deferred operations from completing. For Pods that run continuously but can tolerate interruption, apply the qos.coreweave.cloud/interruptable or qos.coreweave.com/graceful-interruptible label so that CKS excludes the Pod from the Node’s idle calculation. See Pod interruption and eviction policies.

Once CKS selects a Node for removal, CKS removes the Node by first cordoning it to prevent any new Pods from being scheduled onto it. Then, CKS drains the Node to perform a graceful cleanup. Once the Node is fully drained, CKS removes it from the Node Pool.

DeletionGracePeriodSeconds defaults to 30 seconds unless the Pod’s spec.terminationGracePeriodSeconds is set to a different value.

To keep Pods that are stuck in a Terminating state from blocking the drain, CKS waits up to 5 minutes before forcibly removing a draining Node.
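The two timing values above can be combined into a quick check for Pods whose grace period is longer than the drain window. This sketch rests on an inference that this page does not state outright: that a Pod with a grace period longer than 5 minutes may still be shutting down when the draining Node is forcibly removed. The function names are hypothetical.

```python
from typing import Any, Dict, List

DEFAULT_GRACE_SECONDS = 30           # default stated above
FORCE_REMOVAL_AFTER_SECONDS = 5 * 60  # drain wait stated above


def grace_seconds(pod: Dict[str, Any]) -> int:
    """Return the Pod's termination grace period, applying the default."""
    value = pod.get("spec", {}).get("terminationGracePeriodSeconds")
    return DEFAULT_GRACE_SECONDS if value is None else int(value)


def pods_exceeding_drain_window(pods: List[Dict[str, Any]]) -> List[str]:
    """List Pods whose grace period is longer than the drain wait."""
    return [
        pod["metadata"]["name"]
        for pod in pods
        if grace_seconds(pod) > FORCE_REMOVAL_AFTER_SECONDS
    ]


# Hypothetical Pods trimmed to the fields used here.
pods = [
    {"metadata": {"name": "web"}, "spec": {}},
    {"metadata": {"name": "checkpointer"}, "spec": {"terminationGracePeriodSeconds": 900}},
]
print(pods_exceeding_drain_window(pods))
```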

Scale down a Node Pool

To scale down a Node Pool:

Set spec.targetNodes to the desired number of Nodes, or spec.targetRacks, depending on the target type set on the Node Pool. For instructions on doing this with Cloud Console or the Kubernetes CLI, see Manage Node Pools.
Set spec.lifecycle.scaleDownStrategy to the preferred scaling strategy.
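Both settings can be changed in one update. As a minimal sketch, the following builds a JSON merge patch body that sets the target and the strategy together. The helper name is hypothetical, and how you apply the patch (for example, with `kubectl patch --type merge` against your Node Pool resource) depends on your cluster setup.

```python
import json
from typing import Any, Dict

VALID_STRATEGIES = ("IdleOnly", "PreferIdle")


def scale_down_patch(
    target: int, strategy: str = "IdleOnly", rack_based: bool = False
) -> Dict[str, Any]:
    """Build a merge patch that sets the target and scale-down strategy."""
    if target < 0:
        raise ValueError("target must be zero or greater")
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown scaleDownStrategy: {strategy}")
    # Rack-based pools use spec.targetRacks instead of spec.targetNodes.
    target_key = "targetRacks" if rack_based else "targetNodes"
    return {
        "spec": {
            target_key: target,
            "lifecycle": {"scaleDownStrategy": strategy},
        }
    }


print(json.dumps(scale_down_patch(2, "PreferIdle")))
```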

Node Pool types

CKS uses the default Node Pool type to manage Reserved and On-Demand instances. The following section describes when to use the default type and how to configure it.

Default Node Pools

CKS uses the default Node Pool type (spec.computeClass: default) for Reserved and On-Demand instances. Billing for default Node Pools depends on Reservation and utilization.

You don’t need to specify default Node Pools. If you don’t specify a computeClass, the Node Pool defaults to the default type.

Example default Node Pool manifest

apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: default
spec:
  computeClass: default
  # ...

For more information on Node Pool manifests, see Node Pool reference, or learn how to Create a Node Pool.
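The computeClass defaulting described in this section can be mirrored when you inspect manifests locally, for example in a review script. This is a sketch of the documented default only: the helper name is hypothetical, the manifest is trimmed to the fields shown on this page, and the non-default class name is a placeholder.

```python
import copy
from typing import Any, Dict


def with_default_compute_class(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the manifest with spec.computeClass filled in."""
    resolved = copy.deepcopy(manifest)
    # An omitted computeClass means the default Node Pool type.
    resolved.setdefault("spec", {}).setdefault("computeClass", "default")
    return resolved


manifest = {
    "apiVersion": "compute.coreweave.com/v1alpha1",
    "kind": "NodePool",
    "metadata": {"name": "default"},
    "spec": {},
}
print(with_default_compute_class(manifest)["spec"]["computeClass"])  # default
```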

Manage Node Pools with Terraform

The CoreWeave Terraform provider manages clusters, VPCs, and Object Storage, but it doesn’t include a Node Pool resource. Deploy Node Pools through the Cloud Console or the Kubernetes API instead. A Terraform-only workflow typically combines the CoreWeave provider for the cluster and VPC with a separate mechanism, such as the Kubernetes provider, to apply Node Pool manifests. See How do I use Terraform to manage clusters and Node Pools?.

Image pull best practices

CoreWeave operates a region-level registry proxy that accelerates container image pulls and reduces exposure to public registry rate limits for your cluster’s Nodes. To ensure predictable and fast rollouts when scaling Node Pools, use immutable tags, or pin by digest, for production workloads. This ensures every Node pulls the same artifact and avoids stale results from proxy metadata caching. Avoid mutable tags like :latest. With metadata caching enabled, the proxy can continue serving a cached manifest until the cache expires, which can lead to inconsistent versions across Nodes. For more information, see the region-level image proxy documentation.
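A simple check can flag image references that rely on `:latest` or carry no tag at all. The sketch below classifies a reference by its syntax only. It cannot tell whether a tag is actually immutable in your registry, and the function name and example references are hypothetical.

```python
def image_pinning(image: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'latest'."""
    if "@" in image:
        # References such as repo@sha256:... are pinned by digest.
        return "digest"
    # Look only at the last path segment so a registry port is not
    # mistaken for a tag (for example, registry.example.com:5000/app).
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "latest"  # no tag means the implicit latest tag
    tag = last_segment.rsplit(":", 1)[1]
    return "latest" if tag == "latest" else "tag"


for ref in (
    "registry.example.com/team/app@sha256:0123abcd",
    "registry.example.com/team/app:1.4.2",
    "registry.example.com:5000/team/app",
    "registry.example.com/team/app:latest",
):
    print(ref, "->", image_pinning(ref))
```

References classified as "latest" are the ones most likely to resolve to different versions on different Nodes while the proxy serves cached metadata.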

Reboot methods

You can manually reboot Nodes in the following two ways:

Reboot method	When to use	Typical duration
Reboot only	Maintenance, troubleshooting, or applying changes that don’t require reconfiguration	Approximately 10 minutes or more
Reconfigure reboot	Applying OS images, GPU driver updates, or other Node Pool modifications that require reconfiguration	Approximately 1 hour or more

Node conditions for reboots (deprecated)

CoreWeave’s Node conditions are visible when you manage reboots. You previously set these Node conditions with the Conditioner Kubectl plugin, but you can now use the CoreWeave Intelligent CLI to reboot Nodes without Node conditions. See the Reboot Nodes and Apply Node Pool updates guides for more information.

Condition	Description	Deprecated equivalent
`AdminSafePowerCycle`	Marks the Node to reboot when it is idle, only after all running jobs are complete.	`AdminSafeReboot`
`AdminImmediatePowerCycle`	Marks the Node to reboot immediately, without waiting for running jobs to complete.	`AdminImmediateReboot`

Don’t use Node conditions for automation: You can add your own Node conditions to the Nodes, but don’t use CoreWeave Node conditions for automation. CoreWeave Node conditions are intended for internal use only, not for clients to use for their own custom management automation. CoreWeave can cordon Nodes for maintenance purposes or to resolve temporary issues.

Control Plane Node Pool (deprecated)

For clusters created before July 7, 2025, CoreWeave provisioned a cpu-control-plane Node Pool for CKS-managed components. Clusters created after this date don’t have this Node Pool. The CKS Control Plane now manages the components out-of-band. See the Control Plane Node Pool release notes for more information.

Do not install the NVIDIA GPU Operator on CKS clusters

CoreWeave manages the NVIDIA GPU Operator on your behalf. Do not install the NVIDIA GPU Operator on CKS clusters. Doing so conflicts with the platform-managed deployment and is not supported.