Node Pools
Manage groups of Nodes as a single entity with Node Pools
In CKS, hardware resources are represented as Kubernetes Nodes. Users manage groups of these Nodes as a single, customizable entity called a Node Pool.
A Node Pool is a logical grouping of Nodes of the same instance type, featuring the same labels, taints, and annotations. Using Node Pools, users can set the type and number of Nodes desired in a cluster. Once a Node Pool resource is deployed to a cluster, CKS continuously monitors the Node Pool to ensure that the number of running Nodes matches the number specified in the Node Pool manifest.
It is possible to deploy multiple Node Pools within a single cluster, where each Node Pool may contain any number of Nodes.
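The sketch below shows what a Node Pool manifest might look like. It is illustrative only: the apiVersion, kind, field names, pool name, instance type, and labels/taints are assumptions made for this example, not the authoritative CKS schema; consult the CKS API reference for the actual fields.

```yaml
# Illustrative sketch of a Node Pool manifest.
# apiVersion, kind, and all field names are assumptions, not the official schema.
apiVersion: compute.coreweave.com/v1alpha1   # assumed API group/version
kind: NodePool
metadata:
  name: gpu-workers                          # hypothetical pool name
spec:
  instanceType: example-gpu-instance         # hypothetical instance type identifier
  targetNodes: 4                             # desired Node count; CKS reconciles toward this number
  nodeLabels:
    workload: training                       # applied to every Node in the pool
  nodeTaints:
    - key: dedicated
      value: training
      effect: NoSchedule                     # keeps untolerated workloads off these Nodes
```

Once a manifest like this is applied to the cluster (for example, with kubectl apply -f), CKS continuously reconciles the number of running Nodes toward the count declared in the manifest.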
Nodes are currently added to clusters manually by CoreWeave. Please contact support for assistance.
Control Plane Node Pool
Each cluster is provisioned with an initial Node Pool of two CPU Nodes, which run Kubernetes Control Plane components such as the CSI, the CNI, cluster DNS, and metrics. This Node Pool, called cpu-control-plane, is created automatically when a CKS cluster is created, and appears in the Node Pool list on the Cloud Console once the cluster is in a Healthy state.
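Because every Node in a pool carries the same labels and taints, standard Kubernetes scheduling controls (nodeSelector, tolerations, and resource requests) can be used to keep workloads on the intended pool and off the CPU-only control-plane Nodes. A minimal sketch follows, assuming the hypothetical labels and taints from the Node Pool example above; substitute the values actually configured on your pools.

```yaml
# Minimal sketch: steering a Pod onto a hypothetical GPU Node Pool.
# The label, taint, and image are placeholders, not CKS-defined names.
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  nodeSelector:
    workload: training            # matches the label assumed on the GPU pool above
  tolerations:
    - key: dedicated
      value: training
      effect: NoSchedule          # tolerates the pool's taint so the Pod can land there
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 8       # GPU request; CPU-only control-plane Nodes cannot satisfy it
```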
Node cordoning
In CKS, Nodes are sometimes cordoned in order to ensure that workloads are only scheduled to healthy Nodes. In most cases, cordoning is eventually removed, making the Node schedulable again. Node cordoning is managed entirely by CoreWeave.
Users should not rely on Node conditions for automation, as the conditions are intended for CoreWeave's internal use only. CoreWeave may cordon Nodes for maintenance or to resolve temporary issues, so conditions are not a reliable basis for building custom management automation.
There are several reasons why a Node may be cordoned:
- Maintenance: If a Node requires maintenance, updates, or hardware fixes, CKS cordons it to ensure no new workloads are placed on it during that time. This allows the Node Life Cycle controller to make necessary changes without disrupting running tasks.
- Node draining for removal: If a Node needs to be removed from the cluster, the Node will be cordoned prior to draining it. Workloads are automatically rescheduled onto healthy Nodes, and no new workloads will be scheduled to the cordoned Node.
- InfiniBand or Ethernet link flaps: Link flaps are intermittent, unpredictable up-down transitions in a network connection, which can result in networking or communication failures. If an InfiniBand or Ethernet link is flapping, a Node can experience inconsistent or unreliable connectivity. In this case, the Node is cordoned to ensure no workloads are scheduled to a Node with an unreliable network connection.
- Temporary health check failures: Kubernetes uses various health checks to assess the state of a user's system. A temporary check failure might indicate transient issues that could degrade Node performance. The Node is cordoned until the issue is resolved.
Users should not assume that cordoned Nodes have substantial or permanent issues. If a cordoned Node is deemed to have a fault that cannot be easily resolved, CKS will move it out of production and into triage.
Cordoning Nodes in these cases allows CKS to prevent disruptions. If you have questions about Node cordoning, or would like to manually cordon Nodes for another reason, please contact support.
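For reference, cordoning is reflected on the Node object itself: a cordoned Node has spec.unschedulable set to true, and kubectl get nodes reports it as SchedulingDisabled. A minimal sketch of the relevant fields is shown below; the Node name is hypothetical.

```yaml
# Abridged view of a cordoned Node; only the relevant fields are shown.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-a1b2c3            # hypothetical Node name
spec:
  unschedulable: true              # set while the Node is cordoned; cleared when it is uncordoned
```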