Introduction to Node Pools
Manage groups of Nodes as a single entity with Node Pools
Instances are specific hardware configurations defined by CoreWeave, and each GPU and CPU instance type has its own hourly price.
A Node Pool in CKS represents one or more instances that share a common configuration, such as the same labels, taints, and annotations. Node Pools are configured by selecting the instance type to use and how many Nodes of that instance type you would like to run in your cluster. They are deployed either with a Kubernetes manifest or through the Cloud Console.
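As a rough illustration, a minimal Node Pool manifest combining these settings might look like the sketch below. The `apiVersion`, `kind`, `computeClass`, and `targetNodes` fields are the ones documented on this page; the `instanceType` field name and all values are placeholders, so consult the Node Pool reference for the exact schema.

```yaml
# Sketch of a Node Pool manifest. Field names other than computeClass and
# targetNodes (e.g. instanceType) are illustrative assumptions; see the
# Node Pool reference for the authoritative schema.
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-gpu-pool        # hypothetical name
spec:
  computeClass: default
  instanceType: <instance-type> # replace with a CoreWeave instance type
  targetNodes: 4                # number of Nodes CKS keeps running
```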
Once a Node Pool is deployed to a cluster, CKS continuously monitors it to ensure that the number of running Nodes matches the number specified in the Node Pool's manifest.
Multiple Node Pools may be deployed within a single cluster, where each Node Pool contains any number of Nodes. This allows you to run different types of workloads on different types of Nodes, or to scale different parts of your application independently.
View pricing for Instance types on the Instances Pricing page.
Node cordoning
In CKS, Nodes are sometimes cordoned in order to ensure that workloads are only scheduled to healthy Nodes. In most cases, cordoning is eventually removed, making the Node schedulable again. This kind of Node cordoning is managed entirely by CoreWeave, though customers may also manually cordon Nodes for their own reasons.
You may add your own Node conditions to Nodes, but do not build automation on CoreWeave Node conditions: they are intended for internal use only, not for customers' custom management automation. Nodes may be cordoned by CoreWeave for maintenance purposes or to resolve temporary issues.
A Node may be cordoned for several reasons, such as:
- Maintenance: If a Node requires maintenance, updates, or hardware fixes, CKS cordons it to ensure no new workloads are placed on it during that time. This allows the Node Life Cycle controller to make necessary changes without disrupting running tasks.
- Node draining for removal: If a Node needs to be removed from the cluster, the Node will be cordoned prior to draining it. Workloads are automatically rescheduled onto healthy Nodes, and no new workloads will be scheduled to the cordoned Node.
- InfiniBand or Ethernet link flaps: Link flaps are intermittent, unpredictable up-down transitions in a network connection, which can result in networking or communication failures. If an InfiniBand or Ethernet link is flapping, a Node can experience inconsistent or unreliable connectivity. In this case, the Node is cordoned to ensure no workloads are scheduled to a Node with an unreliable network connection.
- Temporary health check failures: Kubernetes uses various health checks to assess the state of a user's system. A temporary check fail might indicate transient issues that could degrade Node performance. The Node is cordoned until the issue is resolved.
Users should not assume that cordoned Nodes have substantial or permanent issues. If a cordoned Node is deemed to have a fault that cannot be easily resolved, CKS will move it out of production and into triage.
Cordoning Nodes in these cases allows CKS to prevent disruptions. If you have questions about Node cordoning, or would like to manually cordon Nodes for another reason, please contact support.
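Cordoning uses the standard Kubernetes mechanism for marking a Node unschedulable. As a sketch of what this looks like when inspecting a cordoned Node (the Node name is hypothetical):

```yaml
# What a cordoned Node typically looks like when inspected (standard
# Kubernetes behavior; the Node name is hypothetical).
apiVersion: v1
kind: Node
metadata:
  name: example-node
spec:
  unschedulable: true            # set by cordoning
  taints:
    - key: node.kubernetes.io/unschedulable
      effect: NoSchedule         # prevents new Pods from being scheduled
```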
Scaling strategies
The CKS cluster autoscaler has two possible scale-down strategies. Set `spec.lifecycle.scaleDownStrategy` to the preferred scaling strategy:
| Strategy | Description |
|---|---|
| `IdleOnly` | Default strategy. Only idle Nodes are selected for removal. This strategy ensures that only Nodes not actively running workloads are removed. |
| `PreferIdle` | Idle Nodes are prioritized for removal, but if there are not enough idle Nodes to meet the desired scale, non-idle Nodes may also be selected. This allows for more aggressive scaling down when necessary. |
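For example, assuming the strategy lives under `spec.lifecycle` as the field path above indicates, a minimal manifest fragment selecting `PreferIdle` could look like this:

```yaml
# Minimal sketch: selecting the PreferIdle scale-down strategy.
# Other Node Pool fields are omitted for brevity.
spec:
  lifecycle:
    scaleDownStrategy: PreferIdle   # or IdleOnly (the default)
```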
Idle Nodes
A Node is idle when it has no Pods in the `Running` or `Pending` state. When determining a Node's idle status, CKS ignores Pods with any of the following conditions:
| Condition | Description |
|---|---|
| `qos.coreweave.com/graceful-interruptible` is set to `true` and the Pod's `DeletionGracePeriodSeconds` has elapsed | Graceful interruptible Pods are ignored only if they are being deleted and the deletion grace period has elapsed. |
| The Pod is part of a DaemonSet | Pods managed by a DaemonSet are ignored when determining Node idleness. |
| `qos.coreweave.cloud/interruptable` label set to `true` | Interruptible Pods are ignored when determining Node idleness. |
| `ns.coreweave.cloud/org` label set to `control-plane` | Control Plane Pods are ignored when determining Node idleness. |
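As an illustrative sketch, a Pod that should be ignored for idleness purposes might carry the interruptible label from the table above; the Pod name and image are hypothetical:

```yaml
# Sketch: a Pod labeled as interruptible so it is ignored when CKS
# evaluates Node idleness. Pod name and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-batch-pod
  labels:
    qos.coreweave.cloud/interruptable: "true"
spec:
  containers:
    - name: worker
      image: example-image:latest
```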
You can determine a Node's idle status by inspecting the `CWActive` condition. `CWActive = False` means the Node is idle.
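Conditions appear under a Node's `status.conditions` in the standard Kubernetes condition format. A sketch of how an idle Node's `CWActive` condition might look follows; the `reason` and `message` values are assumptions:

```yaml
# Sketch: the CWActive condition on an idle Node, as it might appear in
# status.conditions. The reason and message text are assumptions.
status:
  conditions:
    - type: CWActive
      status: "False"        # "False" means the Node is idle
      reason: NoActivePods   # illustrative
      message: Node has no Running or Pending Pods counted as active
```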
Once CKS selects a Node for removal, the Node Pool Operator (NPO) removes the Node by first cordoning it to prevent any new Pods from being scheduled onto it. Then, NPO drains the Node to perform a graceful cleanup. Once the Node is fully drained, it is removed from the Node Pool.
`DeletionGracePeriodSeconds` defaults to 30 seconds unless the Pod's `spec.terminationGracePeriodSeconds` is set to a different value.
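For example, a Pod can request a longer grace period through the standard `spec.terminationGracePeriodSeconds` field; the value and Pod details below are illustrative:

```yaml
# Sketch: overriding the default 30-second grace period on a Pod.
# Pod name, image, and the 120-second value are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  terminationGracePeriodSeconds: 120
  containers:
    - name: app
      image: example-image:latest
```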
During the drain process, NPO waits up to 5 minutes before forcibly removing a draining Node, so that Pods stuck in a `Terminating` state cannot block the drain indefinitely.
How to scale down a Node Pool
To scale down a Node Pool:
- Set `spec.targetNodes` to the desired number of Nodes. Manage Node Pools explains how to do this with Cloud Console or the Kubernetes CLI.
- Set `spec.lifecycle.scaleDownStrategy` to the preferred scaling strategy, as shown in the sketch after this list.
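Putting both steps together, a Node Pool manifest scaled down to four Nodes while preferring idle Nodes for removal might look like this sketch (the name and counts are illustrative):

```yaml
# Sketch: scaling a Node Pool down to 4 Nodes, preferring idle Nodes
# for removal. Name and counts are illustrative.
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: example-pool
spec:
  computeClass: default
  targetNodes: 4                    # step 1: reduce the desired Node count
  lifecycle:
    scaleDownStrategy: PreferIdle   # step 2: choose the scale-down strategy
```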
Node Pool types
CKS has two types of Node Pools:
Default Node Pools
The default Node Pool type (`spec.computeClass: default`) is used for Reserved and On-Demand instances. Billing for default Node Pools depends on Reservation and utilization.
If no `computeClass` is specified, the Node Pool defaults to the `default` type.
```yaml
apiVersion: compute.coreweave.com/v1alpha1
kind: NodePool
metadata:
  name: default
spec:
  computeClass: default
  # ...
```
For more information on Node Pool manifests, see Node Pool reference, or learn how to Create a Node Pool.
Availability
Node Pools are available in all General Access Regions.
Control Plane Node Pool (deprecated)
Older clusters created before July 7, 2025 were provisioned with a `cpu-control-plane` Node Pool, used for CKS-managed components. Newer clusters do not have this Node Pool. The components are now managed out-of-band by the CKS Control Plane. See the Control Plane Node Pool release notes for more information.