Instances are specific hardware configurations defined by CoreWeave, and each GPU and CPU instance type has its own hourly price. A Node Pool in CKS represents one or more instances that share a common configuration, such as the same labels, taints, and annotations. You configure a Node Pool by selecting an instance type and the number of instances of that type to run in your cluster. Node Pools are deployed either with a Kubernetes manifest or through the Cloud Console. Once a Node Pool is deployed to a cluster, CKS continuously monitors it to ensure that the number of running Nodes matches the number specified in the Node Pool’s manifest. A single cluster may contain multiple Node Pools, each with any number of Nodes. This lets you run different types of workloads on different types of Nodes, or scale different parts of your application independently.
View pricing for Instance types on CoreWeave’s Pricing page.
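The continuous monitoring described above is a desired-state comparison: the count in the manifest is the target, and the controller acts on the difference from what it observes. The following is a minimal Python sketch of that comparison only, assuming a hypothetical `NodePoolState` type; it is not CKS’s actual controller code.

```python
from dataclasses import dataclass


@dataclass
class NodePoolState:
    """Simplified, hypothetical view of one Node Pool."""
    target_nodes: int   # desired count, as set in spec.targetNodes
    running_nodes: int  # Nodes currently observed in the pool


def reconcile_delta(state: NodePoolState) -> int:
    """Return how many Nodes to add (positive) or remove (negative).

    Illustrates the desired-vs-observed comparison behind continuous
    monitoring; this is not CKS's actual controller code.
    """
    if state.target_nodes < 0:
        raise ValueError("target_nodes must be non-negative")
    return state.target_nodes - state.running_nodes


if __name__ == "__main__":
    print(reconcile_delta(NodePoolState(target_nodes=4, running_nodes=2)))  # 2
    print(reconcile_delta(NodePoolState(target_nodes=1, running_nodes=3)))  # -2
```

A positive result means Nodes must be provisioned; a negative result means Nodes must be selected for removal, which is where the scale-down strategies in this section come in.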
Availability
Node Pools are available in all General Access Regions.
Node cordoning
In CKS, Nodes are sometimes cordoned to ensure that workloads are scheduled only to healthy Nodes. In most cases, the cordon is eventually removed, making the Node schedulable again. This kind of cordoning is managed entirely by CoreWeave, though you may also cordon Nodes manually for your own reasons.
You may add your own Node conditions to Nodes, but do not use CoreWeave Node conditions for automation. CoreWeave Node conditions are intended for internal use only and are not meant to drive your own custom management automation.
CoreWeave may cordon Nodes for maintenance or to resolve temporary issues, including:
- Maintenance: If a Node requires maintenance, updates, or hardware fixes, CKS cordons it so that no new workloads are placed on it during that time. This allows the Node Life Cycle controller to make the necessary changes without disrupting running tasks.
- Node draining for removal: If a Node needs to be removed from the cluster, the Node will be cordoned prior to draining it. Workloads are automatically rescheduled onto healthy Nodes, and no new workloads will be scheduled to the cordoned Node.
- InfiniBand or Ethernet link flaps: Link flaps are intermittent, unpredictable up-down transitions in a network connection, which can result in networking or communication failures. If an InfiniBand or Ethernet link is flapping, a Node can experience inconsistent or unreliable connectivity. In this case, the Node is cordoned to ensure no workloads are scheduled to a Node with an unreliable network connection.
- Temporary health check failures: Kubernetes uses various health checks to assess the state of a Node. A temporary health check failure may indicate a transient issue that could degrade Node performance. The Node is cordoned until the issue is resolved.
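Cordoning, whether done by CoreWeave or manually, is the standard Kubernetes mechanism: it sets `spec.unschedulable` on the Node object, which is what `kubectl cordon` and `kubectl uncordon` do. The following Python sketch builds that patch body and checks a Node’s cordon state from its JSON form. The Node name in the commented usage is hypothetical, and applying the patch requires cluster credentials.

```python
from typing import Any, Dict


def cordon_patch(cordon: bool = True) -> Dict[str, Any]:
    """Return the Node patch body equivalent to `kubectl cordon` (True)
    or `kubectl uncordon` (False): it toggles spec.unschedulable."""
    return {"spec": {"unschedulable": cordon}}


def is_cordoned(node: Dict[str, Any]) -> bool:
    """Check a Node object given as a plain dict, for example the
    parsed output of `kubectl get node <name> -o json`."""
    return bool(node.get("spec", {}).get("unschedulable", False))


# Applying the patch with the official Kubernetes Python client
# (requires cluster credentials; "my-node" is a hypothetical Node name):
#   from kubernetes import client, config
#   config.load_kube_config()
#   client.CoreV1Api().patch_node("my-node", cordon_patch(True))
```

This only touches scheduling state. It deliberately does not read CoreWeave Node conditions, which should not drive your own automation.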
Scaling strategies
The CKS cluster autoscaler has two possible scale-down strategies. Set spec.lifecycle.scaleDownStrategy to the preferred scaling strategy:
| Strategy | Description |
|---|---|
| IdleOnly | Default strategy. Only idle Nodes are selected for removal. This strategy ensures that only Nodes not actively running workloads are removed. |
| PreferIdle | Idle Nodes are prioritized for removal, but if there are not enough idle Nodes to meet the desired scale, non-idle Nodes may also be selected. This allows for more aggressive scaling down when necessary. |
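The difference between the two strategies can be sketched as a selection function. This is a hedged illustration, assuming each Node’s idle status is already known and that busy Nodes are taken in list order under `PreferIdle`; the table does not specify how CKS orders non-idle candidates, and the `Node` type here is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    idle: bool  # assumed precomputed by the idleness rules in "Idle Nodes"


def select_for_removal(nodes: List[Node], remove_count: int,
                       strategy: str = "IdleOnly") -> List[Node]:
    """Pick up to remove_count Nodes according to scaleDownStrategy."""
    if remove_count <= 0:
        return []
    idle = [n for n in nodes if n.idle]
    busy = [n for n in nodes if not n.idle]
    if strategy == "IdleOnly":
        # Never select busy Nodes; may return fewer than remove_count.
        return idle[:remove_count]
    if strategy == "PreferIdle":
        # Idle Nodes first, then busy Nodes to make up the difference.
        return (idle + busy)[:remove_count]
    raise ValueError(f"unknown scaleDownStrategy: {strategy!r}")
```

With Nodes `a` (idle), `b` (busy), and `c` (idle) and three Nodes to remove, `IdleOnly` selects only `a` and `c`, leaving the pool above target, while `PreferIdle` selects `a`, `c`, and then `b`.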
Idle Nodes
A Node is idle when it has no Pods in the Running or Pending state. When determining a Node’s idle status, CKS ignores Pods that match any of the following conditions:
| Condition | Description |
|---|---|
| qos.coreweave.com/graceful-interruptible is set to true and the Pod’s DeletionGracePeriodSeconds has elapsed | Graceful interruptible Pods are ignored only if they are being deleted and the deletion grace period has elapsed. |
| The Pod is part of a DaemonSet | Pods managed by a DaemonSet are ignored when determining Node idleness. |
| qos.coreweave.cloud/interruptable label set to true | Interruptible Pods are ignored when determining Node idleness. |
| ns.coreweave.cloud/org label set to control-plane | Control Plane Pods are ignored when determining Node idleness. |
The Node’s CWActive condition reflects its idle status: CWActive = False means the Node is idle.
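The idleness rules in the table can be sketched as a predicate over a Node’s Pods. This minimal Python version assumes a hypothetical `Pod` type, treats every key in the table as a Pod label (the table does not say whether `graceful-interruptible` is a label or an annotation), and uses the 30-second default grace period noted in this section. It models the Pod rules only, not how CKS sets the CWActive condition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DEFAULT_GRACE_SECONDS = 30  # default DeletionGracePeriodSeconds


@dataclass
class Pod:
    phase: str                                    # e.g. "Running", "Pending", "Succeeded"
    labels: Dict[str, str] = field(default_factory=dict)
    owner_kind: Optional[str] = None              # e.g. "DaemonSet"
    deletion_started_at: Optional[float] = None   # epoch seconds; None if not deleting
    deletion_grace_period_seconds: int = DEFAULT_GRACE_SECONDS


def is_ignored(pod: Pod, now: float) -> bool:
    """Apply the ignore rules from the table (keys assumed to be labels)."""
    if pod.owner_kind == "DaemonSet":
        return True
    if pod.labels.get("qos.coreweave.cloud/interruptable") == "true":
        return True
    if pod.labels.get("ns.coreweave.cloud/org") == "control-plane":
        return True
    graceful = pod.labels.get("qos.coreweave.com/graceful-interruptible") == "true"
    if graceful and pod.deletion_started_at is not None:
        # Ignored only once the deletion grace period has elapsed.
        return now - pod.deletion_started_at >= pod.deletion_grace_period_seconds
    return False


def node_is_idle(pods: List[Pod], now: float) -> bool:
    """A Node is idle if no non-ignored Pod is Running or Pending."""
    return not any(
        p.phase in ("Running", "Pending") and not is_ignored(p, now)
        for p in pods
    )
```

For example, a graceful interruptible Pod whose deletion started at t=100 still keeps the Node busy at t=110, but no longer does at t=131.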
Once CKS selects a Node for removal, the Node Pool Operator (NPO) removes the Node by first cordoning it to prevent any new Pods from being scheduled onto it. Then, NPO drains the Node to perform a graceful cleanup. Once the Node is fully drained, it is removed from the Node Pool.
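The removal sequence mirrors what `kubectl cordon` followed by `kubectl drain` does by hand. The following in-memory Python sketch models only the ordering (cordon, then drain, then remove) using hypothetical `SimNode` and `SimPool` types. A real drain evicts Pods gracefully and waits for them to terminate, which this sketch does not simulate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimNode:
    name: str
    unschedulable: bool = False
    pods: List[str] = field(default_factory=list)


@dataclass
class SimPool:
    nodes: List[SimNode] = field(default_factory=list)
    events: List[str] = field(default_factory=list)


def remove_node(pool: SimPool, name: str) -> List[str]:
    """Cordon, drain, then remove one Node; return the evicted Pod names."""
    node = next((n for n in pool.nodes if n.name == name), None)
    if node is None:
        raise KeyError(name)
    node.unschedulable = True                 # 1. cordon: block new Pods
    pool.events.append(f"cordon {name}")
    evicted = list(node.pods)                 # 2. drain: evict running Pods
    node.pods.clear()
    pool.events.append(f"drain {name}")
    pool.nodes = [n for n in pool.nodes if n is not node]  # 3. remove
    pool.events.append(f"remove {name}")
    return evicted
```

Cordoning first matters: if the Node were drained while still schedulable, the scheduler could place replacement Pods back onto it.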
DeletionGracePeriodSeconds defaults to 30 seconds unless the Pod’s spec.terminationGracePeriodSeconds is set to a different value.
How to scale down a Node Pool
To scale down a Node Pool:
- Set spec.targetNodes to the desired number of Nodes. Manage Node Pools explains how to do this with Cloud Console or the Kubernetes CLI.
- Set spec.lifecycle.scaleDownStrategy to the preferred scaling strategy.
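Both steps can be expressed as a single merge patch on the Node Pool. The following Python sketch builds that patch body from the two fields named in the steps. Applying it with `kubectl patch nodepool <name> --type merge -p '<json>'` assumes the resource is addressable as `nodepool`; check Manage Node Pools for the supported method.

```python
import json
from typing import Any, Dict, Optional

SCALE_DOWN_STRATEGIES = {"IdleOnly", "PreferIdle"}


def scale_down_patch(target_nodes: int,
                     strategy: Optional[str] = None) -> Dict[str, Any]:
    """Build a merge-patch body setting spec.targetNodes and, optionally,
    spec.lifecycle.scaleDownStrategy on a Node Pool."""
    if target_nodes < 0:
        raise ValueError("target_nodes must be non-negative")
    spec: Dict[str, Any] = {"targetNodes": target_nodes}
    if strategy is not None:
        if strategy not in SCALE_DOWN_STRATEGIES:
            raise ValueError(f"unknown scaleDownStrategy: {strategy!r}")
        spec["lifecycle"] = {"scaleDownStrategy": strategy}
    return {"spec": spec}


if __name__ == "__main__":
    # Prints: {"spec": {"targetNodes": 2, "lifecycle": {"scaleDownStrategy": "PreferIdle"}}}
    print(json.dumps(scale_down_patch(2, "PreferIdle")))
```

Validating the strategy name locally catches typos before the patch reaches the cluster.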
Node Pool types
CKS uses the default Node Pool type to manage Reserved and On-Demand instances.
Default Node Pools
The default Node Pool type (spec.computeClass: default) is used for Reserved and On-Demand instances. Billing for default Node Pools depends on Reservation and utilization.
The default type does not need to be specified: if a Node Pool manifest omits computeClass, the Node Pool uses the default type.
Example default Node Pool manifest
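The following Python sketch builds a minimal Node Pool manifest as a dictionary and encodes the defaulting rule above. Several details are assumptions rather than documented values: the `apiVersion` string, the `instanceType` field name, and the `<instance-type>` placeholder. Verify them against the NodePool resource in your cluster before relying on them.

```python
from typing import Any, Dict, Optional

# Assumption: apiVersion of the CKS NodePool resource; verify with
# `kubectl api-resources` in your cluster before relying on it.
NODEPOOL_API_VERSION = "compute.coreweave.com/v1alpha1"


def node_pool_manifest(name: str, instance_type: str, target_nodes: int,
                       compute_class: Optional[str] = None) -> Dict[str, Any]:
    """Build a minimal NodePool manifest as a dict (field names assumed)."""
    spec: Dict[str, Any] = {"instanceType": instance_type,
                            "targetNodes": target_nodes}
    if compute_class is not None:
        spec["computeClass"] = compute_class
    return {
        "apiVersion": NODEPOOL_API_VERSION,
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": spec,
    }


def effective_compute_class(manifest: Dict[str, Any]) -> str:
    """A NodePool with no spec.computeClass is treated as the default type."""
    return manifest.get("spec", {}).get("computeClass", "default")
```

Omitting `compute_class` and passing `"default"` explicitly both yield a default Node Pool.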
Image pull best practices
CoreWeave operates a region-level registry proxy that accelerates container image pulls and reduces your cluster Nodes’ exposure to public registry rate limits. For predictable, fast rollouts when scaling Node Pools, use immutable tags, or pin images by digest, for production workloads. This guarantees that every Node pulls the exact same artifact and avoids “sticky” results from proxy metadata caching. Avoid mutable tags such as :latest. With metadata caching enabled, the proxy may continue serving a cached manifest until the cache expires, which can leave different Nodes running different image versions. See region-level image proxy for more information.
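A pre-deployment check can flag image references that are likely to drift. The following Python sketch treats a reference as safe if it is pinned by a `sha256` digest, and flags it if it uses `latest` explicitly or has no tag (an untagged reference resolves to `latest`). It is a heuristic under stated assumptions: any tag can be moved in principle, so whether other tags are truly immutable depends on your registry.

```python
import re
from typing import Optional

_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image: str) -> bool:
    """True if the reference ends in a full sha256 digest."""
    return bool(_DIGEST.search(image))


def image_tag(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it has no tag."""
    name = image.split("@", 1)[0]          # drop any digest
    last = name.rsplit("/", 1)[-1]         # ignore registry host:port
    return last.split(":", 1)[1] if ":" in last else None


def uses_mutable_tag(image: str) -> bool:
    """Flag references that resolve to `latest` (explicitly or implicitly)."""
    if is_pinned_by_digest(image):
        return False
    tag = image_tag(image)
    return tag is None or tag == "latest"
```

Looking only at the last path component keeps a registry port such as `:5000` from being mistaken for a tag.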
Methods for rebooting Nodes
You can manually reboot Nodes in the following two ways:
| Reboot Method | When to Use | Typical Duration |
|---|---|---|
| Reboot only | Maintenance, troubleshooting, or applying changes that do not require reconfiguration | Approximately 10 minutes or more |
| Reconfigure reboot | Applying OS images, GPU driver updates, or other Node Pool modifications that require reconfiguration | Approximately 1 hour or more |
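When planning a maintenance window, the table’s typical durations give a rough lower bound. The following Python sketch picks a method based on whether the change needs reconfiguration and multiplies by the number of batches. The `batch_size` parameter is a hypothetical planning input, not a CKS setting, and the durations use the table’s lower bounds (10 and 60 minutes), so real reboots may take longer.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RebootMethod:
    name: str
    min_minutes: int  # lower bound of the "Typical Duration" column


REBOOT_ONLY = RebootMethod("Reboot only", 10)
RECONFIGURE_REBOOT = RebootMethod("Reconfigure reboot", 60)


def choose_reboot_method(needs_reconfiguration: bool) -> RebootMethod:
    """OS images, GPU driver updates, etc. need a reconfigure reboot."""
    return RECONFIGURE_REBOOT if needs_reconfiguration else REBOOT_ONLY


def min_window_minutes(node_count: int, needs_reconfiguration: bool,
                       batch_size: int = 1) -> int:
    """Lower-bound estimate when rebooting batch_size Nodes at a time."""
    if node_count < 0 or batch_size < 1:
        raise ValueError("node_count must be >= 0 and batch_size >= 1")
    method = choose_reboot_method(needs_reconfiguration)
    return math.ceil(node_count / batch_size) * method.min_minutes
```

For example, four Nodes rebooted two at a time without reconfiguration need at least 20 minutes, while three Nodes reconfigured one at a time need at least 180 minutes.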
Node conditions for reboots (deprecated)
CoreWeave’s Node conditions are visible when managing reboots. Previously, you set these Node conditions with the Conditioner kubectl plugin; now you can use the CoreWeave Intelligent CLI to reboot Nodes without using Node conditions. See the Reboot Nodes and Apply Node Pool updates guides for more information.
| Condition | Description | Deprecated equivalent |
|---|---|---|
| AdminSafePowerCycle | Marks the Node to reboot when it is idle, only after all running jobs are complete. | AdminSafeReboot |
| AdminImmediatePowerCycle | Marks the Node to reboot immediately, without waiting for running jobs to complete. | AdminImmediateReboot |
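If older scripts still reference the deprecated condition names, a small lookup built from the table can help audit them. The following Python sketch maps each deprecated name to its replacement and passes other names through unchanged. It is intended only for finding legacy references; for new work, this section points to the CoreWeave Intelligent CLI instead of Node conditions.

```python
from typing import Dict, List

DEPRECATED_TO_CURRENT: Dict[str, str] = {
    "AdminSafeReboot": "AdminSafePowerCycle",
    "AdminImmediateReboot": "AdminImmediatePowerCycle",
}


def current_condition(name: str) -> str:
    """Map a deprecated reboot condition to its replacement; pass others through."""
    return DEPRECATED_TO_CURRENT.get(name, name)


def find_deprecated(names: List[str]) -> List[str]:
    """Return the deprecated condition names found in names, in order."""
    return [n for n in names if n in DEPRECATED_TO_CURRENT]
```

Running `find_deprecated` over the condition names referenced in your scripts lists exactly the ones that still use the old names.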
Control Plane Node Pool (deprecated)
Clusters created before July 7, 2025 were provisioned with a cpu-control-plane Node Pool used for CKS-managed components. Newer clusters do not have this Node Pool; these components are now managed out-of-band by the CKS Control Plane. See the Control Plane Node Pool release notes for more information.