Node Pools
Manage groups of Nodes as a single entity with Node Pools
Instances are specific hardware configurations defined by CoreWeave, and each Instance type has its own hourly price. A Node Pool in CKS represents one or more Instances that share a common configuration, such as the same labels, taints, and annotations. To configure a Node Pool, select the Instance type to use and the number of Instances of that type to run in your cluster.
Once a Node Pool is deployed to a cluster, CKS continuously monitors it to ensure that the number of running Nodes matches the number specified in the Node Pool's manifest.
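The monitoring behavior described above can be pictured as a comparison between a desired count and an observed count. The following is a minimal conceptual sketch, not CKS code: the `NodePoolSpec` class, its field names, and the `reconcile_step` function are hypothetical and do not reflect the actual Node Pool manifest schema or controller.

```python
from dataclasses import dataclass


@dataclass
class NodePoolSpec:
    # Hypothetical fields for illustration only; not the CKS manifest schema.
    name: str
    instance_type: str
    target_nodes: int


def reconcile_step(spec: NodePoolSpec, running_nodes: int) -> int:
    """Return how many Nodes to add (positive) or remove (negative)
    so that the running count matches the count in the spec."""
    return spec.target_nodes - running_nodes


# Placeholder names; substitute values that match your own Node Pool.
pool = NodePoolSpec(name="example-pool", instance_type="example-instance", target_nodes=4)

delta = reconcile_step(pool, running_nodes=3)
print(f"{pool.name}: adjust by {delta} Node(s)")
```

A result of zero means the pool already matches its manifest, so no action would be needed in this simplified model.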
Multiple Node Pools may be deployed within a single cluster, where each Node Pool contains any number of Nodes. This allows you to run different types of workloads on different types of Nodes, or to scale different parts of your application independently.
View pricing for Instance types on the Instances Pricing page.
The control plane Node Pool
Each cluster is provisioned with an initial Node Pool featuring two CPU Nodes, which run the Kubernetes control plane components such as the CSI, the CNI, cluster DNS, metrics, and other components. This Node Pool, called cpu-control-plane, is automatically created when a CKS cluster is created, and appears in the Node Pool list on the Cloud Console once a CKS cluster is in a Healthy state.
Node cordoning
In CKS, Nodes are sometimes cordoned in order to ensure that workloads are only scheduled to healthy Nodes. In most cases, cordoning is eventually removed, making the Node schedulable again. Node cordoning is managed entirely by CoreWeave.
Users should not rely on Node conditions for automation, because Node conditions are intended for internal use only and are not designed to support custom management automation. CoreWeave may cordon Nodes for maintenance or to resolve temporary issues.
A Node may be cordoned for several reasons, such as:
- Maintenance: If a Node requires maintenance, updates, or hardware fixes, CKS cordons it to ensure no new workloads are placed on it during that time. This allows the Node Life Cycle controller to make necessary changes without disrupting running tasks.
- Node draining for removal: If a Node needs to be removed from the cluster, the Node will be cordoned prior to draining it. Workloads are automatically rescheduled onto healthy Nodes, and no new workloads will be scheduled to the cordoned Node.
- InfiniBand or Ethernet link flaps: Link flaps are intermittent, unpredictable up-down transitions in a network connection, which can result in networking or communication failures. If an InfiniBand or Ethernet link is flapping, a Node can experience inconsistent or unreliable connectivity. In this case, the Node is cordoned to ensure no workloads are scheduled to a Node with an unreliable network connection.
- Temporary health check failures: Kubernetes uses various health checks to assess the state of a user's system. A temporary check failure might indicate a transient issue that could degrade Node performance. The Node is cordoned until the issue is resolved.
Users should not assume that cordoned Nodes have substantial or permanent issues. If a cordoned Node is deemed to have a fault that cannot be easily resolved, CKS will move it out of production and into triage.
Cordoning Nodes in these cases allows CKS to prevent disruptions. If you have questions about Node cordoning, or would like to manually cordon Nodes for another reason, please contact support.
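For read-only inspection, a cordoned Node in standard Kubernetes has `spec.unschedulable` set to `true`. The sketch below lists cordoned Nodes from a Node list in the JSON shape returned by `kubectl get nodes -o json`. The function name and the sample Node names are hypothetical. The sketch reads only the `unschedulable` field and does not use Node conditions.

```python
import json


def cordoned_node_names(node_list: dict) -> list[str]:
    """Return the names of Nodes whose spec.unschedulable field is true."""
    names = []
    for node in node_list.get("items", []):
        # The field is absent on schedulable Nodes, so default to False.
        if node.get("spec", {}).get("unschedulable", False):
            names.append(node["metadata"]["name"])
    return names


# Hypothetical sample data in the shape of `kubectl get nodes -o json`.
sample = json.loads("""
{
  "kind": "List",
  "items": [
    {"metadata": {"name": "node-a"}, "spec": {}},
    {"metadata": {"name": "node-b"}, "spec": {"unschedulable": true}}
  ]
}
""")

print(cordoned_node_names(sample))
```

A Node appearing in this list is not necessarily faulty, since cordoning is often temporary.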
Regions of availability
At this time, Node Pools are available for deployment in all of CoreWeave's General Access Regions. However, self-service Node Pool creation is unavailable in RNO2.