Apply Node Pool updates

Apply Node Pool updates by queuing a reconfigure reboot

Some Node Pool modifications require you to both reconfigure and reboot the Nodes to apply updates successfully, for example, a new system OS image or GPU driver update. To apply these updates, you can queue a reconfigure reboot for the Node Pool. If you need to reboot Nodes manually to apply a system update or other change, without reconfiguring the Node, see Reboot Nodes.

Prerequisites

Before you begin, ensure you have:

Queue a reconfigure reboot

To queue a reconfigure reboot for a Node Pool, run the following command, replacing [list-of-space-separated-nodes] with the list of Nodes you want to reboot:

Example
$ cwic node reboot --reconfigure [list-of-space-separated-nodes]

Submitting the command queues a reconfigure reboot that begins as soon as the Nodes are idle. In the meantime, the Nodes are cordoned to prevent new workloads from being scheduled on them.

The reconfiguration and reboot process can take up to an hour:

  • During the reconfiguration and reboot, the Node PhaseState condition moves to production-reconfigure-powercycle-test, and the Node becomes unavailable.
  • When the reconfiguration is complete, the Node PhaseState condition returns to production and the Node is uncordoned.
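The cordon and uncordon steps described above can be observed from the Kubernetes side. The following is a minimal sketch, assuming kubectl is configured for the cluster and that the Nodes are visible as Kubernetes Node objects. The function names and the POLL_SECONDS variable are hypothetical, and the sketch checks only the cordon status, not the PhaseState condition itself.

```shell
#!/bin/sh
# Sketch only: assumes kubectl access to the cluster.
# Function names and POLL_SECONDS are hypothetical helpers, not part of cwic.

# Prints "cordoned" if the Node is marked unschedulable, else "schedulable".
node_schedule_state() {
  # .spec.unschedulable is "true" on a cordoned Node and empty otherwise.
  if [ "$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]; then
    echo "cordoned"
  else
    echo "schedulable"
  fi
}

# Polls until the Node is schedulable again (default: every 60 seconds).
wait_until_uncordoned() {
  while [ "$(node_schedule_state "$1")" = "cordoned" ]; do
    sleep "${POLL_SECONDS:-60}"
  done
  echo "$1 is schedulable"
}
```

Calling wait_until_uncordoned with a Node name after queuing the reboot blocks until that Node accepts workloads again. If it is called before the Node has been cordoned, it returns immediately.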

Reboot Nodes in small batches

When rebooting Nodes, limit the number of Nodes that are rebooted at one time to avoid service interruptions. If you need to reboot 50 or more Nodes at a time, please contact support for assistance.
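One way to keep batches small is to split the Node list before queuing anything. The sketch below only prints one reconfigure reboot command per batch so that each can be reviewed and run separately. The function name, the batch size of 2, and the Node names are illustrative assumptions, not recommended values.

```shell
#!/bin/sh
# Sketch only: prints one reconfigure reboot command per batch of Nodes.
# Usage: print_reboot_batches BATCH_SIZE NODE [NODE...]
print_reboot_batches() {
  batch_size=$1
  shift
  batch=""
  count=0
  for node in "$@"; do
    # Append the Node to the current batch, space-separated.
    batch="${batch:+$batch }$node"
    count=$((count + 1))
    if [ "$count" -eq "$batch_size" ]; then
      echo "cwic node reboot --reconfigure $batch"
      batch=""
      count=0
    fi
  done
  # Emit the final, partially filled batch, if any.
  if [ -n "$batch" ]; then
    echo "cwic node reboot --reconfigure $batch"
  fi
}

# Hypothetical Node names; the commands are printed, not executed.
print_reboot_batches 2 node-a node-b node-c node-d node-e
```

Waiting for each batch to return to production before running the next command keeps the number of unavailable Nodes bounded by the batch size.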

Optional flags

Two optional cwic flags can shorten the total reboot time: --no-test skips post-reboot validation, and --force queues the reboot to begin immediately.

Skip post-reboot validation

CoreWeave performs a post-reboot validation of a Node after any reboot to ensure it is operating within acceptable tolerances. This test usually consumes about 30 minutes of the overall reboot time. To expedite reboots, you can skip the validation by using the --no-test flag. This sets the Node PhaseState to production-reconfigure-powerreset.

Queue reboot to start immediately

You can queue a reconfigure reboot to begin immediately by adding the --force flag. A forced reboot does not take the active state of the Nodes into account, so before using this flag, ensure that the Nodes are idle or that any running workflows can safely be interrupted.
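A quick idleness check before forcing a reboot might look like the following sketch. It assumes kubectl access to the cluster, and it counts every Pod in the Running phase on the Node, including DaemonSet Pods, so a nonzero count does not necessarily mean user workloads are present. The function names are hypothetical, the placement of --force after --reconfigure is an assumption, and the command is only printed, never executed.

```shell
#!/bin/sh
# Sketch only: assumes kubectl access; helper names are hypothetical.

# Prints the number of Pods in the Running phase on the given Node.
running_pods_on_node() {
  pods=$(kubectl get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$1,status.phase=Running") || return 1
  # Count non-empty lines; prints 0 when there is no output.
  printf '%s' "$pods" | awk 'NF { n++ } END { print n + 0 }'
}

# Prints the forced reboot command only when no Pods are running on the Node.
suggest_force_reboot() {
  count=$(running_pods_on_node "$1") || return 1
  if [ "$count" -eq 0 ]; then
    echo "cwic node reboot --reconfigure --force $1"
  else
    echo "$1 still has $count running Pods; not forcing" >&2
    return 1
  fi
}
```

If the kubectl query itself fails, both functions return a nonzero status rather than reporting the Node as idle.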