Reboot Nodes - CoreWeave Docs

This guide explains how to manually reboot CKS Nodes, including how to choose between safe and force reboot options. It’s for cluster administrators and operators who need to apply system updates or recover Nodes without disrupting active workloads. You can manually reboot Nodes in two ways:

Standard Node reboot: A typical reboot to apply a system update or other change.
Reconfigure reboot: A reboot to apply an updated Node Pool configuration to existing Nodes.

This guide covers standard Node reboots. To reboot Nodes to apply Node Pool configuration updates, see Apply Node Pool Configuration Updates.

Nodes are managed as part of CoreWeave’s Node lifecycle. The Node Life Cycle controller handles required reboots to ensure that customer workloads aren’t interrupted. To learn how CoreWeave-initiated reboots behave, see Node state transitions in CKS.

Ways to manually reboot Nodes

Before requesting a reboot, choose the option that fits your situation. You can reboot Nodes whether or not they still have active workloads. Active Nodes are those with the CWActive condition set to true, which indicates they have active workloads such as non-DaemonSet Pods or Slurm jobs.

A safe reboot marks the Node to reboot after it becomes idle. The reboot occurs only after all active workloads stop. While the Node waits to become idle, new Pods might still be scheduled on it.
A force reboot reboots the Node immediately without waiting for active workloads to stop.

To check a Node’s active status, use the cwic node get or cwic node describe commands. After the Node reboots, it remains in a reboot state while a short self-test runs. After the test passes, it returns to a production state and accepts customer workloads.
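If you prefer to check the active status from a script with kubectl, the following is a minimal sketch. It assumes that CWActive appears as a standard condition under .status.conditions on the Node object and that its status reads True for active Nodes; neither detail is confirmed by this guide. The function name node_is_active is hypothetical.

```shell
# Hypothetical helper: exit 0 if the Node reports CWActive=True, 1 if not,
# and 2 if kubectl itself fails.
# Assumption: CWActive is exposed under .status.conditions on the Node object.
node_is_active() {
  status=$(kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="CWActive")].status}') || return 2
  [ "$status" = "True" ]
}
```

For example, node_is_active node-1 && echo "node-1 still has active workloads" prints the message only when the condition is True, which can help you decide between a safe and a force reboot.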

Prerequisites

Before you begin, ensure you have:

An active CoreWeave account.
An API Access Token.
kubectl and the latest version of the CoreWeave Intelligent CLI installed locally.

Optional, for Slurm users only: scontrol installed locally to trigger reboots with scontrol.

Request a Node reboot

You can request a Node reboot in the following ways:

Use the CoreWeave Intelligent CLI to set up a reboot (recommended).
Use Slurm’s scontrol command (for Slurm users).

Use the CoreWeave Intelligent CLI

To use the CoreWeave Intelligent CLI to reboot a Node, run the following command. Choose one of the flags shown in brackets and replace [NODE-NAMES] with a space-separated list of the Nodes you want to reboot:

cwic node reboot [--force|--safe|--unset] [NODE-NAMES]

The --force flag queues a force reboot, the --safe flag queues a safe reboot, and the --unset flag clears the reboot condition. For example, to queue a force reboot for the Nodes node-1 and node-2, run:

cwic node reboot --force node-1 node-2

Adding a reason for the reboot is optional, but recommended. To set a message for the reboot, use the --message flag. For example, run:

cwic node reboot --message "Reason for this reboot" node-1 node-2

Run cwic node reboot --help for more information.
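When you script reboots, it can help to validate the mode and Node list before anything reaches the CLI. The following is a minimal sketch of a dry-run helper that only prints the command it would run. The function name build_reboot_cmd is hypothetical, and it uses only the --safe, --force, and --unset flags described above.

```shell
# Hypothetical helper: print the cwic command for a reboot mode and Node list.
# It only echoes the command (dry run); nothing is sent to the cluster.
build_reboot_cmd() {
  mode="$1"
  # Drop the mode so that "$@" holds only the Node names.
  [ "$#" -gt 0 ] && shift
  case "$mode" in
    safe|force|unset) ;;
    *)
      echo "unknown mode: $mode (use safe, force, or unset)" >&2
      return 1
      ;;
  esac
  if [ "$#" -eq 0 ]; then
    echo "no Node names given" >&2
    return 1
  fi
  echo "cwic node reboot --$mode $*"
}
```

For example, build_reboot_cmd safe node-1 node-2 prints cwic node reboot --safe node-1 node-2, which you can review before running it yourself.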

Use `scontrol` (Slurm)

To safely reboot Nodes in a cluster when they become idle, use scontrol reboot. Slurm ensures the Node is idle, then reboots it by setting the appropriate reboot condition. To reboot example-node with scontrol, run:

scontrol reboot example-node

For more information, see the Slurm scontrol reboot documentation.
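To see whether a requested reboot is still pending, you can inspect the Node's Slurm state. The following is a minimal sketch that extracts the State field from the output of scontrol show node. The function name slurm_node_state is hypothetical, and the exact state strings (for example, a REBOOT_REQUESTED flag) depend on your Slurm version, so treat them as an assumption.

```shell
# Hypothetical helper: read `scontrol show node` output on stdin and print
# the value of the first State= field. Fields such as NextState= are skipped
# because the match is anchored to the start of the field.
slurm_node_state() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^State=/) { print substr($i, 7); exit }
  }'
}
```

For example, run scontrol show node example-node | slurm_node_state to print only the state of example-node.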
