Day 1 - CoreWeave Docs

CoreWeave’s automated Day 1 operations move Nodes through a sequence of states to ready them for production deployment. This page describes each state in the Day 1 lifecycle. Use it to understand how CoreWeave validates Node health before workloads run, and to locate a Node when you see it reported in a given state.

Day 1 Node states

After Day 0, the Node transitions to the Onboard state where a data center technician (DCT) conducts final physical inspections and manages the cabling. After the DCT certifies the Node, CoreWeave automatically initiates Day 1 operations, which move the Node through a sequence of states, starting with Seatrial, to ready it for production deployment.

Seatrial

The Seatrial phase is a critical observation period during which CoreWeave scrutinizes the Node for potential issues. Continuous automated monitoring covers the following areas:

Cabling: Cables are properly connected to their respective adapters.
Power: All power supplies function within their specified parameters.
Inventory validation: The correct GPUs, storage, memory, and other essential components are installed.

After Seatrial, the Node progresses to the Zap state.
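The inventory validation step can be pictured as a comparison between what a Node is expected to contain and what it reports. The following is a minimal sketch of that idea, not CoreWeave's implementation: the function name `inventory_mismatches`, the component names, and the counts are all hypothetical.

```python
def inventory_mismatches(
    expected: dict[str, int], observed: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {component: (expected_count, observed_count)} for each mismatch."""
    mismatches = {}
    # Check every component that appears in either inventory.
    for component in sorted(set(expected) | set(observed)):
        want = expected.get(component, 0)
        have = observed.get(component, 0)
        if want != have:
            mismatches[component] = (want, have)
    return mismatches


# Hypothetical inventories, for illustration only.
expected = {"gpu": 8, "nvme_drive": 4, "dimm": 32}
observed = {"gpu": 8, "nvme_drive": 3, "dimm": 32}

print(inventory_mismatches(expected, observed))  # {'nvme_drive': (4, 3)}
```

An empty result would mean the observed hardware matches the expected build.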

Zap

During the Zap state, the Node undergoes a firmware upgrade process that affects the GPU, PCI Retimer, BMC, BIOS, and other components. This procedure typically spans one to two hours.

Successful completion of the Zap state advances the Node to the Test state.
If the upgrade fails, or a delay exceeds 6 hours, the Node moves to the Zap Fail state for further analysis.
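The two Zap outcomes can be expressed as a small decision function. This is an illustrative sketch only: the function name `next_state_after_zap` is hypothetical, and it assumes the 6-hour limit applies to the total time a Node spends in the Zap state, which this page does not spell out.

```python
from datetime import timedelta

# The 6-hour limit described above. Measuring it against total time in Zap
# is an assumption made for this sketch.
ZAP_TIMEOUT = timedelta(hours=6)


def next_state_after_zap(succeeded: bool, elapsed: timedelta) -> str:
    """Return the state a Node moves to when it leaves Zap."""
    if succeeded and elapsed <= ZAP_TIMEOUT:
        return "Test"
    return "Zap Fail"


print(next_state_after_zap(True, timedelta(hours=1, minutes=30)))  # Test
print(next_state_after_zap(True, timedelta(hours=7)))              # Zap Fail
print(next_state_after_zap(False, timedelta(hours=1)))             # Zap Fail
```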

Test

During this 24-hour period, the Node undergoes extensive tests that uncover any underlying hardware or software anomalies. The tests include the following proprietary testing and Node failure prediction capabilities:

Proprietary burn-in testing stress-tests compute, networking, and storage subsystems to catch early hardware faults before Nodes enter the production fleet.
Advanced chip-level testing performs deep diagnostics on GPU memory, interconnects, and compute performance to identify marginal or latent hardware issues.
Predictive failure detection monitors hardware signals and error trends to forecast and preempt failures before they impact workloads. This capability continues into Day 2+ operations, where CoreWeave analyzes historical test data to identify patterns and fine-tune performance over time.

Because this process runs automatically, CoreWeave provisions Nodes around the clock so the fleet stays in a constant state of readiness. At the end of the Test period, the Node moves to one of the following states:

Production: The Node passed the Test state and is ready for use.
Triage: CoreWeave detected issues during the Test state.

Production

Nodes that reach the Production state are ready for use. CoreWeave Kubernetes Service (CKS) manages their allocation and cluster assignments. In the Production state, the Node remains under continuous monitoring, which triggers further lifecycle events if CoreWeave detects any issues. To learn more, see Day 2+.

Triage and RMA

Nodes that fail any earlier Day 1 check land in the Triage or RMA states instead of reaching Production. CoreWeave temporarily sidelines Nodes in the Triage state from production. After Triage, CoreWeave directs the Node either to the RMA state for vendor repairs or to the Debug state for in-depth troubleshooting.

If the Node is ready for redeployment, CoreWeave moves it to the Onboard state, where it starts a new lifecycle. Nodes that the vendor refurbishes in the RMA state also return to the Onboard state, where they begin a new lifecycle.

Customers don’t interact with Nodes in the Triage and RMA states. CoreWeave’s full lifecycle automation, including through these states, gives you reliable, performant fleets without the time and effort of dealing with Nodes when they fail.
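The state transitions described on this page can be summarized as a lookup table. The sketch below is one reading of the text, not an official model: the `TRANSITIONS` table and the `is_valid_path` helper are hypothetical, the Debug to Onboard edge is an assumption (the page does not say which state a redeployable Node leaves from), and states whose exits are not described here (Zap Fail, Production) are left with no outgoing transitions.

```python
# Transitions stated on this page. Anything not stated is left out.
TRANSITIONS: dict[str, set[str]] = {
    "Onboard": {"Seatrial"},
    "Seatrial": {"Zap"},
    "Zap": {"Test", "Zap Fail"},
    "Test": {"Production", "Triage"},
    "Triage": {"RMA", "Debug"},
    "RMA": {"Onboard"},
    "Debug": {"Onboard"},   # assumption: redeployable Nodes leave from Debug
    "Zap Fail": set(),      # exits not described on this page
    "Production": set(),    # later events are covered under Day 2+
}


def is_valid_path(path: list[str]) -> bool:
    """Check that each consecutive pair of states is a listed transition."""
    return all(
        nxt in TRANSITIONS.get(cur, set()) for cur, nxt in zip(path, path[1:])
    )


print(is_valid_path(["Onboard", "Seatrial", "Zap", "Test", "Production"]))  # True
print(is_valid_path(["Test", "Triage", "RMA", "Onboard"]))                  # True
print(is_valid_path(["Seatrial", "Production"]))                            # False
```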
