CoreWeave’s automated Day 1 operations move Nodes through a sequence of states to ready them for production deployment.

Day 1 Node states

After Day 0, the Node transitions to the Onboard state, where a data center technician (DCT) conducts final physical inspections and manages the cabling. After the DCT certifies the Node, CoreWeave automatically initiates Day 1 operations, moving the Node through a sequence of states, starting with Seatrial, to ready it for production deployment.

Seatrial

The Seatrial phase serves as a critical observation period, during which the Node is scrutinized for potential issues. Continuous automated monitoring covers the following areas:
  • Cabling: Cables are properly connected to their respective adapters.
  • Power: All power supplies function within their specified parameters.
  • Inventory Validation: The correct GPUs, storage, memory, and other essential components are installed.
After Seatrial, the Node progresses to the Zap state.
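
The inventory validation step above amounts to comparing what a Node should contain against what it reports. The following is a minimal Python sketch of that comparison, not CoreWeave’s tooling: the function name, the component names, and the counts are all hypothetical.

```python
def inventory_mismatches(expected: dict, detected: dict) -> dict:
    """Compare expected and detected component counts.

    Both arguments map a component name to a count. The names used in
    the example below ("gpu", "nvme") are hypothetical.
    Returns {component: (expected_count, detected_count)} for each mismatch.
    """
    mismatches = {}
    # Check every component that appears on either side.
    for component in sorted(set(expected) | set(detected)):
        want = expected.get(component, 0)
        have = detected.get(component, 0)
        if want != have:
            mismatches[component] = (want, have)
    return mismatches


# Hypothetical Node that should have 8 GPUs but reports 7.
report = inventory_mismatches({"gpu": 8, "nvme": 4}, {"gpu": 7, "nvme": 4})
print(report)
```

An empty result means the detected inventory matches the expected one. Any entry names a component to investigate.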

Zap

During the Zap state, the Node undergoes a comprehensive firmware upgrade covering the GPU, PCI Retimer, BMC, BIOS, and other components. This process typically takes one to two hours.
  • Successful completion of the Zap state advances the Node to the Test state.
  • Failure to pass, or a test delay exceeding 6 hours, moves the Node to the Zap Fail state for further analysis.
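
The two Zap outcomes above can be expressed as a small decision function. This Python sketch makes two assumptions that the page does not confirm: the 6-hour limit is measured as time elapsed in the Zap state, and "exceeding" means strictly more than 6 hours. The function and constant names are hypothetical.

```python
from datetime import timedelta

# Assumption: the delay limit is measured from entry into the Zap state.
ZAP_DELAY_LIMIT = timedelta(hours=6)


def next_state_after_zap(upgrade_succeeded: bool, elapsed: timedelta) -> str:
    """Return the state that follows Zap under the rules described above."""
    # A failed upgrade or a delay beyond the limit sends the Node to Zap Fail.
    if not upgrade_succeeded or elapsed > ZAP_DELAY_LIMIT:
        return "Zap Fail"
    return "Test"


print(next_state_after_zap(True, timedelta(hours=1, minutes=30)))
print(next_state_after_zap(True, timedelta(hours=7)))
```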

Test

During this 24-hour period, the Node undergoes extensive testing designed to uncover underlying hardware or software anomalies. The testing includes proprietary test suites and Node failure prediction capabilities:
  • Proprietary burn-in testing stress-tests compute, networking, and storage subsystems to catch early hardware faults before Nodes enter the production fleet.
  • Advanced chip-level testing performs deep diagnostics on GPU memory, interconnects, and compute performance to identify marginal or latent hardware issues.
  • Predictive failure detection monitors hardware signals and error trends to forecast and preempt failures before they impact workloads. This capability continues into Day 2+ operations, where CoreWeave analyzes historical test data to identify patterns and fine-tune performance over time.
Because this process is automated, CoreWeave provisions Nodes around the clock. The Test period results in one of the following outcomes:
  • Passing the Test state means the Node is ready for Production.
  • Any issues detected during this phase move the Node to the Triage state.

Production

Nodes that reach the Production state are deemed ready for use. Their allocation and cluster assignments are managed by CKS. In the Production state, the Node remains under continuous proactive monitoring, which triggers further lifecycle events if any issues are detected. To learn more, see Day 2+.

Triage and RMA

Nodes in the Triage state are temporarily removed from production. After Triage, the Node moves either to the RMA state for vendor repairs or to the Debug state for in-depth troubleshooting. A Node that is determined to be ready for redeployment, or that the vendor has refurbished in the RMA state, returns to the Onboard state, where it begins a new lifecycle. Customers do not interact with Nodes in the Triage and RMA states, but CoreWeave’s lifecycle automation extends through these states, so you get reliable, performant fleets without spending time and effort on Nodes when they fail.
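
The state transitions described on this page can be summarized as a lookup table. The Python sketch below is an illustrative model, not CoreWeave’s implementation: it encodes only the transitions named above, and it assumes that the "ready for redeployment" decision returns a Node to Onboard from the Debug state. The names `TRANSITIONS`, `can_transition`, and `is_valid_path` are hypothetical.

```python
# Transitions named on this page. Outcomes the page does not describe
# (for example, what follows Zap Fail or Production) are left out.
TRANSITIONS = {
    "Onboard": {"Seatrial"},
    "Seatrial": {"Zap"},
    "Zap": {"Test", "Zap Fail"},
    "Test": {"Production", "Triage"},
    "Triage": {"RMA", "Debug"},
    "RMA": {"Onboard"},
    # Assumption: redeployment readiness is decided in the Debug state.
    "Debug": {"Onboard"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the table lists a direct move from current to target."""
    return target in TRANSITIONS.get(current, set())


def is_valid_path(states: list) -> bool:
    """Return True if every consecutive pair of states is a listed transition."""
    return all(can_transition(a, b) for a, b in zip(states, states[1:]))


print(is_valid_path(["Onboard", "Seatrial", "Zap", "Test", "Production"]))
print(is_valid_path(["Test", "Triage", "RMA", "Onboard", "Seatrial"]))
```

Under this table, the path from Onboard to Production is valid, as is the loop from Triage through RMA back to Onboard. A direct move from Zap to Production is not.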
Last modified on April 13, 2026