CoreWeave’s automated Day 1 operations move Nodes through a sequence of states to ready them for production deployment.
Day 1 Node states
After Day 0, the Node transitions to the Onboard state, where a data center technician (DCT) conducts final physical inspections and manages the cabling. After the DCT certifies the Node, CoreWeave automatically initiates Day 1 operations, moving the Node through a sequence of states, starting with Seatrial, to ready it for production deployment.
Seatrial
The Seatrial phase serves as a critical observation period, during which the Node is scrutinized for potential issues. Continuous automated monitoring covers the following areas:
- Cabling: Proper connection of cables to their respective adapters.
- Power: All power supplies function within their specified parameters.
- Inventory Validation: Verification of the installation of correct GPUs, storage, memory, and other essential components.
Zap
During the Zap state, the Node undergoes a comprehensive firmware upgrade process, affecting the GPU, PCI Retimer, BMC, BIOS, and other components. This procedure typically spans one to two hours.
- Successful completion of the Zap state advances the Node to the Test state.
- Failure to pass, or a test delay exceeding 6 hours, moves the Node to the Zap Fail state for further analysis.
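The Zap exit rule above can be sketched as a small decision function. This is only an illustration, not CoreWeave's implementation: the function name, its parameters, and the reading that a successful upgrade still fails once it runs past the 6-hour limit are all assumptions.

```python
from datetime import timedelta

# From the text: a delay exceeding 6 hours moves the Node to Zap Fail.
ZAP_TIMEOUT = timedelta(hours=6)


def next_state_after_zap(upgrade_succeeded: bool, elapsed: timedelta) -> str:
    """Return the state a Node enters when it leaves Zap (hypothetical helper).

    Assumption: exceeding the timeout fails the Node even if the
    firmware upgrade itself eventually reported success.
    """
    if not upgrade_succeeded or elapsed > ZAP_TIMEOUT:
        return "Zap Fail"
    return "Test"
```

For example, a Node whose upgrade succeeds in 90 minutes would advance to Test, while one that is still running after seven hours would land in Zap Fail.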
Test
During this 24-hour period the Node undergoes extensive testing designed to uncover any underlying hardware or software anomalies. This includes a set of proprietary testing and Node failure prediction capabilities:
- Proprietary burn-in testing stress-tests compute, networking, and storage subsystems to catch early hardware faults before Nodes enter the production fleet.
- Advanced chip-level testing performs deep diagnostics on GPU memory, interconnects, and compute performance to identify marginal or latent hardware issues.
- Predictive failure detection monitors hardware signals and error trends to forecast and preempt failures before they impact workloads. This capability continues into Day 2+ operations, where CoreWeave analyzes historical test data to identify patterns and fine-tune performance over time.
- Passing the Test state means the Node is ready for Production.
- Any issues detected during this phase move the Node to the Triage state.
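Taken together, the Day 1 states form a simple state machine. The sketch below models it as a transition table. It makes several assumptions: the `NodeState` enum, `advance`, and `walk` are hypothetical names; Seatrial is assumed to lead directly to Zap because the sections appear in that order; Production is treated as a state; and failure paths that the text does not describe (for example, a failed Seatrial) are left out of the table and raise an error.

```python
from enum import Enum


class NodeState(Enum):
    ONBOARD = "Onboard"
    SEATRIAL = "Seatrial"
    ZAP = "Zap"
    ZAP_FAIL = "Zap Fail"
    TEST = "Test"
    TRIAGE = "Triage"
    PRODUCTION = "Production"


# (current state, passed?) -> next state.
# Only transitions described or implied by the text are listed.
TRANSITIONS = {
    (NodeState.ONBOARD, True): NodeState.SEATRIAL,   # DCT certifies the Node
    (NodeState.SEATRIAL, True): NodeState.ZAP,       # assumed from section order
    (NodeState.ZAP, True): NodeState.TEST,
    (NodeState.ZAP, False): NodeState.ZAP_FAIL,
    (NodeState.TEST, True): NodeState.PRODUCTION,
    (NodeState.TEST, False): NodeState.TRIAGE,
}


def advance(state: NodeState, passed: bool) -> NodeState:
    """Look up the next state, rejecting transitions the text does not define."""
    try:
        return TRANSITIONS[(state, passed)]
    except KeyError:
        raise ValueError(f"no documented transition from {state.value} (passed={passed})")


def walk(outcomes: dict) -> list:
    """Follow a Node from Onboard until no outcome is recorded for its state."""
    state = NodeState.ONBOARD
    path = [state]
    while state in outcomes:
        state = advance(state, outcomes[state])
        path.append(state)
    return path
```

Calling `walk` with a passing outcome for every state yields the path Onboard, Seatrial, Zap, Test, Production; a failing outcome for Test ends the path at Triage instead.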