HPC Verification is a hardware health check that CoreWeave runs on GPU Nodes during initial provisioning and hourly afterward. It exercises the GPUs, memory, and InfiniBand interfaces. Nodes that fail are drained and routed to triage before they become available for scheduling. Failures surface as Slurm drain reasons (NHC: <reason>) and in the Node Details Grafana dashboard. For full details, see HPC Verification.
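As a rough illustration of how the `NHC: <reason>` drain reasons might be collected from the Slurm side, the sketch below filters `sinfo -R` output for reasons that begin with `NHC:`. The helper names (`parse_nhc_reasons`, `nhc_drained_nodes`) and the `|`-separated output format are assumptions made for this example, not part of CoreWeave's tooling, and the exact reason text on a real cluster may differ.

```python
import subprocess

# Prefix taken from the drain-reason format described above.
NHC_PREFIX = "NHC:"


def parse_nhc_reasons(sinfo_output: str) -> dict[str, str]:
    """Map node name -> NHC reason from 'node|reason' lines (hypothetical helper)."""
    failures: dict[str, str] = {}
    for line in sinfo_output.splitlines():
        node, sep, reason = line.partition("|")
        reason = reason.strip()
        # Skip malformed lines and reasons unrelated to the health check.
        if sep and reason.startswith(NHC_PREFIX):
            failures[node.strip()] = reason[len(NHC_PREFIX):].strip()
    return failures


def nhc_drained_nodes() -> dict[str, str]:
    """Query Slurm for node reasons and keep only NHC ones (hypothetical helper)."""
    # -R: list reasons for unavailable nodes, -N: one line per node,
    # -h: no header, -o: "%N" node name and "%E" reason, joined by "|".
    result = subprocess.run(
        ["sinfo", "-R", "-N", "-h", "-o", "%N|%E"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_nhc_reasons(result.stdout)
```

On a login node with Slurm client tools installed, calling `nhc_drained_nodes()` would return a dictionary of node names and their health-check reasons. The parsing is kept separate from the `sinfo` call so it can be exercised against captured output without cluster access.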
Administrator
Last modified on June 18, 2026