CoreWeave surfaces NVIDIA GPU health signals, including XID counts (driver-reported errors) and thermal data, through the Node Details Grafana dashboard. Most XIDs indicate a transient or hardware issue rather than something to fix in your workload, and CoreWeave’s Node Lifecycle Controller drains and triages affected Nodes automatically when a serious event occurs. For the meaning of each XID, see the NVIDIA XID reference. Contact support if a Node is repeatedly draining for the same reason or if your workload is hitting GPU errors that aren’t surfaced as drain events.
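If you want to cross-check XID counts from inside a workload's own logs before contacting support, a tally like the following may help. This is a minimal sketch and not CoreWeave tooling: it assumes the NVIDIA driver's usual kernel-log line shape (`NVRM: Xid (PCI:<address>): <code>, ...`), and the function names, the sample log, and the threshold of 3 are all hypothetical choices made for illustration. The Node Details dashboard remains the source of truth.

```python
import re
from collections import Counter

# Assumed line shape, as the NVIDIA driver typically writes it to the kernel log:
#   NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, GPU has fallen off the bus.
XID_PATTERN = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+)")


def count_xids(log_text):
    """Count XID events, keyed by (PCI address, XID code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def repeated_xids(counts, threshold=3):
    """Return only the entries seen at least `threshold` times.

    The default threshold is an arbitrary illustration, not a CoreWeave policy.
    """
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical sample data, invented for this example.
SAMPLE_LOG = """\
[  101.1] NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, GPU has fallen off the bus.
[  205.7] NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, GPU has fallen off the bus.
[  310.2] NVRM: Xid (PCI:0000:af:00): 31, pid=4321, MMU fault
[  412.9] NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, GPU has fallen off the bus.
"""

if __name__ == "__main__":
    for (pci, xid), n in sorted(repeated_xids(count_xids(SAMPLE_LOG)).items()):
        print(f"{pci} XID {xid}: {n} occurrences")
```

Keying on both the PCI address and the XID code separates one GPU failing repeatedly from unrelated errors spread across several GPUs, which is the distinction that matters when describing a recurring problem to support.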
Last modified on June 18, 2026