CoreWeave surfaces NVIDIA GPU health signals, including XID counts (driver-reported errors) and thermal data, through the Node Details Grafana dashboard. Most XIDs indicate a transient or hardware issue rather than something to fix in your workload, and CoreWeave’s Node Lifecycle Controller drains and triages affected Nodes automatically when a serious event occurs.
For the meaning of each XID, see the NVIDIA XID reference. Contact support if a Node is repeatedly draining for the same reason or if your workload is hitting GPU errors that aren’t surfaced as drain events.
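To show what an XID count represents, here is a minimal sketch that tallies XID codes per GPU from kernel log text you have already captured. It assumes the log format the NVIDIA driver commonly writes (`NVRM: Xid (PCI:<bus id>): <code>, ...`). The exact wording varies across driver versions. The function name `count_xids` and the sample lines are hypothetical, and this is not how the Node Details dashboard computes its counts.

```python
import re
from collections import Counter, defaultdict

# Assumption: driver log lines look like
#   NVRM: Xid (PCI:0000:3b:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
# The exact format differs between driver versions; adjust the pattern if needed.
XID_PATTERN = re.compile(r"NVRM: Xid \(PCI:([0-9A-Fa-f:.]+)\): (\d+)")


def count_xids(lines):
    """Return {pci_bus_id: Counter({xid_code: occurrences})} for lines that match."""
    counts = defaultdict(Counter)
    for line in lines:
        # search() rather than match() so timestamp prefixes like "[ 12.3]" are tolerated
        match = XID_PATTERN.search(line)
        if match:
            bus_id, code = match.group(1), int(match.group(2))
            counts[bus_id][code] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines for illustration only
    sample = [
        "NVRM: Xid (PCI:0000:3b:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
        "NVRM: Xid (PCI:0000:3b:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
        "NVRM: Xid (PCI:0000:86:00): 31, pid=1234, name=python",
        "unrelated kernel message",
    ]
    for bus_id, codes in count_xids(sample).items():
        print(bus_id, dict(codes))
```

Grouping by PCI bus ID makes it easy to tell a single misbehaving GPU from a pattern spread across a Node. You can then look up each code in the NVIDIA XID reference.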
How do I interpret GPU health events?
Last modified on June 18, 2026