To view the dashboard, go to the Training Jobs dashboard.
For instructions on accessing CoreWeave Grafana dashboards, see Access and use CoreWeave Grafana dashboards.
| Panel Title | Description |
|---|---|
| Kind | Shows the Kubernetes object kind. |
| Name | Shows the name of the resource. |
| Nodes | Shows the total number of Nodes. |
| Pods | Shows the total number of Pods. |
| Uptime | Shows the overall uptime. |
| Pod Readiness Timeline | Shows the readiness state of the Pods. |
| Active GPUs | Shows the number of active GPUs. |
| Job Efficiency | Shows the efficiency of running jobs. |
| Current FP8 FLOPS | Displays the current floating-point operations per second in FP8 precision. |
| Node conditions | Displays the current conditions of the Nodes. |
| Alerts | Displays active alerts related to this resource. |
| Nodes (Range) | Shows the individual Pods, their running status, and uptime on each Node. |
| GPU Temperatures Running Jobs | Shows the GPU temperatures for running jobs. |
| GPU Core Utilization | Shows GPU core usage. |
| SM Utilization | Shows the utilization of streaming multiprocessors on the GPU. |
| GPU Mem Copy | Shows GPU memory copy operations. |
| Tensor Core Util | Shows the utilization of Tensor Cores. |
| Current FP8 | Shows the current FP8 performance. |
| VRAM Usage | Displays the video RAM usage. |
| GPUs Temperature | Displays the temperature of the GPUs. |
| InfiniBand Aggregate Bandwidth | Shows the total network bandwidth over the InfiniBand interconnect. |
| GPUs Power Usage | Displays the power consumption of the GPUs. |
| Local Max Disk I/O Utilization (1m) | Shows the maximum disk I/O utilization on the local disk over 1 minute. |
| Local Avg Bytes Read / Written Per Node | Shows the average bytes read/written per Node on the local disk. |
| Local Total Bytes Read / Written (2m) | Shows the total bytes read/written on the local disk over 2 minutes. |
| Local Total Read / Write Rate (2m) | Shows the total read/write rate on the local disk over 2 minutes. |
| NFS Average Request Time by Operation | Shows the average time, in seconds, from when a request was enqueued to when it was completely handled, per operation. |
| NFS Avg Bytes Read / Written Per Node | Shows the average bytes read/written per Node on the NFS. |
| NFS Total Bytes Read / Written (2m) | Shows the total bytes read/written on the NFS over 2 minutes. |
| NFS Total Read / Write Rate (2m) | Shows the total read/write rate on the NFS over 2 minutes. |
| NFS Average Response Time by Operation | Shows the average time, in seconds, from when a request was transmitted to when its reply was received, per operation. |
| NFS Avg Write Rate Per Active Node (2m) | Shows the average NFS write rate per active Node over 2 minutes. Only includes Nodes reading or writing over 10 KB/s. |
| NFS Avg Read Rate Per Active Node (2m) | Shows the average NFS read rate per active Node over 2 minutes. Only includes Nodes reading or writing over 10 KB/s. |
| NFS Nodes with Retransmissions | Shows the count of NFS Nodes experiencing network retransmissions. |
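
The "Per Active Node" panels average only over Nodes whose NFS traffic exceeds the 10 KB/s threshold, so idle Nodes do not pull the average down. The following Python sketch illustrates that filtering step only; it is not the dashboard's actual query. The function name, the Node names, and the reading of 10 KB/s as 10,000 bytes per second applied to a single rate per Node are all assumptions made for illustration.

```python
from statistics import mean

# Assumption: 10 KB/s is interpreted as 10,000 bytes per second.
ACTIVE_THRESHOLD_BYTES_PER_SEC = 10_000


def avg_rate_per_active_node(rates_by_node, threshold=ACTIVE_THRESHOLD_BYTES_PER_SEC):
    """Average the rates of Nodes whose rate exceeds the threshold.

    rates_by_node maps a (hypothetical) Node name to its NFS rate in bytes/s.
    Returns 0.0 when no Node is above the threshold.
    """
    active = [rate for rate in rates_by_node.values() if rate > threshold]
    return mean(active) if active else 0.0


# Hypothetical sample: node-c is below the threshold and is excluded.
rates = {"node-a": 50_000.0, "node-b": 30_000.0, "node-c": 2_000.0}
print(avg_rate_per_active_node(rates))
```

With this kind of filter, a job in which only a few Nodes write checkpoints reports the rate those Nodes achieve, rather than a figure diluted by the Nodes that are not touching NFS.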