Observability is critical to monitoring and maintaining the health and performance of a CoreWeave Cloud environment. The collection and visualization of metrics, logs, and events can help to identify potential issues and optimize future workloads. CoreWeave provides a rich corpus of telemetry data, collecting and indexing over 200 million metrics samples per second across all customer environments. You can select the observability solutions that best fit your needs.
Observability solutions
CoreWeave Observe™ offers the following solutions for collecting, observing, monitoring, and visualizing your CoreWeave logs and metrics:

- CoreWeave Grafana: CoreWeave’s fully-managed Grafana instance provides an immediate observability solution at no cost to customers. Explore performance metrics and a variety of pre-configured dashboards optimized to visualize complex workloads. This option requires no setup or maintenance, but does not allow for customization of the provided dashboards. See Introduction to CoreWeave Grafana for more information.
- Weights & Biases: Weights & Biases (W&B) enables AI developers to build AI agents, applications, and models with confidence. When building models on CoreWeave Kubernetes Service (CKS), CoreWeave infrastructure alerts such as GPU failures, thermal violations, and more can be integrated effortlessly into your W&B dashboards. See the Weights & Biases documentation for more information.
- CoreWeave Metrics: Query metrics with PromQL, the Prometheus Query Language. PromQL is the flexible, expressive language built for querying metrics stored in Prometheus-compatible backends. See Introduction to CoreWeave Logs and Metrics for more information.
- CoreWeave Logs: Query logs with LogQL, the Grafana Loki querying language. See Introduction to CoreWeave Logs and Metrics for more information.
- CoreWeave Telemetry Relay and telemetry forwarding: Forward telemetry data to endpoints outside of CoreWeave. See CoreWeave Telemetry Relay for more information.
- CoreWeave Mission Control Agent: A conversational AI interface for diagnosing issues, summarizing operations, and surfacing insights from metrics. This feature is currently in preview. See CoreWeave Mission Control Agent for more information.
- Self-hosted Grafana: Self-host a Grafana instance to create and customize dashboards, including those provided with CoreWeave Grafana, without limitation. This option requires more setup and configuration than the managed option, but grants the freedom to create fully-customized dashboards that can be tuned to effectively display unique workloads. See Self-hosted Grafana for more information.
- Resource Usage: Monitor compute, storage, and networking usage in Grafana to manage costs and optimize performance. See Cost and Usage Monitoring for more information.
- CoreWeave Alerts: Real-time notifications about your clusters, deployments, and operations, delivered via Slack or webhook. See CoreWeave Alerts for more information.
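
To give a feel for the PromQL and LogQL options listed above, the following is a minimal sketch of how query requests are commonly built for Prometheus-compatible and Loki-compatible backends. The URL paths follow the upstream Prometheus and Grafana Loki HTTP APIs. Whether your CoreWeave environment exposes these exact paths is an assumption, and the base URLs, metric name, and label names are hypothetical placeholders. The sketch only builds URLs and does not send any requests or handle authentication.

```python
from urllib.parse import urlencode

# Hypothetical base URLs: replace with the endpoints documented for your environment.
PROMETHEUS_BASE_URL = "https://metrics.example.com"
LOKI_BASE_URL = "https://logs.example.com"


def build_promql_url(base_url: str, query: str) -> str:
    """Build an instant-query URL in the style of the Prometheus HTTP API."""
    params = urlencode({"query": query})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def build_logql_url(base_url: str, query: str, limit: int = 100) -> str:
    """Build a range-query URL in the style of the Grafana Loki HTTP API."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


# Hypothetical metric and label names, for illustration only.
promql = "avg by (node) (rate(example_gpu_busy_seconds_total[5m]))"
logql = '{namespace="example"} |= "error"'

print(build_promql_url(PROMETHEUS_BASE_URL, promql))
print(build_logql_url(LOKI_BASE_URL, logql))
```

The same PromQL and LogQL expressions can usually be pasted into a Grafana Explore panel instead of being sent over HTTP, which is often the quicker way to iterate on a query.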