GPU Instance Details
CoreWeave provides a diverse portfolio of GPU instances tailored to a wide spectrum of computational workloads, from graphics rendering to large-scale AI model training and inference. The following section provides a technical overview of our GPU offerings and outlines their suitability for various machine learning tasks, with a specific focus on deploying Large Language Models (LLMs).
GPU comparison
The following tables provide a side-by-side comparison of the GPU compute instances available at CoreWeave. Available instance types vary by Region, and pricing is available in the Pricing section.
GPU Specs
Name | Instance ID | GPU Count | GPU RAM per GPU (GB) | Performance Tier |
---|---|---|---|---|
GB300 (InfiniBand) | gb300-4x | 4 | 279 | State-of-the-Art Compute |
GB200 (InfiniBand) | gb200-4x | 4 | 186 | State-of-the-Art Compute |
B200 (InfiniBand) | b200-8x | 8 | 180 | State-of-the-Art Compute |
H200 (InfiniBand) | gd-8xh200ib-i128 | 8 | 141 | Mid-large Size Model Training & Inference |
H100 (InfiniBand) | gd-8xh100ib-i128 | 8 | 80 | Mid-large Size Model Training & Inference |
RTX Pro 6000 Blackwell Server Edition | rtxp6000-8x | 8 | 96 | Professional AI & Graphics |
L40S | gd-8xl40s-i128 | 8 | 48 | Professional AI & Graphics |
L40 | gd-8xl40-i128 | 8 | 48 | Professional AI & Graphics |
GH200 | gd-1xgh200 | 1 | 96 | Professional AI & Graphics |
A100 | gd-8xa100-i128 | 8 | 80 | Professional AI & Graphics |
CPU & Memory
Name | Instance ID | CPU Model | Speed (GHz) | vCPU Count | RAM (GB) |
---|---|---|---|---|---|
GB300 (InfiniBand) | gb300-4x | 2x NVIDIA Grace Arm v9 | 3.10 | 144 | 960 |
GB200 (InfiniBand) | gb200-4x | 2x NVIDIA Grace Arm v9 | 3.10 | 144 | 960 |
B200 (InfiniBand) | b200-8x | Intel Emerald Rapids 8562Y+ | 2.80 | 128 | 2048 |
H200 (InfiniBand) | gd-8xh200ib-i128 | Intel Emerald Rapids 8562Y+ | 2.80 | 128 | 2048 |
H100 (InfiniBand) | gd-8xh100ib-i128 | Intel Sapphire Rapids 8462Y+ | 2.80 | 128 | 2048 |
RTX Pro 6000 Blackwell Server Edition | rtxp6000-8x | Intel Emerald Rapids 8562Y+ | 2.80 | 128 | 1024 |
L40S | gd-8xl40s-i128 | Intel Sapphire Rapids 8462Y+ | 2.80 | 128 | 1024 |
L40 | gd-8xl40-i128 | Intel Sapphire Rapids 8462Y+ | 2.80 | 128 | 1024 |
GH200 | gd-1xgh200 | NVIDIA Grace Arm v9 | 3.10 | 72 | 480 |
A100 | gd-8xa100-i128 | Intel Ice Lake 8358 | 2.60 | 128 | 2048 |
Storage & Network
Name | Instance ID | Local Disk (TB) | Network Speed | GPU Connectivity |
---|---|---|---|---|
GB300 (InfiniBand) | gb300-4x | 61.44 | Dual-port 200GbE | InfiniBand & NVLink |
GB200 (InfiniBand) | gb200-4x | 30.72 | Dual-port 100GbE | InfiniBand & NVLink |
B200 (InfiniBand) | b200-8x | 61.44 | Dual-port 100GbE | InfiniBand & NVLink |
H200 (InfiniBand) | gd-8xh200ib-i128 | 61.44 | Dual-port 100GbE | InfiniBand & NVLink |
H100 (InfiniBand) | gd-8xh100ib-i128 | 61.44 | Dual-port 100GbE | InfiniBand & NVLink |
RTX Pro 6000 Blackwell Server Edition | rtxp6000-8x | 7.68 | Dual-port 100GbE | PCIe |
L40S | gd-8xl40s-i128 | 7.68 | Dual-port 100GbE | PCIe |
L40 | gd-8xl40-i128 | 7.68 | Dual-port 100GbE | PCIe |
GH200 | gd-1xgh200 | 7.68 | Dual-port 100GbE | PCIe |
A100 | gd-8xa100-i128 | 7.68 | Single-port 100GbE | NVLink |
Instances with Dual-port 100GbE networking virtualize the two physical interfaces into a single logical interface, presenting a total network bandwidth of 200 Gbps to the Node.
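As a quick sanity check from inside an instance, the kernel-reported link speed can be read from Linux sysfs. This is a minimal sketch, assuming a Linux guest and an interface named eth0 (both assumptions; interface names vary by image):

```python
from pathlib import Path

def link_speed_gbps(interface: str = "eth0") -> float:
    """Read the kernel-reported link speed (Mb/s) from sysfs, in Gbps."""
    mbps = int(Path(f"/sys/class/net/{interface}/speed").read_text().strip())
    return mbps / 1000

# On a dual-port 100GbE instance, the virtualized interface is expected to
# report the aggregate ~200 Gbps described above.
print(f"{link_speed_gbps():.0f} Gbps")
```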
GPU compute tiers
CoreWeave's GPU instances are categorized into three distinct tiers, each optimized for specific workloads and performance requirements:
- State-of-the-Art Compute Tier: For unprecedented scale, these instances are designed for the most demanding AI workloads, including training and inference of multi-trillion-parameter models.
- Mid-large Size Model Training & Inference Tier: For training and inference of large models, these instances provide a balanced combination of GPU memory and compute power.
- Professional AI & Graphics Tier: For professional AI and graphics workloads, these instances are optimized for high-performance graphics rendering, AI inference, and smaller-scale model training.
See Selecting an Instance for help choosing the right instance for your workload.
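As a rough illustration of how the tiers map to model scale, the sketch below encodes the guidance above as a simple lookup. The thresholds are illustrative assumptions drawn from the per-tier use cases later in this page, not a substitute for the Selecting an Instance guide:

```python
def suggest_tier(params_billions: float, training_from_scratch: bool = False) -> str:
    """Illustrative mapping from model size to the GPU compute tiers above.

    Thresholds are rough assumptions based on the per-tier use cases below.
    """
    if training_from_scratch and params_billions >= 100:
        return "State-of-the-Art Compute"                    # GB300 / GB200 / B200
    if params_billions >= 30:
        return "Mid-large Size Model Training & Inference"   # H200 / H100
    return "Professional AI & Graphics"                      # RTX Pro 6000 / L40S / L40 / GH200 / A100

print(suggest_tier(70))                               # Mid-large Size Model Training & Inference
print(suggest_tier(500, training_from_scratch=True))  # State-of-the-Art Compute
```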
State-of-the-Art Compute Tier
GB300 NVL72-powered instances
Powered by four NVIDIA Blackwell Ultra GPUs (two GB300 Grace Blackwell Superchips, each pairing a Grace CPU with two GPUs), each GPU offering 279 GB of memory, these instances are the pinnacle of our high-performance computing offerings. They form part of a larger NVL72 rack architecture with 21 TB of total GPU memory, interconnected by fifth-generation NVLink into a seamless, rack-scale memory fabric. For clustering, they are equipped with NVIDIA Quantum-X800 InfiniBand and ConnectX-8 SuperNICs, delivering 800 Gbps of network bandwidth, double that of the previous generation.
- Primary Use Cases: Training next-generation foundation models in the trillion-parameter class, massive-scale and high-fidelity inference on the most complex AI models, and scientific simulations requiring maximum memory capacity and the fastest data throughput.
- Recommended Models: Future frontier models (1T+ parameters), next-generation multimodal systems, and large-scale scientific discovery models.
- Note: Instances must be requested in multiples of 18 to leverage the full NVL72 rack architecture, as the arithmetic sketch below shows.
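The multiple-of-18 requirement is simple rack arithmetic: 18 instances of 4 GPUs each fill the 72 GPU slots of one NVL72 rack. A quick check using the figures from the table above:

```python
INSTANCES_PER_RACK = 18   # request size must be a multiple of this
GPUS_PER_INSTANCE = 4
GPU_RAM_GB = 279          # per Blackwell Ultra GPU

gpus_per_rack = INSTANCES_PER_RACK * GPUS_PER_INSTANCE   # 72 -- the "NVL72"
rack_memory_tb = gpus_per_rack * GPU_RAM_GB / 1000       # ~20 TB aggregate
print(gpus_per_rack, f"{rack_memory_tb:.1f} TB")
```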
Specifications: GB300 (InfiniBand)
Specification | Value |
---|---|
Instance ID | gb300-4x |
GPU Count | 4 |
GPU RAM (GB) | 279 |
CPU Model | 2x NVIDIA Grace (Arm v9) |
vCPU Count | 144 |
RAM (GB) | 960 |
Local Disk (TB) | 61.44 |
Network Speed | Dual-port 200GbE |
GPU Connectivity | InfiniBand & NVLink |
Cost per Hour | $52.00 |
GB200 NVL72-powered instances
Powered by four NVIDIA Blackwell GPUs (two GB200 Grace Blackwell Superchips) with InfiniBand connectivity, these instances deliver a major leap in performance over the previous generation for large-scale AI and high-performance computing. A full NVL72 rack provides 13 TB of high-bandwidth GPU memory and fifth-generation NVLink with up to 130 TB/s of total bandwidth. Blackwell GPUs feature a second-generation Transformer Engine with new FP4 and FP6 precisions, drastically accelerating LLM inference and training. The NVL72 architecture provides unprecedented memory capacity and ultra-fast GPU-to-GPU communication for distributed computing at the largest scale.
- Primary Use Cases: Training foundation models from scratch (500B+ parameters), massive-scale inference, and developing frontier AI systems.
- Recommended Models: GPT-5-class and future state-of-the-art models, complex Mixture-of-Experts (MoE) models, and large multimodal systems.
- Note: Instances must be requested in multiples of 18 to leverage the full NVL72 rack architecture; the same rack arithmetic shown for GB300 applies.
Specifications: GB200 (InfiniBand)
Specification | Value |
---|---|
Instance ID | gb200-4x |
GPU Count | 4 |
GPU RAM (GB) | 186 |
CPU Model | 2x NVIDIA Grace (Arm v9) |
vCPU Count | 144 |
RAM (GB) | 960 |
Local Disk (TB) | 30.72 |
Network Speed | Dual-port 100GbE |
GPU Connectivity | InfiniBand & NVLink |
Cost per Hour | $42.00 |
B200 instances
Powered by eight NVIDIA B200 Blackwell GPUs, each with 180 GB of HBM3e memory, these instances deliver outstanding performance for enterprise AI workloads, including training and real-time inference. They offer more than twice the performance per GPU compared to previous-generation NVIDIA Hopper GPUs. The GPUs are interconnected using NVIDIA NVLink and NVLink Switch to enable fast data transfer. For networking, B200 instances feature an NVIDIA BlueField-3 DPU and eight NVIDIA ConnectX-7 InfiniBand host channel adapters (HCAs), supporting high throughput and lossless scaling. A 400G NDR non-blocking NVIDIA Quantum-2 InfiniBand fabric provides high-speed connectivity between Nodes; a quick communication smoke test follows the list below.
- Primary Use Cases: Training foundation models from scratch (100B+ parameters), massive-scale inference, and developing frontier AI systems.
- Recommended Models: GPT-5-class and future state-of-the-art models, complex Mixture-of-Experts (MoE) models, and large multimodal systems.
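To exercise the NVLink and InfiniBand fabric described above, a common smoke test is an all-reduce through PyTorch's NCCL backend. This is a minimal sketch, assuming PyTorch and a torchrun launch (neither is prescribed by this page):

```python
import os
import torch
import torch.distributed as dist

def main() -> None:
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Each rank contributes its rank; NCCL sums across NVLink within a Node
    # and across the InfiniBand fabric between Nodes.
    x = torch.full((1024,), float(dist.get_rank()), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"all_reduce ok across {dist.get_world_size()} ranks")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

On a single b200-8x instance this could be launched as `torchrun --nproc_per_node=8 allreduce_check.py` (file name hypothetical).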
Specifications: B200 (InfiniBand)
Specification | Value |
---|---|
Instance ID | b200-8x |
GPU Count | 8 |
GPU RAM (GB) | 180 |
CPU Model | Intel Emerald Rapids (8562Y+, 2.80 GHz) |
vCPU Count | 128 |
RAM (GB) | 2048 |
Local Disk (TB) | 61.44 |
Network Speed | Dual-port 100GbE |
GPU Connectivity | InfiniBand & NVLink |
Cost per Hour | $68.80 |
Mid-large Size Model Training & Inference Tier
H200 instances
Powered by eight NVIDIA H200 Hopper GPUs, each equipped with 141 GB of ultra-fast HBM3e memory, these instances are purpose-built for memory-intensive AI and HPC workloads. The significant memory per GPU allows for training larger models with bigger batch sizes and longer context windows than its predecessor, the H100. The GPUs are interconnected with NVIDIA NVLink for a high-speed, unified memory pool within the server. For clustering, they feature NVIDIA ConnectX-7 adapters providing a 400G NDR InfiniBand fabric, making them ideal for large, distributed training jobs that are memory-bound.
- Primary Use Cases: Training and fine-tuning large models (~70B parameters) with high precision, high-throughput inference with long context lengths, and memory-intensive scientific computing.
- Recommended Models: Llama 3 70B (at full precision), Falcon 180B (quantized), and other large models requiring substantial GPU memory on a single Node (a rough memory fit check follows below).
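A rough way to check whether a model fits on a Node is to compare its weight footprint against aggregate GPU memory. The sketch below uses common rules of thumb; the bytes-per-parameter table and the 1.2x overhead factor (for activations and KV cache) are assumptions, not CoreWeave guidance:

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int4": 0.5}

def fits_on_node(params_billions: float, precision: str,
                 gpu_ram_gb: float, gpu_count: int,
                 overhead: float = 1.2) -> bool:
    """Estimate whether weights (plus rough activation/KV-cache overhead)
    fit in a Node's combined GPU memory."""
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return weights_gb * overhead <= gpu_ram_gb * gpu_count

# Llama 3 70B in bf16: ~140 GB of weights * 1.2 ~= 168 GB, well within the
# 8 * 141 GB = 1128 GB of an 8x H200 Node.
print(fits_on_node(70, "bf16", gpu_ram_gb=141, gpu_count=8))  # True
```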
Specifications: H200 (InfiniBand)
Specification | Value |
---|---|
Instance ID | gd-8xh200ib-i128 |
GPU Count | 8 |
GPU RAM (GB) | 141 |
CPU Model | Intel Emerald Rapids (8562Y+, 2.80 GHz) |
vCPU Count | 128 |
RAM (GB) | 2048 |
Local Disk (TB) | 61.44 |
Network Speed | Dual-port 100GbE |
GPU Connectivity | InfiniBand & NVLink |
Cost per Hour | $50.44 |
H100 instances
These instances are powered by eight NVIDIA H100 Hopper GPUs, each with 80 GB of HBM3 memory. As the industry-defining accelerator of the previous generation, the H100 remains a powerful and reliable workhorse for a wide range of large-scale AI workloads. GPUs are fully connected with NVLink, and Nodes are linked with a 400G NDR InfiniBand fabric, providing a robust architecture for distributed training and inference.
- Primary Use Cases: Large-scale AI training, high-performance inference serving, and a wide variety of HPC simulations.
- Recommended Models: Llama 3 70B (with quantization or ZeRO), Mixtral 8x7B, Qwen2 72B, and fine-tuning most open-source models (a minimal ZeRO configuration sketch follows below).
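ZeRO here refers to sharding optimizer state, gradients, and parameters across GPUs so that 70B-class training fits within 8 x 80 GB. A minimal DeepSpeed ZeRO stage-3 configuration sketch, assuming DeepSpeed as the training framework (this page does not prescribe one), with illustrative, untuned values:

```python
# Passed as the `config` argument to deepspeed.initialize(); values are
# illustrative assumptions, not tuned recommendations.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                    # shard params, grads, and optimizer state
        "overlap_comm": True,          # overlap collectives with compute
        "contiguous_gradients": True,  # reduce memory fragmentation
    },
}
```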
Specifications: H100 (InfiniBand)
Specification | Value |
---|---|
Instance ID | gd-8xh100ib-i128 |
GPU Count | 8 |
GPU RAM (GB) | 80 |
CPU Model | Intel Sapphire Rapids (8462Y+, 2.80 GHz) |
vCPU Count | 128 |
RAM (GB) | 2048 |
Local Disk (TB) | 61.44 |
Network Speed | Dual-port 100GbE |
GPU Connectivity | InfiniBand & NVLink |
Cost per Hour | $49.24 |
Professional AI & Graphics Tier
RTX Pro 6000 (Blackwell) instances
Powered by eight NVIDIA RTX Pro 6000 GPUs based on the Blackwell architecture, these instances provide a versatile and efficient solution for a mix of AI and professional graphics workloads. Each GPU has 96 GB of GDDR7 memory and connects via a high-speed PCIe interface. While they lack NVLink for memory pooling across GPUs, their large individual GPU memory makes them excellent for running multiple inference jobs in parallel (a pattern sketched after the list below) or handling high-resolution rendering tasks.
- Primary Use Cases: High-performance inference and fine-tuning for mid-sized models and high-fidelity rendering.
- Recommended Models: Inference and fine-tuning of models up to 70B, such as Llama 3 70B and Mixtral 8x7B, on a per-GPU basis.
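Because the GPUs here are independent (no NVLink pooling), a common pattern is one inference worker per GPU, isolated via CUDA_VISIBLE_DEVICES. A minimal sketch in which serve_model.py is a hypothetical stand-in for your inference entrypoint:

```python
import os
import subprocess

NUM_GPUS = 8  # rtxp6000-8x exposes eight independent 96 GB GPUs

procs = []
for gpu in range(NUM_GPUS):
    # Each worker sees exactly one GPU as device 0.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    procs.append(subprocess.Popen(
        ["python", "serve_model.py", "--port", str(8000 + gpu)],  # hypothetical entrypoint
        env=env,
    ))

for p in procs:
    p.wait()
```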
Specifications: RTX Pro 6000 Blackwell Server Edition
Specification | Value |
---|---|
Instance ID | rtxp6000-8x |
GPU Count | 8 |
GPU RAM (GB) | 96 |
CPU Model | Intel Emerald Rapids (8562Y+, 2.80 GHz) |
vCPU Count | 128 |
RAM (GB) | 1024 |
Local Disk (TB) | 7.68 |
Network Speed | Dual-port 100GbE |
GPU Connectivity | PCIe |
Cost per Hour | $20.00 |
L40S instances
These instances feature eight NVIDIA L40S GPUs, each with 48 GB of GDDR6 memory. The L40S is a compute-optimized version of the L40, specifically tuned to deliver higher performance for AI and data science workloads. Connected via PCIe, these GPUs are a cost-effective choice for scaling out inference and for fine-tuning mainstream models. They provide a strong balance of performance and value for enterprise AI deployment.
- Primary Use Cases: High-throughput inference, efficient fine-tuning, video analytics, and other compute-focused AI tasks.
- Recommended Models: Inference for models up to ~40B parameters like Qwen2.5 32B or LLaVA 34B; efficient fine-tuning of 7B-13B models.
Specifications: L40S
Specification | Value |
---|---|
Instance ID | gd-8xl40s-i128 |
GPU Count | 8 |
GPU RAM (GB) | 48 |
CPU Model | Intel Sapphire Rapids (8462Y+, 2.80 GHz) |
vCPU Count | 128 |
RAM (GB) | 1024 |
Local Disk (TB) | 7.68 |
Network Speed | Dual-port 100GbE |
GPU Connectivity | PCIe |
Cost per Hour | $18.00 |
L40 instances
Powered by eight NVIDIA L40 GPUs, each with 48 GB of GDDR6 memory, these instances are designed for exceptional versatility across AI and visual computing. Connected via PCIe, they offer a balanced profile for workloads that include AI inference, rendering, and video processing. They are an ideal entry point for deploying AI-powered services that also have a significant graphics component.
- Primary Use Cases: General-purpose AI inference, small-scale model fine-tuning, high-resolution rendering, and virtual desktop infrastructure (VDI).
- Recommended Models: Inference for models up to ~40B parameters; ideal for models like Stable Diffusion and various computer vision models.
Specifications: L40
Specification | Value |
---|---|
Instance ID | gd-8xl40-i128 |
GPU Count | 8 |
GPU RAM (GB) | 48 |
CPU Model | Intel Sapphire Rapids (8462Y+, 2.80 GHz) |
vCPU Count | 128 |
RAM (GB) | 1024 |
Local Disk (TB) | 7.68 |
Network Speed | Dual-port 100GbE |
GPU Connectivity | PCIe |
Cost per Hour | $10.00 |
GH200 instances
Powered by a single NVIDIA GH200 Grace Hopper Superchip, this instance offers a unique architecture that combines a 72-core Grace CPU with a Hopper GPU. Its defining feature is the 576 GB unified memory pool (96 GB HBM3 + 480 GB LPDDR5X) connected via the high-speed NVLink-C2C interconnect. This design eliminates traditional PCIe bottlenecks and allows the GPU to address a massive memory space, making it well suited to running extremely large models on a single machine (a rough fit check follows the list below).
- Primary Use Cases: Low-latency inference for models that are too large to fit in the memory of a standard GPU, large-scale graph analytics, and memory-intensive data science.
- Recommended Models: Inference on models like Llama 3.1 405B, Nemotron-4 340B, and other models in the 100B-500B parameter range.
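The unified pool is what makes single-Node inference on very large models plausible: weights that exceed the 96 GB of HBM3 can spill into the 480 GB of CPU-attached LPDDR5X over NVLink-C2C. A rough fit check, using the same rule-of-thumb bytes-per-parameter assumptions as the earlier sketch:

```python
HBM3_GB, LPDDR5X_GB = 96, 480
UNIFIED_POOL_GB = HBM3_GB + LPDDR5X_GB  # 576 GB addressable by the GPU

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

# Llama 3.1 405B at FP8 (1 byte/param): ~405 GB of weights -- far beyond any
# single GPU's HBM, but within the 576 GB unified pool (KV cache is extra).
print(weights_gb(405, 1.0) <= UNIFIED_POOL_GB)  # True
```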
Specifications: GH200
Specification | Value |
---|---|
Instance ID | gd-1xgh200 |
GPU Count | 1 |
GPU RAM (GB) | 96 |
CPU Model | NVIDIA Grace (Arm v9) |
vCPU Count | 72 |
RAM (GB) | 480 |
Local Disk (TB) | 7.68 |
Network Speed | Dual-port 100GbE |
GPU Connectivity | PCIe |
Cost per Hour | $6.50 |
A100 instances
These instances are powered by eight NVIDIA A100 Tensor Core GPUs, each providing 80 GB of high-bandwidth HBM2e memory. A foundational accelerator of the Ampere generation, the A100 remains versatile and delivers excellent performance for mixed-precision workloads. The GPUs are interconnected with NVIDIA NVLink, making them a proven platform for distributed training. Their configuration offers a cost-effective solution for a wide range of demanding AI and data analytics tasks.
- Primary Use Cases: AI model training, high-performance inference, and various HPC applications including scientific simulations.
- Recommended Models: Training models up to 40B parameters like Qwen2.5 32B from scratch; a standard choice for fine-tuning most large open-source models. For inference, they can handle models up to 70B, such as Llama 3 70B (quantized), on a per-GPU basis.
Specifications: A100
Specification | Value |
---|---|
Instance ID | gd-8xa100-i128 |
GPU Count | 8 |
GPU RAM (GB) | 80 |
CPU Model | Intel Ice Lake (8358, 2.60 GHz) |
vCPU Count | 128 |
RAM (GB) | 2048 |
Local Disk (TB) | 7.68 |
Network Speed | Single-port 100GbE |
GPU Connectivity | NVLink |
Cost per Hour | $21.60 |