GPU Instance Details

CoreWeave provides a diverse portfolio of GPU instances tailored to a wide spectrum of computational workloads, from graphics rendering to large-scale AI model training and inference. The following section provides a technical overview of our GPU offerings and outlines their suitability for various machine learning tasks, with a specific focus on deploying Large Language Models (LLMs).

GPU comparison

The following summary table provides a side-by-side comparison of the GPU compute instances available at CoreWeave. Available instance types vary by Region, and pricing is available in the Pricing section. GPU RAM is listed per GPU.

  Instance ID         GPUs   GPU RAM (GB)   vCPUs   RAM (GB)   GPU Connectivity      Cost per Hour
  gb300-4x            4      279            144     960        InfiniBand & NVLink   $52.00
  gb200-4x            4      186            144     960        InfiniBand & NVLink   $42.00
  b200-8x             8      180            128     2048       InfiniBand & NVLink   $68.80
  gd-8xh200ib-i128    8      141            128     2048       InfiniBand & NVLink   $50.44
  gd-8xh100ib-i128    8      80             128     2048       InfiniBand & NVLink   $49.24
  rtxp6000-8x         8      96             128     1024       PCIe                  $20.00
  gd-8xl40s-i128      8      48             128     1024       PCIe                  $18.00
  gd-8xl40-i128       8      48             128     1024       PCIe                  $10.00
  gd-1xgh200          1      96             72      480        PCIe                  $6.50
  gd-8xa100-i128      8      80             128     2048       NVLink                $21.60

GPU compute tiers

CoreWeave's GPU instances are categorized into three distinct tiers, each optimized for specific workloads and performance requirements:

  • State-of-the-Art Compute Tier: For unprecedented scale, these instances are designed for the most demanding AI workloads, including training and inference of multi-trillion parameter models.
  • Mid-large Size Model Training & Inference Tier: For training and inference of large models, these instances provide a balanced combination of GPU memory and compute power.
  • Professional AI & Graphics Tier: For professional AI and graphics workloads, these instances are optimized for high-performance graphics rendering, AI inference, and smaller-scale model training.

See Selecting an Instance for help choosing the right instance for your workload.
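
A quick way to narrow down a tier is to estimate how much GPU memory a model needs just for its weights at a given precision, then compare that against the per-GPU and per-instance memory listed in the sections below. The following sketch is a rough rule of thumb only; it ignores activations, KV cache, and optimizer state, and the parameter counts and precisions are illustrative assumptions, not CoreWeave recommendations.

```python
# Rough rule of thumb: weight memory = parameter count x bytes per parameter.
# Training adds optimizer state and activations on top of this (often 3-4x more),
# and inference adds KV cache, so treat these numbers as a lower bound.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int4": 0.5}

def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate GPU memory (GB) needed just to hold the model weights."""
    return params_billions * BYTES_PER_PARAM[precision]

# Example: does a 70B-parameter model fit on a single GPU?
for precision in ("fp16", "fp8", "int4"):
    print(f"70B @ {precision}: ~{weight_memory_gb(70, precision):.0f} GB of weights")
# 70B @ fp16: ~140 GB -> needs multiple GPUs (tensor parallelism) or quantization
# 70B @ fp8:  ~70 GB  -> fits on one 80 GB-class GPU (weights only)
# 70B @ int4: ~35 GB  -> fits comfortably on a 48 GB GPU
```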

State-of-the-Art Compute Tier: For unprecedented scale

GB300 NVL72-powered instances

Built around NVIDIA GB300 Superchips, each of these instances pairs four "Blackwell Ultra" GPUs, each with an unprecedented 279 GB of memory, with two NVIDIA Grace CPUs, and represents the pinnacle of our high-performance computing offerings. These instances form part of a larger NVL72 rack architecture that provides 21 TB of total GPU memory, interconnected by fifth-generation NVLink for a seamless, rack-scale memory fabric. For clustering, they are equipped with next-generation NVIDIA Quantum-X800 InfiniBand and ConnectX-8 SuperNICs, delivering 800 Gbps of network bandwidth, double that of the previous generation.

  • Primary Use Cases: Training next-generation foundation models in the trillion-parameter class, massive-scale and high-fidelity inference on the most complex AI models, and scientific simulations requiring maximum memory capacity and the fastest data throughput.
  • Recommended Models: Future frontier models (1T+ parameters), next-generation multimodal systems, and large-scale scientific discovery models.
  • Note: Instances must be requested in multiples of 18 to leverage the full NVL72 rack architecture.

Specifications: GB300 (InfiniBand)

  Specification       Value
  Instance ID         gb300-4x
  GPU Count           4
  GPU RAM (GB)        279
  CPU Model           2x NVIDIA Grace (Arm v9)
  vCPU Count          144
  RAM (GB)            960
  Local Disk (TB)     61.44
  Network Speed       Dual-port 200GbE
  GPU Connectivity    InfiniBand & NVLink
  Cost per Hour       $52.00
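
Because each gb300-4x instance contributes four GPUs to a 72-GPU NVL72 rack, requests are sized in multiples of 18 instances. The sketch below shows that arithmetic; the target GPU count is an illustrative assumption.

```python
import math

GPUS_PER_INSTANCE = 4      # gb300-4x
INSTANCES_PER_RACK = 18    # 18 instances x 4 GPUs = one 72-GPU NVL72 rack

def instances_for(target_gpus: int) -> int:
    """Round a desired GPU count up to whole NVL72 racks, in instances."""
    racks = math.ceil(target_gpus / (GPUS_PER_INSTANCE * INSTANCES_PER_RACK))
    return racks * INSTANCES_PER_RACK

print(instances_for(100))  # 36 instances (2 racks, 144 GPUs)
```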

GB200 NVL72-powered instances

Built around NVIDIA GB200 Grace Blackwell Superchips with InfiniBand connectivity, these instances deliver a monumental leap in performance over the previous generation, setting the new standard for large-scale AI and high-performance computing. The GB200 instances are equipped with 13 TB of high-bandwidth GPU memory per NVL72 rack and fifth-generation NVLink with up to 130 TB/s of total bandwidth. Blackwell GPUs feature a second-generation Transformer Engine with new FP4 and FP6 precisions, drastically accelerating LLM inference and training. The NVL72 architecture provides unprecedented memory capacity and ultra-fast GPU-to-GPU communication for distributed computing at the largest scale.

  • Primary Use Cases: Training foundation models from scratch (500B+ parameters), massive-scale inference, and developing frontier AI systems.
  • Recommended Models: GPT-5-class and future state-of-the-art models, complex Mixture-of-Experts (MoE) models, and large multimodal systems.
  • Note: Instances must be requested in multiples of 18 to leverage the full NVL72 rack architecture.

Specifications: GB200 (InfiniBand)

  Specification       Value
  Instance ID         gb200-4x
  GPU Count           4
  GPU RAM (GB)        186
  CPU Model           2x NVIDIA Grace (Arm v9)
  vCPU Count          144
  RAM (GB)            960
  Local Disk (TB)     30.72
  Network Speed       Dual-port 100GbE
  GPU Connectivity    InfiniBand & NVLink
  Cost per Hour       $42.00

B200 instances

Powered by eight NVIDIA B200 Blackwell GPUs, each with 180 GB of HBM3e memory, these instances deliver outstanding performance for enterprise AI workloads, including training and real-time inference. They offer more than twice the performance per GPU compared to previous-generation NVIDIA Hopper GPUs. The GPUs are interconnected using NVIDIA NVLink and NVLink Switch to enable fast data transfer. For networking, B200 instances feature an NVIDIA BlueField-3 DPU and eight NVIDIA ConnectX-7 InfiniBand host channel adapters (HCAs), supporting high throughput and lossless scaling. A 400G NDR non-blocking NVIDIA Quantum-2 InfiniBand fabric ensures seamless, high-speed connectivity.

  • Primary Use Cases: Training foundation models from scratch (100B+ parameters), massive-scale inference, and developing frontier AI systems.
  • Recommended Models: GPT-5-class and future state-of-the-art models, complex Mixture-of-Experts (MoE) models, and large multimodal systems.

Specifications: B200 (InfiniBand)

  Specification       Value
  Instance ID         b200-8x
  GPU Count           8
  GPU RAM (GB)        180
  CPU Model           Intel Emerald Rapids (8562Y+, 2.80 GHz)
  vCPU Count          128
  RAM (GB)            2048
  Local Disk (TB)     61.44
  Network Speed       Dual-port 100GbE
  GPU Connectivity    InfiniBand & NVLink
  Cost per Hour       $68.80

Mid-large Size Model Training & Inference Tier

H200 instances

Powered by eight NVIDIA H200 Hopper GPUs, each equipped with 141 GB of ultra-fast HBM3e memory, these instances are purpose-built for memory-intensive AI and HPC workloads. The significant memory per GPU allows for training larger models with bigger batch sizes and longer context windows compared to its predecessor. The GPUs are interconnected with NVIDIA NVLink for a high-speed, unified memory pool within the server. For clustering, they feature NVIDIA ConnectX-7 adapters providing a 400G NDR InfiniBand fabric, making them ideal for large, distributed training jobs that are memory-bound.

  • Primary Use Cases: Training and fine-tuning large models (~70B parameters) with high precision, high-throughput inference with long context lengths, and memory-intensive scientific computing.
  • Recommended Models: Llama 3 70B (at full precision), Falcon 180B (quantized), and other large models requiring substantial GPU memory for a single Node.

Specifications: H200 (InfiniBand)

  Specification       Value
  Instance ID         gd-8xh200ib-i128
  GPU Count           8
  GPU RAM (GB)        141
  CPU Model           Intel Emerald Rapids (8562Y+, 2.80 GHz)
  vCPU Count          128
  RAM (GB)            2048
  Local Disk (TB)     61.44
  Network Speed       Dual-port 100GbE
  GPU Connectivity    InfiniBand & NVLink
  Cost per Hour       $50.44
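
As an illustration of the "large model on a single Node" pattern this tier targets, the sketch below serves Llama 3 70B at bf16 (roughly 140 GB of weights, about 17.5 GB per GPU) across all eight H200 GPUs with tensor parallelism using vLLM. The model name and sampling settings are illustrative assumptions, and any serving engine with tensor parallelism would follow the same shape; this is a minimal sketch, not a tuned deployment.

```python
# Minimal sketch: tensor-parallel inference across the 8 GPUs of one H200 Node.
# Assumes vLLM is installed and the model weights are accessible (illustrative model name).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # ~140 GB of bf16 weights
    tensor_parallel_size=8,                        # shard the weights across all 8 GPUs
    dtype="bfloat16",
)

outputs = llm.generate(
    ["Summarize the difference between NVLink and PCIe GPU connectivity."],
    SamplingParams(max_tokens=256, temperature=0.2),
)
print(outputs[0].outputs[0].text)
```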

H100 instances

These instances are powered by eight NVIDIA H100 Hopper GPUs, each with 80 GB of HBM3 memory. As the industry-defining accelerator of the previous generation, the H100 remains a powerful and reliable workhorse for a wide range of large-scale AI workloads. GPUs are fully connected with NVLink, and Nodes are linked with a 400G NDR InfiniBand fabric, providing a robust architecture for distributed training and inference.

  • Primary Use Cases: Large-scale AI training, high-performance inference serving, and a wide variety of HPC simulations.
  • Recommended Models: Llama 3 70B (with quantization or ZeRO), Mixtral 8x7B, Qwen2 72B, and fine-tuning most open-source models.

Specifications: H100 (InfiniBand)

  Specification       Value
  Instance ID         gd-8xh100ib-i128
  GPU Count           8
  GPU RAM (GB)        80
  CPU Model           Intel Sapphire Rapids (8462Y+, 2.80 GHz)
  vCPU Count          128
  RAM (GB)            2048
  Local Disk (TB)     61.44
  Network Speed       Dual-port 100GbE
  GPU Connectivity    InfiniBand & NVLink
  Cost per Hour       $49.24
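
For distributed training across these Nodes, the common pattern is NCCL-backed data parallelism launched with a tool such as torchrun: traffic inside a Node rides NVLink, and traffic between Nodes rides the InfiniBand fabric. The skeleton below is a minimal PyTorch DDP sketch under those assumptions, not a CoreWeave-specific recipe; the placeholder model, node count, and rendezvous endpoint are illustrative.

```python
# Minimal multi-GPU / multi-node DDP skeleton. Launch the same script on every Node, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 --rdzv_backend=c10d \
#            --rdzv_endpoint=<head-node>:29500 train.py
# NCCL typically detects the InfiniBand HCAs automatically and routes inter-Node
# gradient all-reduce over the IB fabric, intra-Node traffic over NVLink.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                          # placeholder training loop
        x = torch.randn(32, 4096, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()                          # gradients all-reduced via NCCL
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```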

Professional AI & Graphics Tier

RTX Pro 6000 (Blackwell) instances

Powered by eight NVIDIA RTX Pro 6000 GPUs based on the Blackwell architecture, these instances provide a versatile and efficient solution for a mix of AI and professional graphics workloads. Each GPU has 96 GB of GDDR7 memory and connects via a high-speed PCIe interface. While they lack NVLink for memory pooling across GPUs, their large individual GPU memory makes them excellent for running multiple inference jobs in parallel or handling high-resolution rendering tasks.

  • Primary Use Cases: High-performance inference and fine-tuning for mid-sized models and high-fidelity rendering.
  • Recommended Models: Inference and fine-tuning of models up to ~70B parameters, such as Llama 3 70B and Mixtral 8x7B, on a per-GPU basis (the largest of these require quantization to fit within 96 GB).

Specifications: RTX Pro 6000 Blackwell Server Edition

  Specification       Value
  Instance ID         rtxp6000-8x
  GPU Count           8
  GPU RAM (GB)        96
  CPU Model           Intel Emerald Rapids (8562Y+, 2.80 GHz)
  vCPU Count          128
  RAM (GB)            1024
  Local Disk (TB)     7.68
  Network Speed       Dual-port 100GbE
  GPU Connectivity    PCIe
  Cost per Hour       $20.00
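
Because these GPUs are not NVLink-connected, the typical deployment pattern is one independent inference worker per GPU rather than one model sharded across GPUs. The sketch below pins workers with CUDA_VISIBLE_DEVICES; "serve_model.py" and the port scheme are placeholders for whatever inference server you actually run.

```python
# Launch one independent inference worker per GPU on an 8-GPU PCIe instance.
# Each worker sees exactly one GPU via CUDA_VISIBLE_DEVICES, so any model that
# fits in 96 GB can be served eight times in parallel with no cross-GPU traffic.
import os
import subprocess

NUM_GPUS = 8
workers = []
for gpu in range(NUM_GPUS):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    # "serve_model.py" is a placeholder for your inference server entry point.
    workers.append(subprocess.Popen(
        ["python", "serve_model.py", "--port", str(8000 + gpu)], env=env
    ))

for proc in workers:
    proc.wait()
```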

L40S instances

These instances feature eight NVIDIA L40S GPUs, each with 48 GB of GDDR6 memory. The L40S is a compute-optimized version of the L40, specifically tuned to deliver higher performance for AI and data science workloads. Connected via PCIe, these GPUs are a cost-effective choice for scaling out inference and for fine-tuning mainstream models. They provide a strong balance of performance and value for enterprise AI deployment.

  • Primary Use Cases: High-throughput inference, efficient fine-tuning, video analytics, and other compute-focused AI tasks.
  • Recommended Models: Inference for models up to ~40B parameters like Qwen2 32B or LLaVA 34B; efficient fine-tuning of 7B-13B models.

Specifications: L40S

  Specification       Value
  Instance ID         gd-8xl40s-i128
  GPU Count           8
  GPU RAM (GB)        48
  CPU Model           Intel Sapphire Rapids (8462Y+, 2.80 GHz)
  vCPU Count          128
  RAM (GB)            1024
  Local Disk (TB)     7.68
  Network Speed       Dual-port 100GbE
  GPU Connectivity    PCIe
  Cost per Hour       $18.00

L40 instances

Powered by eight NVIDIA L40 GPUs, each with 48 GB of GDDR6 memory, these instances are designed for exceptional versatility across AI and visual computing. Connected via PCIe, they offer a balanced profile for workloads that include AI inference, rendering, and video processing. They are an ideal entry point for deploying AI-powered services that also have a significant graphics component.

  • Primary Use Cases: General-purpose AI inference, small-scale model fine-tuning, high-resolution rendering, and virtual desktop infrastructure (VDI).
  • Recommended Models: Inference for models up to ~40B parameters; ideal for models like Stable Diffusion and various computer vision models.

Specifications: L40

  Specification       Value
  Instance ID         gd-8xl40-i128
  GPU Count           8
  GPU RAM (GB)        48
  CPU Model           Intel Sapphire Rapids (8462Y+, 2.80 GHz)
  vCPU Count          128
  RAM (GB)            1024
  Local Disk (TB)     7.68
  Network Speed       Dual-port 100GbE
  GPU Connectivity    PCIe
  Cost per Hour       $10.00

GH200 instances

Powered by a single NVIDIA GH200 Grace Hopper Superchip, this instance offers a unique architecture that combines a 72-core Grace CPU with a Hopper GPU. Its defining feature is the 576 GB unified memory pool (96 GB HBM3 + 480 GB LPDDR5X) connected via the high-speed NVLink-C2C interconnect. This design eliminates traditional PCIe bottlenecks and allows the GPU to access a massive memory space, making it unparalleled for running extremely large models on a single machine.

  • Primary Use Cases: Low-latency inference for models that are too large to fit in the memory of a standard GPU, large-scale graph analytics, and memory-intensive data science.
  • Recommended Models: Inference on models like Llama 3.1 405B, Nemotron-4 340B, and other models in the 100B-500B parameter range.

Specifications: GH200

  Specification       Value
  Instance ID         gd-1xgh200
  GPU Count           1
  GPU RAM (GB)        96
  CPU Model           NVIDIA Grace (Arm v9)
  vCPU Count          72
  RAM (GB)            480
  Local Disk (TB)     7.68
  Network Speed       Dual-port 100GbE
  GPU Connectivity    PCIe
  Cost per Hour       $6.50
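
To make the "larger than one GPU's memory" use case concrete, the sketch below estimates whether a 405B-parameter model's weights fit in the 96 GB of HBM alone versus the full 576 GB unified pool. The bytes-per-parameter figures are rule-of-thumb assumptions that ignore KV cache and runtime overhead, so treat the results as rough guidance only.

```python
# Weights-only footprint of a 405B-parameter model at different precisions,
# compared against GH200 memory: 96 GB of HBM vs. the 576 GB unified pool
# (HBM + LPDDR5X reachable over NVLink-C2C).
HBM_GB, UNIFIED_GB = 96, 576
PARAMS_B = 405

for name, bytes_per_param in [("fp16", 2), ("fp8", 1), ("int4", 0.5)]:
    gb = PARAMS_B * bytes_per_param   # billions of params x bytes/param = GB
    if gb <= HBM_GB:
        verdict = "fits in HBM"
    elif gb <= UNIFIED_GB:
        verdict = "needs the unified pool"
    else:
        verdict = "too large even for the unified pool"
    print(f"405B @ {name}: ~{gb:.0f} GB -> {verdict}")
# 405B @ fp16: ~810 GB -> too large even for the unified pool
# 405B @ fp8:  ~405 GB -> needs the unified pool
# 405B @ int4: ~202 GB -> needs the unified pool
```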

A100 instances

These instances are powered by eight NVIDIA A100 Tensor Core GPUs, each providing 80 GB of high-bandwidth HBM2e memory. As a foundational accelerator for the previous generation of AI, these are versatile instances that deliver excellent performance for mixed-precision workloads. The GPUs are interconnected with NVIDIA NVLink, making them a powerful and proven choice for distributed training. Their configuration offers a cost-effective solution for a wide range of demanding AI and data analytics tasks.

  • Primary Use Cases: AI model training, high-performance inference, and various HPC applications including scientific simulations.
  • Recommended Models: Training models up to 40B parameters, such as Qwen2 32B, from scratch; a standard choice for fine-tuning most large open-source models. For inference, they can handle models up to 70B, such as Llama 3 70B (with quantization), on a per-GPU basis.

Specifications: A100

  Specification       Value
  Instance ID         gd-8xa100-i128
  GPU Count           8
  GPU RAM (GB)        80
  CPU Model           Intel Ice Lake (8358, 2.60 GHz)
  vCPU Count          128
  RAM (GB)            2048
  Local Disk (TB)     7.68
  Network Speed       Single-port 100GbE
  GPU Connectivity    NVLink
  Cost per Hour       $21.60
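
As an example of the fine-tuning role this instance often plays, the sketch below attaches LoRA adapters to an open-weight model with Hugging Face transformers and peft. The model name, target modules, and LoRA hyperparameters are illustrative assumptions rather than a tuned recipe; because only the small adapter matrices are trained, a mid-sized model fits comfortably within a single 80 GB A100.

```python
# Minimal LoRA fine-tuning setup: the base model stays frozen and only the
# low-rank adapter matrices are trained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B"  # illustrative; any open-weight model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base parameters
# From here, train with the usual Trainer or a plain PyTorch loop on your dataset.
```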