
GPU Selection Guide

Determine which GPU type is best for your workloads

How to select a GPU node type

Choosing the right GPU is essential to optimizing workloads for model training, fine-tuning, and inference. Selecting the right hardware from the beginning reduces idle compute, serves demand faster, and lowers latency, while incurring costs only for the resources that are actually used.

But as the GPU benchmark comparison guide demonstrates, there is no one-size-fits-all GPU type. Knowing how to select the right GPU node type for your specific use case is key to achieving efficiency in both cost and performance.

This guide offers a broad overview of how we at CoreWeave compare our lineup of NVIDIA GPUs for model training, fine-tuning, and inference.
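
On CoreWeave Cloud, a workload is steered to a particular GPU class through Kubernetes node labels and affinity rules. The snippet below is a minimal, hedged sketch that tallies which GPU classes are currently visible; it assumes the kubernetes Python client, a valid kubeconfig with permission to list nodes, and that nodes expose a gpu.nvidia.com/class label (the label key used in CoreWeave affinity examples), all of which you should verify against your own cluster.

```python
# Sketch: tally the GPU classes advertised by nodes in the cluster.
# Assumes the `kubernetes` Python client, a valid kubeconfig with permission
# to list nodes, and a "gpu.nvidia.com/class" node label (verify the label
# key against your own cluster before relying on it).
from collections import Counter

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

gpu_classes = Counter(
    (node.metadata.labels or {}).get("gpu.nvidia.com/class", "cpu-only")
    for node in v1.list_node().items
)

for gpu_class, count in sorted(gpu_classes.items()):
    print(f"{gpu_class}: {count} node(s)")
```

The same label key is what a Pod's node affinity would match on when pinning a deployment to, for example, A40 or RTX A5000 nodes.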

Tip

In addition to the general guidance provided here, clients are strongly encouraged to benchmark their own workloads in order to identify the hardware with the best price and performance for their specific use case.

GPU overviews

Additional Resources

For direct benchmark comparisons of each GPU type, see the GPU Benchmarks and Comparison guide.

The sections below offer a general overview of CoreWeave's available GPU models and some of their most common applications.

NVIDIA HGX H100

Compared to the NVIDIA HGX A100, the NVIDIA HGX H100 delivers up to seven times higher performance for high-performance computing (HPC) applications, up to nine times faster AI training on large models, and up to thirty times faster AI inference.

Important

Due to high demand, A100 NVLink (HGX) and H100 NVLink (HGX) nodes are currently fully committed to client contracts and are therefore not available for on-demand use.

We recommend speaking with the CoreWeave team to build a strategic plan tailored to your needs, both to make use of available infrastructure and to plan for your future capacity requirements. Contact CoreWeave Sales to get started.

H100 HGX (80GB)

Cost per hour: $4.76
GPUs: 8
vRAM: 80 GB
Memory bandwidth: 3 TB/s
TFLOPS (FP16): 1,000
NVLink interconnect: 900 GB/s

NVIDIA Quadro RTX 4000

The NVIDIA Turing architecture-based NVIDIA Quadro RTX™ 4000 may be the smallest GPU that CoreWeave offers, but it is still highly cost-effective. If you need to run inference for models such as Fairseq 2.7B, GPT Neo 2.7B, or smaller, this option is an excellent value for less intensive inference workloads.

Larger contexts may require the Quadro RTX 5000, depending on how efficient your inference engine is. However, if you are saturating the GPU with inference requests, then the more recent GPUs - such as the NVIDIA RTX A4000 or RTX A5000 - may better serve your use case.

Quadro RTX 4000 specs

Cost per hour: $0.24
vRAM: 8 GB
Memory clock: 1,625 MHz
Memory bandwidth: 415 GB/s
Shader cores: 2,304
Tensor cores: 288
TFLOPS (FP16): 14.2
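
To make the sizing concrete: a 2.7B-parameter model stored in FP16 occupies roughly 2.7 billion × 2 bytes ≈ 5.4 GB of vRAM for weights alone, which is why it fits on an 8 GB card with headroom for activations. Below is a minimal inference sketch, assuming PyTorch and Hugging Face transformers are installed; the model name and generation settings are illustrative, not CoreWeave-specific.

```python
# Sketch: run GPT-Neo 2.7B inference in FP16 on a single 8 GB GPU.
# Assumes torch and transformers are installed and a CUDA device is visible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-2.7B"  # ~5.4 GB of weights in FP16
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("CoreWeave GPUs are", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Peak allocation hints at how much of the 8 GB card is left for longer contexts.
print(f"Peak vRAM: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```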

NVIDIA Quadro RTX 5000

The Turing-based NVIDIA® Quadro RTX 5000 is the smallest GPU that can run inference for the GPT-J 6B or Fairseq 6.7B models. It sports double the vRAM and slightly more memory bandwidth than the RTX 4000, along with a much faster base clock.

If your 2.7B models are running out of RAM with a larger context, the RTX 5000 is the next step up.

RTX 5000 specs

Cost per hour: $0.57
vRAM: 16 GB
Memory clock: 1,750 MHz
Memory bandwidth: 448 GB/s
Shader cores: 3,072
Tensor cores: 384
TFLOPS (FP16): 22.3
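
A rough rule of thumb behind these sizing claims: FP16 weights cost about 2 bytes per parameter, plus overhead for activations and a KV cache that grows with context length. The helper below is a hedged sketch; the 1.2x overhead factor is an assumption, not a measured value, and real usage depends on batch size, context length, and inference engine.

```python
# Sketch: estimate the FP16 inference footprint from a parameter count.
# The 1.2x overhead factor for activations/KV cache is an assumption;
# benchmark your own inference engine for real numbers.
def fp16_inference_gb(params_billion: float, overhead: float = 1.2) -> float:
    weight_gb = params_billion * 2  # 2 bytes per parameter in FP16
    return weight_gb * overhead

for name, params in [("GPT Neo 2.7B", 2.7), ("GPT-J 6B", 6.0), ("Fairseq 6.7B", 6.7)]:
    print(f"{name}: ~{fp16_inference_gb(params):.1f} GB")
# GPT Neo 2.7B: ~6.5 GB  -> fits the 8 GB RTX 4000
# GPT-J 6B: ~14.4 GB     -> needs the 16 GB RTX 5000
# Fairseq 6.7B: ~16.1 GB -> a tight fit; benchmark before committing
```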

NVIDIA RTX A6000

If your workload is intense enough, the NVIDIA Ampere architecture-based NVIDIA RTX A6000 is one of the best values for inference. This model is CoreWeave's generally recommended GPU for fine-tuning, due to its 48GB of vRAM, which enables fine-tuning of models up to Fairseq 13B on a single GPU. The 48GB of vRAM also allows for larger batch sizes during fine-tuning, for better throughput.

The RTX A6000 is the smallest single NVIDIA GPU that can host the GPT NeoX 20B model.

RTX A6000 specs

Cost per hour: $1.28
vRAM: 48 GB
Memory clock: 2,000 MHz
Memory bandwidth: 768 GB/s
Shader cores: 10,752
Tensor cores: 336
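
For sizing intuition, GPT NeoX 20B stored in FP16 needs roughly 20 billion × 2 bytes ≈ 40 GB of weights, which is why 48GB is the floor for hosting it on a single card. On the fine-tuning side, the extra headroom is what allows larger effective batches. Below is a hedged sketch using Hugging Face TrainingArguments; the batch size, accumulation steps, and other values are illustrative assumptions, not settings taken from a CoreWeave tutorial.

```python
# Sketch: fine-tuning arguments that take advantage of a 48 GB GPU.
# Values are illustrative; the right batch size depends on model size,
# sequence length, and optimizer choice.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./finetune-out",
    per_device_train_batch_size=8,   # larger micro-batches fit in 48 GB of vRAM
    gradient_accumulation_steps=4,   # effective batch size of 32 per GPU
    fp16=True,                       # halve weight/activation memory
    gradient_checkpointing=True,     # trade compute for activation memory
    num_train_epochs=1,
    logging_steps=50,
)
# Pass `training_args` to a transformers Trainer together with the model,
# tokenizer, and dataset to launch the run.
```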

NVIDIA A40

Because of its value proposition, the NVIDIA A40 is our recommended GPU for larger-scale training jobs. While the RTX A6000 is slightly faster, the A40 has more robust GPU drivers and greater availability at CoreWeave.

The A40's 48GB of vRAM enables larger batch sizes during fine-tuning for better throughput. For this reason, many of our tutorials - such as Fine-tune Large Language Models with CoreWeave Cloud - were developed using NVIDIA A40 nodes.

A40 specs

Cost per hour: $1.28
vRAM: 48 GB
Memory clock: 1,812 MHz
Memory bandwidth: 695 GB/s
Shader cores: 10,752
Tensor cores: 336

NVIDIA A100 40GB PCIe

For many workloads, the NVIDIA A100 40GB PCIe GPU nearly doubles the performance of the NVIDIA A40 or RTX A6000 on a single-GPU basis, thanks to roughly double the memory bandwidth. However, it features 8GB less vRAM than the A40, making the A40 better suited to hosting larger models, such as GPT NeoX 20B, on a single GPU.

If inference throughput is the primary concern, pairs of NVIDIA A100 PCIe GPUs can make excellent inference nodes.

Note

NVIDIA NVLink interconnect is recommended for distributed training and inference when model parallelism is required.
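
When a model's weights exceed a single card, they can be sharded across a pair of A100s. The sketch below uses Hugging Face Accelerate's automatic device placement, which splits layers across the visible GPUs; it assumes torch, transformers, and accelerate are installed, and the model choice is illustrative.

```python
# Sketch: host a model too large for one 40 GB card across two A100 PCIe GPUs.
# Assumes torch, transformers, and accelerate are installed and two CUDA
# devices are visible; device_map="auto" shards the layers between them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"  # ~40 GB of FP16 weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Distributed inference on CoreWeave", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

This layer-level sharding keeps inter-GPU traffic modest, so it generally works over PCIe; workloads that need true tensor parallelism are the ones that benefit most from the NVLink interconnect mentioned in the note above.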

A100 40GB specs

Cost per hour: $2.06
vRAM: 40 GB
Memory clock: 1,215 MHz
Memory bandwidth: 1,500 GB/s
Shader cores: 6,912
Tensor cores: 432

A100 80GB specs

Cost per hour: $2.21
vRAM: 80 GB
Memory clock: 1,593 MHz
Memory bandwidth: 2,000 GB/s
Shader cores: 6,912
Tensor cores: 432

Performance comparisons

To learn more about the comparative performance of each GPU listed here, see the GPU Benchmark Comparison.