NVIDIA HGX H100
Leverage the most powerful supercomputing platform on CoreWeave Cloud
CoreWeave's infrastructure is purpose-built for large-scale, GPU-accelerated workloads. We specialize in serving the most demanding AI and machine learning applications. To this end, CoreWeave is proud to be one of the few cloud platforms in the world offering NVIDIA's most powerful end-to-end AI supercomputing platform.
The NVIDIA HGX H100 delivers up to seven times greater efficiency for high-performance computing (HPC) applications, up to nine times faster AI training on large models, and up to thirty times faster AI inference than the NVIDIA HGX A100.
Learn more about how MosaicML with CoreWeave is making NVIDIA's most powerful supercomputer more accessible.
System Specifications
The table below presents the specifications of both NVIDIA HGX H100 GPU models at peak performance:
| | HGX H100 4-GPU | HGX H100 8-GPU |
| --- | --- | --- |
| FP64 | 134 TFLOPS | 268 TFLOPS |
| FP64 Tensor Core | 268 TFLOPS | 535 TFLOPS |
| FP32 | 268 TFLOPS | 535 TFLOPS |
| TF32 Tensor Core | 3,958 TFLOPS* | 7,915 TFLOPS* |
| FP16 Tensor Core | 7,915 TFLOPS* | 15,830 TFLOPS* |
| FP8 Tensor Core | 15,830 TFLOPS* | 31,662 TFLOPS* |
| INT8 Tensor Core | 15,830 TOPS* | 31,662 TOPS* |
| GPU Memory | 320 GB | 640 GB |
| Aggregate GPU Memory Bandwidth | 13 TB/s | 27 TB/s |
| Maximum Number of MIG Instances | 28 | 56 |
| NVIDIA NVLink | Fourth-generation NVLink, 900 GB/s | Fourth-generation NVLink, 900 GB/s |
| NVIDIA NVSwitch | N/A | Third-generation NVSwitch |
| NVSwitch GPU-GPU Bandwidth | N/A | 900 GB/s |
| In-network Compute | N/A | 3.6 TFLOPS |
| Total Aggregate Network Bandwidth | 3.6 TB/s | 7.2 TB/s |

* With sparsity.
The 8-GPU model provides significantly higher computational power and is better suited for highly demanding tasks that require intense GPU-GPU communication. It's ideal for large-scale AI training and for applications that involve massive data volumes.
The 4-GPU model, while still highly capable, targets somewhat less communication-intensive workloads. It focuses on maximizing GPU density while minimizing required space and power.
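To check which configuration a node presents, you can query the GPU count and per-GPU memory directly. The snippet below is a minimal sketch, assuming the nvidia-ml-py package (imported as pynvml) is installed; it is illustrative, not CoreWeave-provided tooling.

```python
# Minimal sketch: confirm GPU count and memory on an allocated node.
# Assumes nvidia-ml-py is installed (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()  # 4 or 8 on an HGX H100 node
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        # 80 GB per H100 SXM GPU: 4 x 80 GB = 320 GB, 8 x 80 GB = 640 GB
        print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB")
finally:
    pynvml.nvmlShutdown()
```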
Features
Hyperfast compute plus the lowest available network latency for extremely fast training times
The intense speeds of the HGX H100, combined with the lowest NVIDIA GPUDirect network latency on the market (delivered by the NVIDIA Quantum-2 InfiniBand platform), reduce the training time of AI models to "days or hours, instead of months."
With AI permeating nearly every industry today, this speed and efficiency have never been more vital for HPC applications.
FP8 support with Transformer Engine for quicker onboarding to H100
NVIDIA's open-source Python library Transformer Engine enables the use of the FP8 (8-bit floating point) format on GPUs built on the Hopper architecture, which powers the HGX H100.
Although all major deep learning frameworks support the FP16 format, FP8 support is not natively available in many frameworks today, a problem that Transformer Engine addresses:
> [...] With Hopper GPU architecture, FP8 precision was introduced, which offers improved performance over FP16 with no degradation in accuracy. Although all major deep learning frameworks support FP16, FP8 support is not available natively in frameworks today.
>
> TE addresses the problem of FP8 support by providing APIs that integrate with popular Large Language Model (LLM) libraries. It provides a Python API consisting of modules to easily build a Transformer layer as well as a framework agnostic library in C++ including structs and kernels needed for FP8 support. Modules provided by TE internally maintain scaling factors and other values needed for FP8 training, greatly simplifying mixed precision training for users.
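As a brief illustration of the API described above, the sketch below builds a single Transformer Engine layer and runs a forward and backward pass under FP8 autocasting. It is a minimal example based on Transformer Engine's PyTorch API; the layer and batch sizes are arbitrary assumptions, and it requires an FP8-capable (Hopper) GPU.

```python
# Minimal FP8 sketch using Transformer Engine's PyTorch API.
# Layer and batch sizes are illustrative assumptions.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A TE module; it internally maintains the FP8 scaling factors
# described above.
layer = te.Linear(768, 768, bias=True).cuda()

# Delayed scaling: FP8 scaling factors are derived from the amax
# history of previous iterations.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

x = torch.randn(2048, 768, device="cuda")

# The forward pass runs in FP8 inside the autocast context; the
# backward pass uses the scaling state captured during the forward.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()
```

The same `fp8_autocast` context wraps larger models built from TE modules, such as `te.TransformerLayer`, in exactly the same way.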
Tutorials on using Transformer Engine on CoreWeave are forthcoming.
Learn more from NVIDIA about FP8 and why it matters in ML and AI applications.
Read more on MosaicML about how HGX H100s on CoreWeave accelerate training operations while preserving model quality.
Make a reservation for HGX H100
Due to high demand, A100 NVLink (HGX) and H100 NVLink (HGX) nodes are currently fully committed on client contracts, and are therefore not available for on-demand use. We recommend speaking with the CoreWeave team to build a strategic plan tailored to your needs, one that makes use of available infrastructure and plans for your future capacity requirements. Contact CoreWeave Sales to get started.
If your needs demand the highest performance in supercomputing coupled with the lowest-latency networking available, make a reservation for HGX H100 compute on our website.