NVIDIA HGX H100
Leverage the most powerful supercomputing platform on CoreWeave Cloud
CoreWeave's infrastructure is purpose-built for large-scale, GPU-accelerated workloads. We specialize in serving the most demanding AI and machine learning applications. To this end, CoreWeave is proud to be one of the few cloud platforms in the world offering NVIDIA's most powerful end-to-end AI supercomputing platform.
The NVIDIA HGX H100 delivers up to seven times greater efficiency for high-performance computing (HPC) applications, up to nine times faster AI training on large models, and up to thirty times faster AI inference than the NVIDIA HGX A100.
Learn more about how MosaicML with CoreWeave is making NVIDIA's most powerful supercomputer more accessible.
System Specifications
The table below presents the specifications of both NVIDIA HGX H100 GPU models at peak performance:
| | HGX H100 4-GPU | HGX H100 8-GPU |
| --- | --- | --- |
| FP64 | 134 TFLOPS | 268 TFLOPS |
| FP64 Tensor Core | 268 TFLOPS | 535 TFLOPS |
| FP32 | 268 TFLOPS | 535 TFLOPS |
| TF32 Tensor Core | 3,958 TFLOPS* | 7,915 TFLOPS* |
| FP16 Tensor Core | 7,915 TFLOPS* | 15,830 TFLOPS* |
| FP8 Tensor Core | 15,830 TFLOPS* | 31,662 TFLOPS* |
| INT8 Tensor Core | 15,830 TOPS* | 31,662 TOPS* |
| GPU Memory | 320 GB | 640 GB |
| Aggregate GPU Memory Bandwidth | 13 TB/s | 27 TB/s |
| Maximum Number of MIG Instances | 28 | 56 |
| NVIDIA NVLink | Fourth-generation NVLink, 900 GB/s | Fourth-generation NVLink, 900 GB/s |
| NVIDIA NVSwitch | N/A | Third-generation NVSwitch |
| NVSwitch GPU-GPU Bandwidth | N/A | 900 GB/s |
| In-network Compute | N/A | 3.6 TFLOPS |
| Total Aggregate Network Bandwidth | 3.6 TB/s | 7.2 TB/s |

* With sparsity.
The 8-GPU model provides significantly higher computational power and is better suited for highly demanding tasks that require intense GPU-GPU communication. It's ideal for large-scale AI training and for applications that involve massive data volumes.
The 4-GPU model, while still highly capable, targets somewhat less communication-intensive workloads. It focuses on maximizing GPU density while minimizing required space and power.
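To check which configuration a node presents, you can query the GPU count and per-GPU memory directly. The snippet below is a minimal sketch, assuming the nvidia-ml-py package (imported as pynvml) is installed; it is illustrative, not CoreWeave-provided tooling.

```python
# Minimal sketch: confirm GPU count and memory on an allocated node.
# Assumes nvidia-ml-py is installed (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()  # 4 or 8 on an HGX H100 node
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        # 80 GB per H100 SXM GPU: 4 x 80 GB = 320 GB, 8 x 80 GB = 640 GB
        print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB")
finally:
    pynvml.nvmlShutdown()
```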
Features
Hyperfast compute plus the lowest available network latency for extremely fast training times
The intense speeds of the HGX H100, combined with the lowest NVIDIA GPUDirect network latency on the market (delivered by the NVIDIA Quantum-2 InfiniBand platform), reduce the training time of AI models to "days or hours, instead of months."
With AI permeating nearly every industry today, this speed and efficiency have never been more vital for HPC applications.
FP8 support with Transformer Engine for quicker onboarding to H100
NVIDIA's open-source Python library Transformer Engine enables the use of the FP8 (8-bit floating point) format on GPUs built on the Hopper architecture, which powers the HGX H100.
Although all major deep learning frameworks support the FP16 format, FP8 support is not natively available in many frameworks today, a problem that Transformer Engine addresses:
> [...] With Hopper GPU architecture, FP8 precision was introduced, which offers improved performance over FP16 with no degradation in accuracy. Although all major deep learning frameworks support FP16, FP8 support is not available natively in frameworks today.
>
> TE addresses the problem of FP8 support by providing APIs that integrate with popular Large Language Model (LLM) libraries. It provides a Python API consisting of modules to easily build a Transformer layer as well as a framework agnostic library in C++ including structs and kernels needed for FP8 support. Modules provided by TE internally maintain scaling factors and other values needed for FP8 training, greatly simplifying mixed precision training for users.
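As a brief illustration of the API described above, the sketch below builds a single Transformer Engine layer and runs a forward and backward pass under FP8 autocasting. It is a minimal example based on Transformer Engine's PyTorch API; the layer and batch sizes are arbitrary assumptions, and it requires an FP8-capable (Hopper) GPU.

```python
# Minimal FP8 sketch using Transformer Engine's PyTorch API.
# Layer and batch sizes are illustrative assumptions.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A TE module; it internally maintains the FP8 scaling factors
# described above.
layer = te.Linear(768, 768, bias=True).cuda()

# Delayed scaling: FP8 scaling factors are derived from the amax
# history of previous iterations.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

x = torch.randn(2048, 768, device="cuda")

# The forward pass runs in FP8 inside the autocast context; the
# backward pass uses the scaling state captured during the forward.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()
```

The same `fp8_autocast` context wraps larger models built from TE modules, such as `te.TransformerLayer`, in exactly the same way.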
Tutorials on using Transformer Engine on CoreWeave are forthcoming.
Learn more from NVIDIA about FP8 and why it matters in ML and AI applications.
Read more on MosaicML about how HGX H100s on CoreWeave accelerate training operations while preserving model quality.
Make a reservation for HGX H100
Due to high demand, A100 NVLink (HGX) and H100 NVLink (HGX) nodes are currently fully committed on client contracts, and are therefore not available for on-demand use. We recommend speaking with the CoreWeave team to build a strategic plan tailored to your needs, one that makes use of available infrastructure and plans for your future capacity requirements. Contact CoreWeave Sales to get started.
If your needs demand the highest performance in supercomputing coupled with the lowest-latency networking available, make a reservation for HGX H100 compute on our website.