About

Overview

Propelling the Data Center Into a New Era of Accelerated Computing

The NVIDIA HGX™ B200 propels the data center into a new era of accelerated computing and generative AI, integrating NVIDIA Blackwell GPUs with a high-speed interconnect to accelerate AI performance at scale. As a premier accelerated scale-up x86 platform with up to 15X faster real-time inference performance, 12X lower cost, and 12X less energy use than the previous generation, HGX B200 is designed for the most demanding AI, data analytics, and high-performance computing (HPC) workloads.

Real-Time Inference for the Next Generation of Large Language Models

HGX B200 achieves up to 15X higher inference performance over the previous NVIDIA Hopper™ generation for massive models such as GPT MoE 1.8T. The second-generation Transformer Engine uses custom Blackwell Tensor Core technology combined with TensorRT™-LLM and NVIDIA NeMo™ framework innovations to accelerate inference for LLMs and mixture-of-experts (MoE) models.
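
As a rough illustration of how FP8 execution is exposed to software, the sketch below uses NVIDIA's Transformer Engine PyTorch API rather than the TensorRT-LLM deployment path; it assumes the transformer_engine package and an FP8-capable GPU, and the single linear layer and shapes are arbitrary placeholders, not a full LLM.

# Minimal FP8 inference sketch with NVIDIA Transformer Engine (assumption:
# transformer_engine is installed and the GPU has FP8 Tensor Core support).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Hybrid FP8 recipe: E4M3 for the forward pass, E5M2 reserved for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()   # placeholder layer, not a full model
x = torch.randn(16, 4096, device="cuda")

with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)                                   # GEMM executes in FP8 on the Tensor Cores
print(y.shape)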

Next-Level Training Performance

The second-generation Transformer Engine, featuring FP8 and new precisions, enables a remarkable 3X faster training for large language models like GPT MoE 1.8T. This breakthrough is complemented by fifth-generation NVLink with 1.8TB/s of GPU-to-GPU interconnect, NVSwitch chips, InfiniBand networking, and NVIDIA Magnum IO software, which together ensure efficient scalability for enterprises and large GPU computing clusters.
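
For a sense of how that scalability is consumed in practice, here is a minimal data-parallel training sketch using PyTorch's NCCL backend, whose gradient all-reduce traffic rides NVLink/NVSwitch on an HGX baseboard; the tiny model and objective are placeholders, and the script is assumed to be launched with torchrun across the node's eight GPUs.

# Minimal 8-GPU data-parallel sketch (placeholder model and objective).
# Launch: torchrun --nproc_per_node=8 train_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL collectives ride NVLink/NVSwitch
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda(local_rank), device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(32, 4096, device=local_rank)
        loss = model(x).square().mean()            # stand-in objective
        opt.zero_grad()
        loss.backward()                            # gradients all-reduced across the 8 GPUs
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()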

Sustainable Computing

By adopting sustainable computing practices, data centers can lower their carbon footprint and energy consumption while improving their bottom line. Efficiency gains from accelerated computing with HGX help realize that goal: for LLM inference, HGX B200 delivers 12X better energy efficiency and 12X lower cost than the Hopper generation.
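
As a back-of-the-envelope view of where such figures come from, energy per generated token is simply board power divided by token throughput. The numbers below are hypothetical placeholders, not measured values, chosen only to show how a large throughput gain at a modestly higher TDP yields an order-of-magnitude drop in energy per token.

# Illustrative energy-per-token arithmetic; all figures are hypothetical placeholders.
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    return power_watts / tokens_per_second

baseline  = joules_per_token(power_watts=700.0,  tokens_per_second=100.0)   # hypothetical Hopper-class numbers
blackwell = joules_per_token(power_watts=1000.0, tokens_per_second=1500.0)  # hypothetical 15X throughput at 1,000W TDP
print(f"baseline:  {baseline:.2f} J/token")
print(f"blackwell: {blackwell:.2f} J/token")
print(f"~{baseline / blackwell:.0f}X less energy per token")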

Characteristics
Blackwell GPUs | Grace CPUs: 8 | 0
CPU Cores: -
Total FP4 Tensor Core: 144 PFLOPS
Total FP8/FP6 Tensor Core: 72 PFLOPS
Total Fast Memory: Up to 1.4TB
Total Memory Bandwidth: Up to 62TB/s
Total NVLink Bandwidth: 14.4TB/s

Per-GPU specifications
FP4 Tensor Core: 18 PFLOPS
FP8/FP6 Tensor Core: 9 PFLOPS
INT8 Tensor Core: 9 POPS
FP16/BF16 Tensor Core: 4.5 PFLOPS
TF32 Tensor Core: 2.2 PFLOPS
FP32: 75 TFLOPS
FP64 | FP64 Tensor Core: 37 TFLOPS
GPU Memory | Bandwidth: 180GB HBM3E | 7.7TB/s
Multi-Instance GPU (MIG): Up to 7 instances
Decompression Engine: Yes
Decoders: 7 NVDEC, 7 nvJPEG
Max Thermal Design Power (TDP): Configurable up to 1,000W
Interconnect: 5th-generation NVLink: 1.8TB/s; PCIe Gen5: 128GB/s
Server Options: NVIDIA HGX B200 partner and NVIDIA-Certified Systems with 8 GPUs
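
The platform totals above follow directly from the per-GPU figures on an 8-GPU baseboard; a small sketch of that arithmetic, using only the numbers listed in the table:

# Aggregate HGX B200 figures derived from the per-GPU specifications (8 GPUs per baseboard).
GPUS = 8
per_gpu = {
    "FP4 Tensor Core (PFLOPS)": 18,
    "FP8/FP6 Tensor Core (PFLOPS)": 9,
    "HBM3E capacity (GB)": 180,
    "Memory bandwidth (TB/s)": 7.7,
    "NVLink bandwidth (TB/s)": 1.8,
}
for name, value in per_gpu.items():
    print(f"Total {name}: {GPUS * value:g}")
# Prints 144 PFLOPS FP4, 72 PFLOPS FP8/FP6, 1440 GB (~1.4TB) fast memory,
# 61.6 TB/s (~62TB/s) memory bandwidth, and 14.4 TB/s total NVLink bandwidth.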

NVIDIA HGX B200 Tensor Core GPU

Sale price: $399,950.00 (regular price: $412,319.59). Shipping calculated at checkout.
Availability: In stock now, in limited quantities.

Shipping & Fulfillment: All products ship from verified U.S. distributors or the SMTTR testing facility.
Returns & Refunds: Returns accepted within 30 days.
Warranty & Support: All products are covered under the manufacturer's warranty (NVIDIA, AMD, etc.); SMTTR provides first-layer support and will coordinate an RMA if needed.
