Building the next frontier of agentic AI.
Overview
NVIDIA Vera Rubin NVL72 unifies leading-edge NVIDIA technologies: 72 Rubin GPUs, 36 Vera CPUs, NVIDIA ConnectX™-9 SuperNICs, and BlueField™-4 DPUs. It scales up within the rack through the NVIDIA NVLink™ 6 switch and scales out with NVIDIA Quantum-X800 InfiniBand and Spectrum-X™ Ethernet to power the AI industrial revolution. When deployed with NVIDIA Groq 3 LPX racks, Vera Rubin NVL72 delivers a new class of inference performance for trillion-parameter models and million-token contexts.
Vera Rubin NVL72 is built on the third-generation NVIDIA MGX™ NVL72 rack design, offering a seamless transition from prior generations. Compared with NVIDIA GB200 NVL72 (Blackwell), it trains mixture-of-experts (MoE) models with one-fourth the number of GPUs and delivers AI inference at one-tenth the cost per million tokens. With cable-free modular tray designs and support from more than 80 MGX ecosystem partners, the rack-scale AI supercomputer delivers world-class performance with rapid deployment.
Performance
NVIDIA Vera Rubin NVL72 delivers one-tenth the cost per million tokens compared to NVIDIA GB200 NVL72 for highly interactive, deep reasoning agentic AI.
LLM inference performance subject to change. Cost per 1 million tokens based on Kimi-K2-Thinking model using 32K/8K ISL/OSL comparing NVIDIA GB200 NVL72 and NVIDIA Vera Rubin NVL72.
NVIDIA Vera Rubin NVL72 delivers up to 10x more tokens per megawatt than NVIDIA GB200 NVL72, scaling intelligence within the same power footprint.
LLM inference performance subject to change. Tokens per second per MW based on Kimi-K2-Thinking model using 32K/8K ISL/OSL comparing NVIDIA GB200 NVL72 and NVIDIA Vera Rubin NVL72.
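The two inference metrics above are simple ratios. As a minimal sketch, assuming hypothetical placeholder inputs (the throughput, power, and hourly cost values below are illustrative, not NVIDIA data, and the function names are invented for this example), the following Python shows one common way to compute tokens per second per megawatt and cost per million tokens. It also shows why 10x the throughput at the same power and hourly cost corresponds to one-tenth the cost per million tokens.

```python
def tokens_per_second_per_mw(tokens_per_second: float, power_mw: float) -> float:
    """Throughput normalized by facility power, in tokens/s per megawatt."""
    if power_mw <= 0:
        raise ValueError("power_mw must be positive")
    return tokens_per_second / power_mw


def cost_per_million_tokens(cost_per_hour: float, tokens_per_second: float) -> float:
    """Operating cost divided by tokens produced, scaled to one million tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical placeholder inputs, chosen only to make the ratios easy to read.
    baseline_tps = 1_000_000.0   # tokens per second for the baseline system
    power_mw = 1.0               # same power footprint for both systems
    cost_per_hour = 3_600.0      # same hourly cost for both systems

    # Assume the newer system produces 10x the tokens in the same footprint.
    improved_tps = baseline_tps * 10

    for label, tps in (("baseline", baseline_tps), ("10x throughput", improved_tps)):
        per_mw = tokens_per_second_per_mw(tps, power_mw)
        cost = cost_per_million_tokens(cost_per_hour, tps)
        print(f"{label}: {per_mw:,.0f} tokens/s/MW, {cost:.3f} per million tokens")
```

Real comparisons depend on the model, the input and output sequence lengths, and the interactivity target, which is why the footnotes pin those down.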
NVIDIA Vera Rubin NVL72 trains mixture-of-experts (MoE) models with one-fourth the number of GPUs compared to NVIDIA GB200 NVL72.
Projected performance subject to change. Number of GPUs based on a 10T MoE model trained on 100T tokens in a fixed timeframe of 1 month comparing NVIDIA GB200 NVL72 and NVIDIA Vera Rubin NVL72.
Agentic systems consume up to 15x more tokens than traditional AI applications, so AI factories must deliver high token volume and massive context windows with low latency and efficient economics. When paired with NVIDIA Groq 3 LPX, Vera Rubin NVL72 delivers up to 35x higher throughput per megawatt for trillion-parameter models.
Projected performance subject to change. Free Tier ($0): Qwen-3 235-billion-parameter model with 32K KV-cached tokens. Medium Tier ($3): Kimi K2.5 1-trillion-parameter model with 128K KV-cached tokens. High Tier ($6): GPT-MoE 2-trillion-parameter model with 128K KV-cached tokens. Premium ($45) and Ultra ($150) Tiers: GPT-MoE 2-trillion-parameter model with 400K KV-cached tokens.
Powering the Era of AI Agents
The Vera Rubin platform opens the next frontier of agentic AI with five racks to scale the world’s AI factories: NVIDIA Vera Rubin NVL72, NVIDIA Vera CPU, NVIDIA Groq 3 LPX, NVIDIA Vera BlueField-4 STX, and NVIDIA Spectrum-6 SPX Ethernet. Designed to operate together as one incredible AI supercomputer, the racks power every phase of AI, from massive-scale pretraining, post-training, and test-time scaling to real-time agentic inference.
NVIDIA Vera Rubin NVL4 delivers revolutionary performance through four NVIDIA Rubin GPUs interconnected by a second-generation NVLink bridge running sixth-generation NVIDIA NVLink, paired with two NVIDIA Vera CPUs over NVLink-C2C. Compatible with liquid-cooled NVIDIA MGX™ modular servers, it delivers up to 4x the performance for scientific computing simulation, 6x for AI-for-Science training, and 8x for AI-for-Science inference versus Grace Hopper.
Specifications¹
| | NVIDIA Vera Rubin NVL72 | NVIDIA Vera Rubin Superchip | NVIDIA Rubin GPU |
|---|---|---|---|
| Configuration | 72 NVIDIA Rubin GPUs, 36 NVIDIA Vera CPUs | 2 NVIDIA Rubin GPUs, 1 NVIDIA Vera CPU | 1 NVIDIA Rubin GPU |
| NVFP4 Inference | 3,600 PFLOPS | 100 PFLOPS | 50 PFLOPS |
| NVFP4 Training² | 2,520 PFLOPS | 70 PFLOPS | 35 PFLOPS |
| FP8/FP6 Training² | 1,260 PFLOPS | 35 PFLOPS | 17.5 PFLOPS |
| INT8² | 18 POPS | 500 TOPS | 250 TOPS |
| FP16/BF16² | 288 PFLOPS | 8 PFLOPS | 4 PFLOPS |
| TF32² | 144 PFLOPS | 4 PFLOPS | 2 PFLOPS |
| FP32 | 9,360 TFLOPS | 260 TFLOPS | 130 TFLOPS |
| FP64 | 2,400 TFLOPS | 67 TFLOPS | 33 TFLOPS |
| FP32 SGEMM³ | 28,800 TFLOPS | 800 TFLOPS | 400 TFLOPS |
| FP64 DGEMM³ | 14,400 TFLOPS | 400 TFLOPS | 200 TFLOPS |
| GPU Memory / Bandwidth | 20.7 TB HBM4 / 1,580 TB/s | 576 GB HBM4 / 44 TB/s | 288 GB HBM4 / 22 TB/s |
| NVIDIA NVLink | Sixth Generation | Sixth Generation | Sixth Generation |
| NVLink Bandwidth | 260 TB/s (NVLink 6 Switch Bandwidth) | 7.2 TB/s | 3.6 TB/s |
| NVLink-C2C Bandwidth | 65 TB/s | 1.8 TB/s | - |
| CPU Core Count | 3,168 custom NVIDIA Olympus cores (Arm® compatible) | 88 custom NVIDIA Olympus cores (Arm® compatible) | - |
| CPU Memory | 54 TB LPDDR5X | 1.5 TB LPDDR5X | - |
| Networking Bandwidth (Scale Out) | 28.8 TB/s | 0.8 TB/s | 0.4 TB/s |
| Total NVIDIA + HBM4 Chips | 1,296 | 30 | 12 |
1. Preliminary information. All values are "up to" figures and are subject to change.
2. Dense specification.
3. Peak performance using Tensor Core-based emulation algorithms.
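Many of the rack-level figures in the table appear to track the per-GPU figures multiplied by the 72 GPUs in the rack. The page does not state how the rack values were derived, so treat that reading as an assumption. The Python sketch below checks it for a subset of rows copied from the table. It also assumes 1 TB = 1,000 GB, and the helper names are made up for this example. Small gaps are expected because the listed values are rounded.

```python
GPUS_PER_RACK = 72  # NVIDIA Rubin GPUs in one Vera Rubin NVL72, per the table

# Per-GPU values copied from the "NVIDIA Rubin GPU" column.
PER_GPU = {
    "NVFP4 inference (PFLOPS)": 50,
    "NVFP4 training (PFLOPS)": 35,
    "FP8/FP6 training (PFLOPS)": 17.5,
    "FP16/BF16 (PFLOPS)": 4,
    "TF32 (PFLOPS)": 2,
    "FP32 (TFLOPS)": 130,
    "HBM4 capacity (GB)": 288,
    "HBM4 bandwidth (TB/s)": 22,
    "NVLink bandwidth (TB/s)": 3.6,
}

# Rack values copied from the "NVIDIA Vera Rubin NVL72" column.
RACK_LISTED = {
    "NVFP4 inference (PFLOPS)": 3600,
    "NVFP4 training (PFLOPS)": 2520,
    "FP8/FP6 training (PFLOPS)": 1260,
    "FP16/BF16 (PFLOPS)": 288,
    "TF32 (PFLOPS)": 144,
    "FP32 (TFLOPS)": 9360,
    "HBM4 capacity (GB)": 20700,      # listed as 20.7 TB; assumes 1 TB = 1,000 GB
    "HBM4 bandwidth (TB/s)": 1580,
    "NVLink bandwidth (TB/s)": 260,   # listed as NVLink 6 switch bandwidth
}


def scale_to_rack(per_gpu, count=GPUS_PER_RACK):
    """Multiply every per-GPU value by the number of GPUs in the rack."""
    return {name: value * count for name, value in per_gpu.items()}


def relative_gap(computed, listed):
    """Relative difference between a computed value and the listed value."""
    return abs(computed - listed) / listed


if __name__ == "__main__":
    computed = scale_to_rack(PER_GPU)
    for name, listed in RACK_LISTED.items():
        gap = relative_gap(computed[name], listed)
        print(f"{name:28s} computed={computed[name]:>10.1f} "
              f"listed={listed:>10.1f} gap={gap:.2%}")
```

The same check can be repeated against the Superchip column by passing `count=2`, since each superchip carries two Rubin GPUs.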