NVIDIA Blackwell Ultra Surpasses MLPerf Inference Records

Peter Zhang
Sep 09, 2025 16:44

NVIDIA’s Blackwell Ultra architecture set new records in AI inference performance in its MLPerf Inference v5.1 debut, with significant gains in large language model (LLM) inference.

NVIDIA’s Blackwell Ultra architecture has set new records in AI inference performance, according to the official NVIDIA blog. Its debut in MLPerf Inference v5.1, a leading industry-standard benchmark suite for AI performance, highlighted the architecture’s capabilities on large language model (LLM) workloads.

Benchmark Achievements

The MLPerf Inference v5.1 suite comprises a variety of tests that gauge AI inference performance, and NVIDIA’s Blackwell Ultra set records on the round’s newly introduced workloads. These include DeepSeek-R1, a 671-billion-parameter mixture-of-experts (MoE) model, and models from the Llama 3.1 series. The Blackwell Ultra platform led these new benchmarks while setting per-GPU performance records.

Notably, the architecture delivers up to 1.5 times higher peak NVFP4 AI compute and twice the attention-layer compute of the original Blackwell architecture, and its larger HBM3e capacity also contributed to these gains.
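For context on the number format itself: NVFP4 stores 4-bit floating-point (E2M1) elements in which small blocks of values share a scale factor. The NumPy sketch below illustrates the core rounding idea under simplifying assumptions; it uses a plain float block scale chosen so the block maximum maps to E2M1’s largest magnitude, whereas the real format also involves an FP8 block scale and a tensor-level scale.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float, NVFP4's element format.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # NVFP4 scales values in small blocks (16 elements here)

def quantize_block(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one block: choose a scale so the largest magnitude in the
    block maps to 6.0, then round every element to the nearest E2M1 value."""
    scale = float(np.abs(x).max()) / 6.0
    if scale == 0.0:
        scale = 1.0  # all-zero block; any scale works
    scaled = x / scale
    # Nearest-neighbor rounding of each magnitude onto the E2M1 grid.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx], scale

x = np.random.randn(BLOCK).astype(np.float32)
q, s = quantize_block(x)
print("max abs error:", np.abs(x - q * s).max())
```

Dequantization is simply q * s; the hardware advantage comes from performing the matrix math directly on the 4-bit values.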

Technological Innovations

The Blackwell Ultra architecture incorporates several innovations that drive its performance. Extensive use of NVFP4 acceleration across all DeepSeek-R1 and Llama model submissions played a crucial role in achieving these results. In addition, storing the attention key-value (KV) cache in FP8 precision significantly reduced its memory footprint and improved performance.
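As a rough illustration of the KV-cache idea (not NVIDIA’s submission code), the PyTorch sketch below quantizes a cache block to the float8_e4m3fn dtype with a single per-tensor scale, halving its memory relative to FP16; production inference engines typically use calibrated, finer-grained scales.

```python
import torch

def quantize_kv_fp8(kv: torch.Tensor):
    """Per-tensor scaled FP8 (E4M3) quantization of a KV-cache block.
    The scale maps the largest magnitude onto E4M3's max finite value, 448."""
    scale = kv.float().abs().amax().clamp(min=1e-12) / 448.0
    return (kv / scale).to(torch.float8_e4m3fn), scale

def dequantize_kv_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float16) * scale

kv = torch.randn(32, 8, 128, dtype=torch.float16)  # (seq_len, heads, head_dim)
q, scale = quantize_kv_fp8(kv)
print(q.element_size(), "byte/elem vs", kv.element_size())   # 1 vs 2
print("max abs error:", (dequantize_kv_fp8(q, scale) - kv).abs().max().item())
```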

New parallelism techniques, such as expert parallelism for the MoE layers and data parallelism for the attention layers, were employed to maximize multi-GPU execution. These were complemented by CUDA Graphs, which reduce CPU launch overhead during inference.
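Of these techniques, CUDA Graphs is the most directly reproducible outside NVIDIA’s stack. The PyTorch sketch below (a generic illustration of the technique, not NVIDIA’s submission code; it requires a CUDA device) captures a decode-style forward pass once and then replays it, so each subsequent step costs a single CPU-side launch instead of one launch per kernel.

```python
import torch

model = torch.nn.Linear(4096, 4096).cuda().half()
static_in = torch.zeros(8, 4096, device="cuda", dtype=torch.half)

# Warm up on a side stream, as the CUDA Graphs documentation recommends.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass into a graph. Inputs and outputs must live in
# fixed ("static") buffers that are reused on every replay.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = model(static_in)

# Replay: copy fresh data into the static input, then relaunch the whole
# captured kernel sequence with a single CPU call.
static_in.copy_(torch.randn_like(static_in))
g.replay()
print(static_out.shape)
```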

Implications for AI Inference

The results from the MLPerf Inference v5.1 benchmark underscore NVIDIA’s continued leadership in AI inference performance. The Blackwell Ultra architecture not only enhances throughput and efficiency but also reduces the cost per token significantly. This is particularly evident in the comparison with Hopper-based systems, where Blackwell Ultra delivered approximately five times higher throughput per GPU.

The introduction of disaggregated serving further highlights NVIDIA’s innovation in AI infrastructure. By decoupling the compute-heavy context (prefill) phase from the generation (decode) phase and running them on separate GPUs or nodes, NVIDIA optimized resource use, particularly for large language models like Llama 3.1 405B.
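To make the idea concrete, here is a deliberately toy sketch of that handoff (all names, shapes, and the stand-in math are illustrative assumptions, not NVIDIA’s interface): a context worker runs prefill over the whole prompt and ships the resulting KV cache to a generation worker, which then decodes token by token.

```python
import numpy as np

LAYERS, HEAD_DIM = 2, 64  # illustrative model dimensions

def context_worker(prompt_ids: list[int]):
    """Prefill phase (compute-bound): one pass over the full prompt
    materializes the per-layer KV cache. Random stand-in values here."""
    kv_cache = [np.random.randn(len(prompt_ids), HEAD_DIM) for _ in range(LAYERS)]
    return kv_cache, len(prompt_ids)

def generation_worker(kv_cache, next_pos: int, max_new_tokens: int):
    """Decode phase (memory-bandwidth-bound): extends the cache it received
    and emits one token per step, running on a different GPU or node."""
    tokens = []
    for _ in range(max_new_tokens):
        step_kv = [np.random.randn(1, HEAD_DIM) for _ in range(LAYERS)]
        kv_cache = [np.vstack([c, s]) for c, s in zip(kv_cache, step_kv)]
        tokens.append(next_pos)  # stand-in for a sampled token id
        next_pos += 1
    return tokens

kv, pos = context_worker([101, 2009, 2003, 102])  # the cross-GPU/node handoff
print(generation_worker(kv, pos, max_new_tokens=3))
```

The payoff of the split is that prefill and decode have very different hardware profiles, so each phase can be scaled and scheduled independently.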

Future Prospects

NVIDIA’s advancements in AI inference technology continue to set new standards in the industry. The Blackwell Ultra architecture, with its record-breaking performance, positions NVIDIA at the forefront of AI innovation. As the demand for more sophisticated AI models grows, NVIDIA’s commitment to expanding its technological capabilities remains evident.

The introduction of Rubin CPX, a processor designed to accelerate long-context processing, further exemplifies NVIDIA’s push to advance AI efficiency and performance.

Image source: Shutterstock
