NVIDIA Boosts AI Performance with GB200 NVL72 and OpenAI gpt-oss Models




Zach Anderson
Aug 05, 2025 23:50

NVIDIA has worked with OpenAI to optimize the gpt-oss models, reporting up to 1.5 million tokens per second (TPS) on its GB200 NVL72 system.





NVIDIA, in collaboration with OpenAI, has announced performance optimizations for OpenAI's newly released gpt-oss-20b and gpt-oss-120b models. Running on the NVIDIA GB200 NVL72 system, the models can reach up to 1.5 million tokens per second (TPS), according to NVIDIA.

Enhanced AI Capabilities

The gpt-oss models are text-reasoning models built on a mixture-of-experts (MoE) architecture with SwiGLU activations. Their attention layers use rotary position embeddings (RoPE) and support a 128k context length. The models are released in FP4 precision, which lets them fit on an 80 GB data center GPU, and are optimized for NVIDIA’s Blackwell architecture.
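To see why 4-bit precision matters for the 80 GB figure, here is a rough back-of-the-envelope sketch in Python. The parameter counts are assumptions taken from the nominal sizes in the model names (20 billion and 120 billion), and the estimate covers weights only. It ignores activations, the KV cache, and any layers that may be kept at higher precision.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter counts inferred from the model names,
# not exact figures from NVIDIA or OpenAI.
NOMINAL_PARAMS = {"gpt-oss-20b": 20e9, "gpt-oss-120b": 120e9}

for name, params in NOMINAL_PARAMS.items():
    fp4_gb = weight_memory_gb(params, 4)    # 4 bits per weight
    fp16_gb = weight_memory_gb(params, 16)  # 16 bits per weight, for comparison
    print(f"{name}: ~{fp4_gb:.0f} GB at 4 bits vs ~{fp16_gb:.0f} GB at 16 bits")
```

Under these assumptions, the larger model's weights come to roughly 60 GB at 4 bits, which is below 80 GB, compared with roughly 240 GB at 16 bits.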

Collaborative Developments

NVIDIA’s collaboration with OpenAI extends to open-source frameworks, including Hugging Face Transformers and NVIDIA TensorRT-LLM, with the aim of improving model performance and developer accessibility. Training the gpt-oss-120b model alone took more than 2.1 million GPU hours.

Technical Specifications

The gpt-oss-20b and gpt-oss-120b models differ in their number of transformer blocks, total parameter count, and expert configuration. Both are designed to optimize inference performance on NVIDIA’s platforms.
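For readers unfamiliar with what an expert configuration controls, the following is a minimal sketch of top-k routing in a mixture-of-experts layer. It shows one common variant (softmax over the selected experts' router scores) and is not taken from the gpt-oss implementation. The expert count, the value of `k`, and the toy scaling experts are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their logits."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    max_logit = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: Vector, router_logits: Sequence[float],
                experts: Sequence[Expert], k: int) -> Vector:
    """Run only the selected experts and mix their outputs by routing weight."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Hypothetical toy setup: four "experts" that just scale their input.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
result = moe_forward([1.0, 2.0], [0.0, 1.0, 1.0, -1.0], experts, k=2)
print(result)
```

Only `k` of the experts run for each token, which is why an MoE model can have a large total parameter count while using a much smaller number of parameters per token.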

Deployment Options

Developers have several deployment options, including vLLM and TensorRT-LLM for server setup and performance optimization. The GB200 NVL72 system is designed for high throughput and can serve up to 50,000 concurrent users.
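The headline figures can be put in per-user and per-GPU terms with simple division, sketched below. This assumes the 1.5 million TPS and 50,000-user figures describe the same operating point and that load is spread evenly, neither of which the article states. The count of 72 GPUs reflects the GB200 NVL72's rack configuration.

```python
PEAK_TOKENS_PER_SECOND = 1_500_000  # "up to" figure reported by NVIDIA
CONCURRENT_USERS = 50_000           # concurrency figure quoted in the article
GPUS_PER_NVL72 = 72                 # Blackwell GPUs in one GB200 NVL72 rack


def per_unit_rate(total_rate: float, units: int) -> float:
    """Split an aggregate rate evenly across a number of units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_rate / units


tokens_per_user = per_unit_rate(PEAK_TOKENS_PER_SECOND, CONCURRENT_USERS)
tokens_per_gpu = per_unit_rate(PEAK_TOKENS_PER_SECOND, GPUS_PER_NVL72)

print(f"~{tokens_per_user:.0f} tokens/s per user")
print(f"~{tokens_per_gpu:,.0f} tokens/s per GPU")
```

Under these assumptions, each user would see about 30 tokens per second, and each GPU would produce roughly 20,800 tokens per second. Real per-user rates depend on batching, prompt length, and latency targets.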

Future Prospects

With these models, NVIDIA aims to support a broad range of AI applications from cloud to edge. Its work to integrate the gpt-oss models across platforms reflects a commitment to improving AI infrastructure and the developer experience.

For more details on the deployment and capabilities of these models, visit the NVIDIA blog.

Image source: Shutterstock



