OpenAI’s GPT-OSS Models Now Optimized for NVIDIA RTX GPUs

Timothy Morano
Aug 05, 2025 22:58

OpenAI, in collaboration with NVIDIA, has optimized its open-source GPT-OSS models for NVIDIA RTX GPUs, enhancing local AI model performance and accessibility for developers.

OpenAI has collaborated with NVIDIA to optimize its new open-source GPT-OSS models for NVIDIA’s GeForce RTX and RTX PRO GPUs, significantly enhancing performance and accessibility for AI developers and enthusiasts. These models, gpt-oss-20b and gpt-oss-120b, are designed for local usage and testing, enabling advanced AI applications on personal computers and workstations, according to NVIDIA’s blog.

Enhanced Performance with NVIDIA RTX

The optimized models run on NVIDIA RTX AI PCs and workstations, delivering up to 256 tokens per second on the GeForce RTX 5090 GPU. The collaboration brings fast, capable inference from the cloud down to the PC, supporting applications such as agentic web search and in-depth research with reasoning models.

NVIDIA CEO Jensen Huang highlighted the models’ potential, stating, “The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI.” This release underscores NVIDIA’s dominance in AI, from training to inference, across various platforms.

Open-Source Flexibility and Innovation

The gpt-oss models are flexible, open-weight reasoning models featuring chain-of-thought capabilities and adjustable reasoning-effort levels. Trained on NVIDIA H100 GPUs, they support complex tasks like coding assistance and document comprehension. With context-length support of up to 131,072 tokens, they are among the longest-context models available for local inference, making them well suited to extensive research tasks.

These are the first models in the MXFP4 format available on NVIDIA RTX. MXFP4 is a 4-bit microscaling floating-point format that preserves model quality while sharply reducing memory use. NVIDIA's continued collaboration with the open-source community, including the llama.cpp project and its GGML tensor library, ensures optimized performance on RTX GPUs.
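To make the format concrete: MXFP4 stores each weight as a 4-bit float (E2M1) and attaches a single shared power-of-two scale to each small block of values, per the OCP Microscaling spec. The pure-Python sketch below illustrates that idea only; the function names are ours, and real MXFP4 kernels run on-GPU with packed 4-bit storage.

```python
import math

# Magnitudes representable in FP4 (E2M1): sign handled separately.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values, block_size=32):
    """Quantize one block of floats to FP4 values sharing a power-of-two scale.

    Illustrative sketch of MXFP4-style microscaling: MX blocks hold 32
    elements, and the shared scale is chosen so the block's largest
    magnitude fits within FP4's range (max 6.0).
    """
    assert len(values) <= block_size
    max_abs = max(abs(v) for v in values) or 1.0
    # Smallest power of two such that max_abs / scale <= 6.0.
    scale = 2.0 ** math.ceil(math.log2(max_abs / 6.0))
    quantized = []
    for v in values:
        # Snap the scaled magnitude to the nearest FP4-representable value.
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        quantized.append(math.copysign(mag * scale, v))
    return quantized, scale

vals, scale = quantize_block([0.1, -3.2, 6.0, 0.0])
print(vals, scale)  # each 4-bit weight lands on a grid point times the shared scale
```

Because only the 4-bit codes and one scale per 32-element block are stored, memory cost drops to roughly a quarter of FP16 while the per-block scale keeps quantization error bounded.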

Accessible Tools for Developers

For ease of use, developers can leverage the Ollama app, which is optimized for RTX GPUs with at least 24GB of VRAM. This app provides a seamless interface to interact with the models, requiring no additional configuration for optimal performance. Additional features like PDF support in chats and multimodal support enhance user experience.
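In practice, getting started is a two-command workflow once Ollama is installed (assuming the model is published under the `gpt-oss:20b` tag in the Ollama library; check `ollama list` output against the library for the exact tag):

```shell
# Download the 20B open-weight model from the Ollama library
ollama pull gpt-oss:20b

# Start an interactive chat session; Ollama uses the RTX GPU
# automatically when one is available
ollama run gpt-oss:20b
```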

Developers have further options through Microsoft AI Foundry Local, a solution for on-device AI inferencing currently in public preview. This tool integrates smoothly with existing workflows and supports high-efficiency AI model deployment on Windows platforms.

The release of these models marks a significant advancement in AI technology, providing developers with powerful tools to innovate and enhance AI-accelerated applications. NVIDIA continues to support the AI community through collaborative projects and technology leadership.

Image source: Shutterstock

