James Ding
Jun 06, 2025 10:02
NVIDIA introduces the Nemotron-H Reasoning model family, delivering significant throughput gains on reasoning-intensive tasks, according to NVIDIA’s blog.
NVIDIA has announced the Nemotron-H Reasoning model family, designed to increase throughput without compromising performance. The models are built for reasoning-intensive tasks, with a particular focus on math and science, where output lengths have grown sharply, sometimes reaching tens of thousands of tokens.
Breakthrough in AI Reasoning Models
NVIDIA’s latest offering includes the Nemotron-H-47B-Reasoning-128K and Nemotron-H-8B-Reasoning-128K models, both available in FP8 quantized variants. These models are derived from the Nemotron-H-47B-Base-8K and Nemotron-H-8B-Base-8K foundation models, according to NVIDIA’s blog.
The Nemotron-H-47B-Reasoning model, the most capable in this family, delivers nearly four times greater throughput than comparable transformer models such as the Llama-Nemotron Super 49B V1.0. It supports 128K token contexts and excels in accuracy for reasoning-heavy tasks. Similarly, the Nemotron-H-8B-Reasoning-128K model shows significant improvements over the Llama-Nemotron Nano 8B V1.0.
Innovative Features and Licensing
The Nemotron-H models introduce a flexible operational feature that lets users choose between reasoning and non-reasoning modes, making them suitable for a wide range of real-world applications. NVIDIA has released these models under an open research license, encouraging the research community to explore and build on them.
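In practice, mode switches like this are typically driven by a control string in the system prompt. The sketch below builds such a request; the exact control string is an assumption (modeled on the "detailed thinking on/off" toggle NVIDIA documented for its Llama-Nemotron models), so check the Nemotron-H model card for the actual convention.

```python
def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Build a chat request that toggles reasoning mode via the system prompt.

    NOTE: the control string is a hypothetical placeholder, not the
    confirmed Nemotron-H convention.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Request a full reasoning trace for a math question
msgs = build_messages("Prove that the sum of two even numbers is even.", reasoning=True)
```

The same message list can then be fed to any chat-completion endpoint serving the model; only the system line changes between modes.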
Training and Performance
The training of these models involved supervised fine-tuning (SFT) with examples that included explicit reasoning traces. This comprehensive training approach, which spanned over 30,000 steps for math, science, and coding, has resulted in consistent improvements on internal STEM benchmarks. A subsequent training phase focused on instruction following, safety alignment, and dialogue, further enhancing the model’s performance across diverse tasks.
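An SFT example with an explicit reasoning trace pairs a prompt with a response that contains the intermediate steps ahead of the final answer. The format below is a hypothetical illustration, not NVIDIA's actual data schema; the `<think>...</think>` delimiters are an assumption.

```python
# Hypothetical SFT training example; the trace delimiters are assumed,
# not taken from the Nemotron-H training data.
sft_example = {
    "prompt": "What is 17 * 24?",
    "response": (
        "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>\n"
        "The answer is 408."
    ),
}

def strip_trace(response: str) -> str:
    """Return only the final answer, dropping the reasoning trace."""
    if "</think>" in response:
        return response.split("</think>", 1)[1].strip()
    return response.strip()
```

Keeping the trace inside the target text is what lets the model learn to emit its intermediate reasoning in reasoning mode, while a helper like `strip_trace` shows how a non-reasoning view of the same data could be produced.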
Long Context Handling and Reinforcement Learning
To support 128K-token contexts, the models were trained on synthetic sequences of up to 256K tokens, which improved their long-context attention capabilities. Additionally, reinforcement learning with Group Relative Policy Optimization (GRPO) was applied to refine skills such as instruction following and tool use, enhancing the models’ overall response quality.
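The core idea of GRPO is to score each sampled response relative to the other responses in its group, rather than against a learned value function: the group's mean reward serves as the baseline. A minimal sketch of that advantage computation (not NVIDIA's implementation) might look like:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each reward by the group mean and std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses to one prompt, scored by a reward model
advs = group_relative_advantages([0.2, 0.9, 0.4, 0.5])
```

Responses scoring above the group mean get positive advantages and are reinforced; below-mean responses are pushed down, with no critic network required.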
Final Results and Throughput Comparisons
In benchmarks against models such as Llama-Nemotron Super 49B V1.0 and Qwen3 32B, the Nemotron-H-47B-Reasoning-128K model demonstrated superior accuracy and throughput, achieving approximately four times the throughput of comparable transformer-based models.
Overall, the Nemotron-H Reasoning models offer a versatile, high-performing foundation for applications that demand both precision and speed.
For more detailed information, please refer to the official announcement on the NVIDIA blog.