Explore, Build, and Innovate AI Reasoning with NVIDIA’s Open Models and Recipes

Generative AI (GenAI) is revolutionizing diverse fields, including language understanding, vision-language integration, robotics, and physical AI. Large language models (LLMs) are at the forefront, transforming applications such as language understanding and translation. In robotics, GenAI empowers robots to comprehend and execute intricate instructions, navigate dynamic environments, and perform tasks with remarkable precision. Vision-language models enable machines to process and interpret visual information alongside textual data, enhancing human-robot interaction and situational awareness. Physical AI leverages GenAI to build systems that learn from and adapt to their physical surroundings, driving advances in autonomous driving, smart manufacturing, and healthcare robotics.
NVIDIA is deeply committed to empowering the open-source community, particularly on platforms like Hugging Face, to spearhead this wave of innovation. NVIDIA hosts over 370 models on Hugging Face, including Llama Nemotron for building agents, Cosmos world foundation models for video generation, and GR00T N1 for humanoid robot reasoning. Beyond models, NVIDIA has released more than 50 open-source datasets, providing the essential tools and resources for the next generation of AI development.
This blog explores NVIDIA's open reasoning models, their distillation recipes, and the datasets used to tune them. Read on for a look at the latest advancements, a preview of what's coming at GTC Paris, and pointers to resources on Hugging Face that can accelerate your own AI work.
Introducing Llama Nemotron: A Family of Open Reasoning Models
We recently announced Llama Nemotron, a family of open reasoning-capable models designed for real-world performance and flexibility. These models come in a range of ‘t-shirt sizes’ — Ultra, Super, and Nano — so whether you're targeting edge devices like NVIDIA Jetson, a local setup with an NVIDIA RTX GPU, or large-scale deployments, there's a version that fits.
- Llama Nemotron Ultra (253B parameters) delivers maximum accuracy for complex enterprise workflows, achieving 76% accuracy on the GPQA Diamond scientific reasoning benchmark, outperforming PhD-level experts, who average 65%.
- Llama Nemotron Super (49B parameters) balances accuracy and efficiency, optimized for a single NVIDIA H100 GPU deployment while maintaining leading performance.
- Llama Nemotron Nano (4B parameters) brings reasoning to edge environments with the highest efficiency and lowest latency for resource-constrained scenarios.
What makes Llama Nemotron models stand out is their dual capability of excelling in both reasoning-intensive and standard tasks. They deliver up to five times higher throughput compared to leading open-source reasoning models, thanks to advanced optimization techniques like pruning and distillation.
A unique feature is the ability to toggle reasoning on or off, allowing for efficient resource use. For instance, reasoning can be skipped for simple queries like “What’s the capital of France?” but activated for complex tasks such as planning itineraries with multiple constraints, ensuring both efficiency and precision.
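As a sketch of how this toggle is driven in practice: the switch is controlled through the system prompt (per the post-training description later in this post), commonly a phrase like "detailed thinking on" / "detailed thinking off". Treat the exact prompt string and any model IDs as assumptions to verify against the model card. A minimal chat-message builder might look like:

```python
def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Build a chat message list that toggles Nemotron's reasoning mode.

    The "detailed thinking on/off" system prompt is an assumed control
    string here - check the specific model card for the exact wording.
    """
    mode = "on" if reasoning else "off"
    return [
        {"role": "system", "content": f"detailed thinking {mode}"},
        {"role": "user", "content": user_prompt},
    ]

# Simple factual query: skip reasoning for lower latency.
quick = build_messages("What's the capital of France?", reasoning=False)

# Multi-constraint planning task: enable step-by-step reasoning.
complex_task = build_messages(
    "Plan a 3-day Paris itinerary that avoids museums on Monday "
    "and stays under a 200 EUR budget.",
    reasoning=True,
)
```

These message lists can then be fed to a chat-templating call such as `tokenizer.apply_chat_template(...)` before generation, so the same deployed model serves both fast answers and deliberate multi-step reasoning.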
The models shine in areas like scientific reasoning, code generation, and tool integration, making them indispensable for enterprise AI. Their ability to handle multi-step problems, produce structured answers, and interact with external systems positions them as ideal for building autonomous agents. Additionally, their instruction-following skills ensure accuracy and consistency in executing detailed directives, optimizing workflows across various domains.
Under the Hood: How We Developed the Nemotron Models
Creating these reasoning models required innovative approaches across multiple dimensions of model development.
Built on proven Llama 3.1 and 3.3 models, Nemotron development employed a sophisticated three-phase approach:
Stage 1: Foundational Model Optimization
- Neural Architecture Search (NAS) tailored architectures for optimal performance on NVIDIA hardware.
- Knowledge Distillation compressed larger models (e.g., 405B→253B, 70B→49B, 8B→4B) while preserving capabilities.
- This resulted in hardware-optimized parameter counts for maximum performance. To learn more, see Puzzle: Distillation-Based NAS for Inference-Optimized LLMs.
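To make the distillation step concrete, here is a minimal sketch of the standard temperature-scaled knowledge-distillation objective, where the student is trained to match the teacher's softened output distribution. This is the generic formulation, not NVIDIA's exact Puzzle recipe; the function names and temperature value are illustrative:

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = (p_teacher * (np.log(p_teacher + 1e-12)
                       - np.log(p_student + 1e-12))).sum(axis=-1)
    return (temperature ** 2) * kl.mean()

# Toy example: a 4-token vocabulary over a batch of 3 positions.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(3, 4))
student = teacher + 0.1 * rng.normal(size=(3, 4))  # student near the teacher
print(distillation_loss(student, teacher))  # small non-negative value
```

In a real pipeline this loss is computed per token over the vocabulary and minimized with gradient descent on the student, often blended with the ordinary cross-entropy on ground-truth data.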
Stage 2: Developing Reasoning Modes
- For "Reasoning OFF" (general tasks), NVIDIA-curated synthetic datasets enhanced Chat, Math, Code, and Function Calling, using insights from Llama and Qwen2.5.
- For "Reasoning ON," careful distillation with curated DeepSeek-R1 data targeted advanced Math, Code, and Science domains, with rigorous quality validation.
- Both modes were trained simultaneously, switchable via the system prompt.
Stage 3: Fine-Tuning for Alignment and Interaction
- The REINFORCE algorithm with heuristic verifiers bolstered instruction following and function calls.
- Reinforcement Learning from Human Feedback (RLHF) using HelpSteer2 aligned models with natural conversational patterns.
- The NVIDIA Llama 3.1 Nemotron Reward Model provided sophisticated reward signals.
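The Stage 3 idea can be illustrated with a toy REINFORCE loop in which a binary "verifier" reward (did the sampled output pass a check?) drives a policy-gradient update. This is a didactic bandit-style sketch, not NVIDIA's training code; all names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Toy "policy": a categorical distribution over 3 candidate responses.
# Pretend the verifier accepts only response 2 (e.g., a valid function call).
logits = np.zeros(3)

def verifier_reward(action: int) -> float:
    return 1.0 if action == 2 else 0.0

lr = 0.5
baseline = 0.0  # running average reward, reduces gradient variance
for step in range(200):
    probs = softmax(logits)
    action = rng.choice(3, p=probs)
    reward = verifier_reward(action)
    baseline = 0.9 * baseline + 0.1 * reward
    # For a categorical policy, grad of log pi(a) is one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    logits += lr * (reward - baseline) * grad_log_pi

print(softmax(logits))  # probability mass should concentrate on action 2
```

In the actual pipeline the "policy" is the LLM, an "action" is a sampled response, and heuristic verifiers or the reward model score each response; the same advantage-weighted log-probability update steers the model toward outputs that pass verification.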
Achieving Groundbreaking Performance
The Llama Nemotron family delivers exceptional results across industry-standard benchmarks:
- GPQA Diamond: Llama Nemotron Ultra achieves 76% accuracy on this challenging scientific reasoning benchmark, surpassing the 65% average of PhD-level experts
- LiveCodeBench: Demonstrating superior coding capabilities with robust performance on real-world programming tasks
- AIME Mathematical Reasoning: Leading performance among open models on advanced mathematical problem-solving
[Chart: average accuracy across GPQA-Diamond, AIME 2025, MATH500, BFCL, and Arena Hard benchmarks, measured on one H100 GPU with 250 concurrent users and ISL/OSL of 500/2000.]
These models achieve up to 5x faster inference than other leading open reasoning models, while delivering up to 20% accuracy improvements over their base models.
Fully Open and Accessible on Hugging Face
In line with our commitment to open innovation, we are making the entire Llama Nemotron ecosystem freely available to the Hugging Face community and beyond, including:
- Complete Model Family: All three model variants (Nano, Super, Ultra) with full weights and configurations on Hugging Face
- Training Datasets: Nearly 30 million high-quality samples including the OpenCodeReasoning and Llama-Nemotron-Post-Training datasets
- Training Recipes: Detailed technical documentation and methodologies used in our post-training pipeline are available in this report.
- NeMo Framework Integration: Full support for customization using NVIDIA's NeMo Framework for building domain-specific reasoning models.
Applications
The advanced reasoning capabilities of the Llama Nemotron models unlock a wide spectrum of applications across various industries, enabling more intelligent, autonomous, and efficient solutions:
- Logistics and Supply Chain: Enhancing efficiency through sophisticated what-if scenario modeling, such as intelligent rerouting during disruptions and optimizing complex distribution networks.
- Scientific Research: Accelerating discovery through automated hypothesis generation, multi-step experimental design, and complex data analysis workflows.
- Healthcare: Improving diagnostic accuracy and treatment planning through systematic reasoning over patient data, medical literature, and clinical guidelines.
- Financial Services: Powering advanced risk assessment, algorithmic trading strategies, and regulatory compliance automation.
- Customer Support: Enabling autonomous resolution of complex customer issues through reasoning across knowledge bases, transaction histories, and service protocols.
Toolkit for the Next Generation of AI
The Llama Nemotron family represents a significant step toward truly intelligent AI and sets a new standard for performance, efficiency, and flexibility. With open weights, the datasets used to create them, and the powerful NVIDIA NeMo Framework, developers have a complete toolkit to customize these models and build their own. We invite you to explore these resources on Hugging Face and start innovating today.
Join Us at GTC Paris
Ready to experience the future of AI reasoning? We're showcasing Llama Nemotron models at GTC Paris with live demos, hands-on labs, certifications, and deep-dive technical sessions. Connect with our engineering teams, explore real-world implementations, and discover how these models can transform your AI applications.