NVIDIA's Nemotron family of open-source LLMs has received a significant upgrade with its latest release, Nemotron 3 Super, which targets agentic AI workloads with an extensive context window.
NVIDIA's Nemotron 3 Super Leverages Mamba-MoE, With an Extensive 1-Million Token Context Window
For those unaware, when we talk about the leading contributors to open-source AI models, some might think of the Chinese labs behind Kimi or Qwen, but in reality, NVIDIA's Nemotron suite is a leader here. With AI often described as a "five-layer cake", NVIDIA has not only dominated the infrastructure and chip layers but is also one of the few companies in the West to have invested heavily in open-source models. NVIDIA has now unveiled Nemotron 3 Super, built to run agentic AI applications at scale, which makes it well suited to agents like OpenClaw.
One of the standout aspects of Nemotron 3 Super is NVIDIA's hybrid Mamba-MoE architecture. Mamba is not a variant of MoE but a different kind of sequence layer that changes how the model processes its input. It relies on a State Space Model (SSM), which reads tokens sequentially and carries a compact running state forward, so the work grows linearly with sequence length instead of every token having to attend to every earlier one. This is what lets Nemotron 3 Super handle a very long context window efficiently for user workloads, which NVIDIA positions as key to better agentic responses.
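To make the "reads data linearly" idea concrete, here is a minimal sketch of a scalar state-space recurrence. It is a toy, not NVIDIA's implementation: the function name `ssm_scan` and the fixed coefficients `a`, `b`, and `c` are assumptions for illustration, whereas real Mamba layers use learned, input-dependent parameters and vector-valued states.

```python
from typing import List


def ssm_scan(inputs: List[float], a: float = 0.9, b: float = 0.1, c: float = 1.0) -> List[float]:
    """Toy linear state-space recurrence (hypothetical, for illustration only).

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0  # the entire "memory" of the sequence is this one running state
    outputs = []
    for x in inputs:  # a single pass: cost grows linearly with sequence length
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


# An impulse at the first position decays through the state over time.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
```

The point of the sketch is that the state `h` stays the same size no matter how many tokens have been read, which is where the memory savings over a growing attention cache come from.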
- Hybrid Architecture: Mamba layers deliver 4x higher memory and compute efficiency, while transformer layers drive advanced reasoning.
- MoE: Only 12 billion of its 120 billion parameters are active at inference.
- Latent MoE: A new technique that improves accuracy by activating four expert specialists for the cost of one to generate the next token at inference.
- Multi-Token Prediction: Predicts multiple future words simultaneously, resulting in 3x faster inference.
- NVIDIA
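The MoE point above, where only a fraction of the parameters is active for each token, can be sketched with generic top-k expert routing. This is an assumption-laden illustration: NVIDIA's actual router, expert count, and Latent MoE technique are not described here, and `top_k_route` and `active_fraction` are hypothetical helper names.

```python
import math
from typing import List, Tuple


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k routing sketch, not NVIDIA's actual router.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_params / total_params


# Only the selected experts run for this token; the rest stay idle.
print(top_k_route([0.1, 2.0, -1.0, 2.0], k=2))
# Figures quoted by NVIDIA: 12 billion active out of 120 billion total.
print(active_fraction(12e9, 120e9))
```

With the quoted figures, about one tenth of the model's weights take part in generating any given token, which is why a 120-billion-parameter model can have inference costs closer to those of a much smaller dense model.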
The Mamba layers deliver 4x higher memory and compute efficiency, while the transformer layers handle advanced reasoning, making Nemotron 3 Super well suited to inference workloads. Another impressive feature of Nemotron 3 Super is its 1-million-token context window, roughly 4 times the size of the one in Kimi 2.5. A common rule of thumb for agentic systems is that a larger window lets the agent keep more of its task history in view, which tends to produce better responses. On this aspect alone, Nemotron 3 Super leads other open-source LLMs and even comes close to the likes of Opus 4.5, despite being limited to just 120 billion parameters.
Speaking of OpenClaw, NVIDIA tested Nemotron 3 Super on PinchBench, a suite used to evaluate agent workloads, and the model scored 85.6% across the full test suite, surpassing Opus 4.5, Kimi 2.5, and GPT-OSS 120b. For consumers running extensive workloads through OpenClaw, Nemotron 3 Super has opened up an entirely new class of performance, with compute power requirements that can be met with just a single GPU.
Nemotron 3 Super is just one example of how capable agentic AI systems are set to become. Interestingly, LLMs are now overcoming compute limitations as well, which is why the future of model deployment at the edge looks brighter than ever.
