NVIDIA Unveils “Industry Leading” Open-Source Llama-3.1-Nemotron-70B-Instruct LLM , Surpassing OpenAI’s GPT-4o In AI-Focused Benchmarks

Oct 18, 2024 at 09:25am EDT
NVIDIA Unveils "Industry Leading" Open-Source Llama-3.1-Nemotron-70B-Instruct LLM , Surpassing OpenAI's GPT-4o In AI-Focused Benchmarks 1

NVIDIA decided to drop one of the industry's biggest "Llama-3.1-Nemotron-70B-Instruct" LLM, surpassing OpenAI GPT-4o & Anthropic's Claude 3.5 Sonnet.

NVIDIA Is Looking To Dominate The AI Segment, Revealing Its Newest LLM, Targeted Towards Refining User Responses

Team Green is pushing up the gears when it comes to innovating the AI segment in ways deemed impossible, and after apparently dominating the "AI hardware" segment, the firm is now looking towards showing its magic in open-source LLM models, collaborating with Meta. The newest Llama-3.1-Nemotron-70B-Instruct LLM from NVIDIA hasn't seen much mainstream coverage yet, but based on the initial information available along with benchmarks, the new LLM from Team Green might turn out as industry-leading.

Related Story “I Produce The Lowest Cost Tokens In The World” Says NVIDIA CEO As He Highlights The Full-Stack Approach To AI

NVIDIA says that the Llama-3.1-Nemotron-70B-Instruct LLM is designed solely to make AI responses much more specific and aligned with human preference, especially in terms of factual correctness and coherent problem-solving. The model is said to be trained based on Meta's Llama-3.1-70B-Instruct Base, which is yet again a creation of Meta designed for 70 billion parameters. With NVIDIA's fine-tuning, the Llama-3.1-Nemotron-70B-Instruct specifically targets the "SteerLM Regression Reward Modelling."

The post is diving into a bit of technicality, but given a marvel of such kind, I mean, it does deserve it. So, the SteerLM Regression Reward Modelling involves defining a reward function that guides the LLM's learning process by using regression models to refine datasets to generate a clearer response. This makes data quality and model complexity much more refined, ultimately allowing NVIDIA to generate responses close to the user's requirements.

Interestingly, based on the Llama-3.1-Nemotron-70B-Instruct LLM model card present at HuggingFace, this particular model manages to solve the "strawberry" problem, which traditional AI models were unable to solve, where it involved counting the R's in the word. This isn't just the only achievement, as the upcoming details might surprise readers more. NVIDIA's Llama-3.1-Nemotron-70B-Instruct LLM has achieved leading ranking at numerous benchmarks, notably Arena Hard, an automatic evaluation tool for instruction-tuned LLMs, and here's how the overall scores stack up.

Model Arena Hard AlpacaEval MT-Bench Mean Response Length
Details (95% CI) 2 LC (SE) (GPT-4-Turbo) (# of Characters for MT-Bench)
Llama-3.1-Nemotron-70B-Instruct 85.0 (-1.5, 1.5) 57.6 (1.65) 8.98 2199.8
Llama-3.1-70B-Instruct 55.7 (-2.9, 2.7) 38.1 (0.90) 8.22 1728.6
Llama-3.1-405B-Instruct 69.3 (-2.4, 2.2) 39.3 (1.43) 8.49 1664.7
Claude-3-5-Sonnet-20240620 79.2 (-1.9, 1.7) 52.4 (1.47) 8.81 1619.9
GPT-4o-2024-05-13 79.3 (-2.1, 2.0) 57.5 (1.47) 8.74 1752.2

Don't get into the specific figures for now, but the critical element to note here is that the Llama-3.1-Nemotron-70B-Instruct has managed to surpass mainstream LLMs in the industry, such as OpenAI's GPT-4o, which is a significant milestone, given how big of an impact NVIDIA's fine-tuning has on the Llama-3.1-70B-Instruct Base. We haven't seen how the LLM performs in specific situations, such as complex coding tasks, or even inferencing-focused problems, but the initial benchmarks do reveal that NVIDIA's newest LLM is well-equipped.

Well, if you are eager to access the Llama-3.1-Nemotron-70B-Instruct LLM, you can either get it from NVIDIA's "NIM" platform here, or there is a compatible version available at HuggingFace, which you can check out here. Overall, Team Green is on its way to becoming dominant in the AI industry, conquering mainstream segments.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.

Follow Wccftech on Google to get more of our news coverage in your feeds.