
Intel Announces Optimizations For Llama 3.1 To Boost Performance Across All Products: Gaudi, Xeon, Core & Arc Series

Sarfraz Khan • Jul 24, 2024 at 06:15am EDT

Meta's Llama 3.1 is now live, and Intel has announced support for the Llama 3.1 AI models across its portfolio, including Gaudi, Xeon, Arc, and Core.

All of Intel's CPUs & GPUs now feature enhanced performance with Llama 3.1 AI models

Meta launched its newest large language model, Llama 3.1, today, succeeding Llama 3, which was released in April. Alongside the launch, Intel released performance numbers for Llama 3.1 on its latest products, including Intel Gaudi, Xeon, and AI PCs based on Core Ultra processors and Arc graphics. Intel continues to build out its AI software ecosystem, and the new Llama 3.1 models are enabled on its AI products through frameworks such as PyTorch and Intel Extension for PyTorch, DeepSpeed, Hugging Face Optimum libraries, and vLLM, so that users get enhanced performance on its data center, edge, and client AI products for the latest Meta LLMs.

Llama 3.1 is a collection of multilingual LLMs, providing pre-trained and instruction-tuned generative models in different sizes. The largest foundation model introduced in Llama 3.1 is the 405B model, which offers state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. The smaller models are the 70B and 8B sizes: the former is a highly performant yet cost-effective model, and the latter is a lightweight model for ultra-fast response.
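To put the three sizes in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need. It assumes the nominal parameter counts implied by the model names (8B, 70B, 405B) and counts only the weights, ignoring the KV cache, activations, and quantization metadata. The function name and the choice of bit widths are illustrative, not taken from Intel's or Meta's materials.

```python
# Rough weight-only memory estimate. Assumptions: nominal parameter counts
# from the model names, decimal gigabytes, no KV cache or activations.

NOMINAL_PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}  # assumed, from names
BITS_PER_WEIGHT = {"BF16": 16, "INT8": 8, "INT4": 4}


def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Return the approximate size of the weights in decimal gigabytes."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    for model, params in NOMINAL_PARAMS.items():
        sizes = ", ".join(
            f"{fmt}: {weight_footprint_gb(params, bits):.0f} GB"
            for fmt, bits in BITS_PER_WEIGHT.items()
        )
        print(f"Llama 3.1 {model} -> {sizes}")
```

Under these assumptions, the 8B model needs roughly 16 GB for 16-bit weights and about a quarter of that at 4 bits, which is why the smallest model is the natural fit for client hardware.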

Intel tested Llama 3.1 405B on its Intel Gaudi accelerators, which are processors designed specifically for cost-effective, high-performance training and inference. The results show quick response and high throughput at different token lengths, demonstrating the capabilities of Gaudi 2 accelerators and the Gaudi software. Similarly, the Gaudi 2 accelerators show even faster performance on the 70B model with 32K and 128K token lengths.

[Figure: Performance of Llama 3.1 8B on Intel Xeon Scalable processors]

Next on the test bench are Intel's 5th Gen Xeon Scalable processors, tested with various token lengths. With 1K, 2K, and 8K token inputs, token latency stays in a close range (mostly under 40 ms and 30 ms) in both the BF16 and WOQ (weight-only quantization) INT8 tests. This shows the quick response of Intel Xeon processors, which feature Intel AMX (Advanced Matrix Extensions) for superior AI performance. Even with 128K token inputs, latency remains under 100 ms in both tests.
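For readers unfamiliar with weight-only quantization, the general idea can be sketched in a few lines of plain Python: the weights are stored as 8-bit integers with one scale per row, while the activations stay in floating point. This is a generic illustration of the technique, not Intel's implementation. The function names are hypothetical, and the per-row symmetric scheme is an assumption chosen for simplicity.

```python
# Generic sketch of symmetric, per-row, weight-only INT8 quantization.
# Not Intel's implementation; names and the per-row scheme are assumptions.
from typing import List, Tuple


def quantize_row_int8(row: List[float]) -> Tuple[List[int], float]:
    """Map one row of float weights to integers in [-127, 127] plus a scale."""
    max_abs = max((abs(w) for w in row), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def woq_matvec(
    q_rows: List[List[int]], scales: List[float], x: List[float]
) -> List[float]:
    """Multiply quantized weights by a float activation vector.

    Only the weights are quantized; the activation vector x stays in float,
    and each row's integer dot product is rescaled by that row's scale.
    """
    out = []
    for q, scale in zip(q_rows, scales):
        acc = sum(qi * xi for qi, xi in zip(q, x))
        out.append(acc * scale)
    return out


if __name__ == "__main__":
    weights = [[1.0, -0.5, 0.25], [0.1, 0.2, -0.3]]
    x = [1.0, 2.0, 3.0]
    quantized = [quantize_row_int8(row) for row in weights]
    q_rows = [q for q, _ in quantized]
    scales = [s for _, s in quantized]
    print(woq_matvec(q_rows, scales, x))  # close to the float result
```

The trade-off is a small rounding error per weight in exchange for halving the weight storage relative to 16-bit formats, which reduces the memory traffic needed to generate each token.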

[Figure: Llama 3.1 8B on Intel Core Ultra 7 165H]

[Figure: Llama 3.1 8B on Intel Arc A770 16GB Limited Edition]

Llama 3.1 8B inference is also quick on Intel Core Ultra processors when tested with the 8B-Instruct model using 4-bit weights. On the Core Ultra 7 165H with built-in Arc graphics, token latency remains between 50 ms and 60 ms with inputs of 32, 256, 512, and 1024 tokens. On a discrete Arc GPU such as the Arc A770 16GB Limited Edition, latency is far lower, remaining around 15 ms across all four input sizes.
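Readers who want to collect comparable numbers on their own hardware could time a streaming token generator along the following lines. This is a minimal sketch and not Intel's benchmark methodology: it assumes "token latency" means the average gap between consecutive generated tokens, reported separately from the time to the first token. The helper name and the injectable `clock` parameter are hypothetical conveniences.

```python
# Minimal per-token latency timer for any iterable that yields tokens.
# Assumption: "token latency" = mean gap between consecutive output tokens.
import time
from typing import Callable, Iterable, Tuple


def measure_token_latency(
    tokens: Iterable,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (first_token_ms, mean_next_token_ms) for a token stream."""
    start = clock()
    stamps = []
    for _ in tokens:
        stamps.append(clock())  # timestamp taken as each token arrives
    if not stamps:
        raise ValueError("token stream produced no tokens")
    first_ms = (stamps[0] - start) * 1000.0
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    next_ms = (sum(gaps) / len(gaps)) * 1000.0 if gaps else 0.0
    return first_ms, next_ms
```

To use it, pass whatever streaming iterator your inference framework exposes as `tokens`. Keeping the first token separate matters because it includes processing of the whole input prompt, so it grows with input length, while the gaps between later tokens reflect steady-state generation speed.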

About the author: Sarfraz Khan is a hardware reporter with a focus on PC components and the builder community. With years of experience writing about PC hardware and laptops, his work has been featured on several reputable technology publications. Sarfraz's hands-on experience is demonstrated through his first-person accounts of using and comparing different hardware configurations, providing practical and relatable insights for everyday users. His technical analysis is respected by peers in the enthusiast community and has been cited by specialized hardware sites such as Germany's Igor's Lab.
