Intel Announces General Availability of Gaudi 3 AI Accelerators In Q4: The Cost-Effective AI Solution

Hassan Mujtaba • Sep 24, 2024 at 10:59am EDT

Intel today announced the general availability of its latest Gaudi 3 AI accelerators, which will start shipping next month.

Intel Gaudi 3 Is Heading To The AI Accelerator Segment As Early As October, Delivering Better Value Than The Competition

Intel's Gaudi lineup is well regarded in the AI industry for its cost-effective positioning, and the next iteration, Gaudi 3, will be available as early as next month. Today, Intel is announcing the full Gaudi 3 product stack, which includes the accelerator card (HL-325L, OAM-compliant), the Universal Baseboard (HLB-325), and the PCIe CEM (HL-338 add-in card).

The Intel Gaudi 3 PCIe CEM, detailed in today's announcement, will bring up to 1835 TFLOPS of peak FP8 compute along with 128 GB of HBM2e memory, a 600W TDP, 8 matrix multiplication engines, 64 TPCs, and 22 200 GbE RDMA NICs, all in a dual-slot, full-height, 10.5-inch form factor. The OAM solution will be equipped with 96 MB of SRAM, split into two 48 MB halves, with a total HBM bandwidth of 3.67 TB/s and a total on-die SRAM (L2) bandwidth of 19.2 TB/s.
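Pairing the quoted FP8 peak with the HBM and SRAM bandwidth figures allows a simple roofline estimate (our choice of analysis, not something Intel presented). This Python sketch assumes the compute and bandwidth numbers describe the same Gaudi 3 silicon and treats all of them as ideal peaks.

```python
# Peak figures as quoted for Gaudi 3 in this article
PEAK_FP8_TFLOPS = 1835.0     # peak FP8 compute, TFLOPS
HBM_BANDWIDTH_TBS = 3.67     # total HBM bandwidth, TB/s
SRAM_BANDWIDTH_TBS = 19.2    # total on-die L2 SRAM bandwidth, TB/s


def ridge_point(peak_tflops: float, bandwidth_tbs: float) -> float:
    """FLOPs per byte a kernel must perform to become compute-bound."""
    # TFLOPS divided by TB/s gives FLOPs per byte; the 1e12 factors cancel
    return peak_tflops / bandwidth_tbs


def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_tbs: float) -> float:
    """Roofline estimate: the lower of the compute roof and bandwidth * intensity."""
    return min(peak_tflops, bandwidth_tbs * intensity)


if __name__ == "__main__":
    print(f"HBM ridge:  {ridge_point(PEAK_FP8_TFLOPS, HBM_BANDWIDTH_TBS):.0f} FLOPs/byte")
    print(f"SRAM ridge: {ridge_point(PEAK_FP8_TFLOPS, SRAM_BANDWIDTH_TBS):.1f} FLOPs/byte")
    # A hypothetical kernel performing 100 FLOPs per byte read from HBM
    print(f"At 100 FLOPs/byte: {attainable_tflops(100, PEAK_FP8_TFLOPS, HBM_BANDWIDTH_TBS):.0f} TFLOPS")
```

On these numbers, a kernel needs about 500 FLOPs per byte of HBM traffic to be compute-bound, versus roughly 96 FLOPs per byte for data served from on-die SRAM, which helps explain why the large SRAM matters for keeping the matrix engines fed.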

Each matrix multiplication engine is fully configurable (not programmable) and is built around a 256 x 256 MAC array with FP32 accumulators, delivering 64K MACs per cycle for BF16 and FP8. The TPC, or Tensor Processing Core, is a 256-byte-wide SIMD vector processor that is programmable in C enhanced with TPC intrinsics. It uses a VLIW design with four separate pipeline slots, includes an integrated address generation unit, and supports the main 1-, 2-, and 4-byte datatypes (floating point and integer).
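The matrix-engine figures can be cross-checked against the quoted 1835 TFLOPS peak. The sketch below assumes the peak comes entirely from the eight matrix engines, counts one MAC as two FLOPs, and back-solves for the clock speed; the resulting clock is an inference, not a number Intel gives here. It also shows how many elements fit in a 256-byte TPC vector for each supported datatype width.

```python
# Figures quoted in the article
MME_COUNT = 8                  # matrix multiplication engines per accelerator
MAC_ROWS, MAC_COLS = 256, 256  # MAC array per engine
QUOTED_PEAK_TFLOPS = 1835.0    # peak FP8 TFLOPS quoted for the PCIe card
TPC_VECTOR_BYTES = 256         # TPC SIMD width in bytes

FLOPS_PER_MAC = 2              # assumption: one multiply plus one add


def macs_per_cycle(rows: int, cols: int) -> int:
    """MAC operations one engine completes per clock cycle."""
    return rows * cols


def implied_clock_ghz(peak_tflops: float, engines: int, macs: int,
                      flops_per_mac: int = FLOPS_PER_MAC) -> float:
    """Clock at which `engines` engines would reach `peak_tflops` (an estimate)."""
    flops_per_cycle = engines * macs * flops_per_mac
    return peak_tflops * 1e12 / flops_per_cycle / 1e9


def tpc_lanes(element_bytes: int, vector_bytes: int = TPC_VECTOR_BYTES) -> int:
    """Elements one TPC SIMD instruction covers for a given datatype width."""
    return vector_bytes // element_bytes


if __name__ == "__main__":
    macs = macs_per_cycle(MAC_ROWS, MAC_COLS)
    print(f"MACs/cycle per engine: {macs}")  # 65536, i.e. the quoted 64K
    print(f"Implied clock: {implied_clock_ghz(QUOTED_PEAK_TFLOPS, MME_COUNT, macs):.2f} GHz")
    for width in (1, 2, 4):
        print(f"{width}-byte elements per TPC vector: {tpc_lanes(width)}")
```

The back-solved clock lands at about 1.75 GHz, so under these assumptions the MAC-array description and the headline TFLOPS figure are self-consistent.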

The universal baseboard will be equipped with four Gaudi 3 AI accelerators featuring four 200 GbE interconnect links, plus 400 GbE through the QSFP-DD controller. Each OAM will have an x16 PCIe Gen5 link and up to 800 GB/s of scale-out and 1800 GB/s of scale-up bandwidth. The system as a whole will offer 512 GB/s of PCIe bandwidth. This solution is designed primarily for inferencing, fine-tuning, and small-model training.
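The 512 GB/s system figure lines up with four x16 PCIe Gen5 links if it is read as aggregate bidirectional bandwidth; that reading is our assumption, since the article does not say. This sketch compares the rounded 4 GB/s-per-lane Gen5 figure with the rate left after 128b/130b line coding at 32 GT/s per lane, ignoring packet and protocol overhead.

```python
ACCELERATORS = 4   # accelerators on the board, per the article
LANES = 16         # x16 PCIe Gen5 link per accelerator

# Rounded Gen5 figure: 32 GT/s per lane is roughly 4 GB/s per direction
NOMINAL_GBS_PER_LANE = 4.0
# Same lane after 128b/130b encoding, still ignoring packet/protocol overhead
ENCODED_GBS_PER_LANE = 32 * (128 / 130) / 8


def link_gbs_per_direction(lanes: int, gbs_per_lane: float) -> float:
    """Bandwidth of one link in a single direction, in GB/s."""
    return lanes * gbs_per_lane


def board_pcie_gbs(accelerators: int, lanes: int, gbs_per_lane: float) -> float:
    """Aggregate bidirectional PCIe bandwidth across all accelerators, in GB/s."""
    return accelerators * link_gbs_per_direction(lanes, gbs_per_lane) * 2


if __name__ == "__main__":
    print(f"Nominal: {board_pcie_gbs(ACCELERATORS, LANES, NOMINAL_GBS_PER_LANE):.0f} GB/s")
    print(f"After encoding: {board_pcie_gbs(ACCELERATORS, LANES, ENCODED_GBS_PER_LANE):.0f} GB/s")
```

The nominal calculation gives exactly 512 GB/s, while the encoding-adjusted figure is about 504 GB/s.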

In terms of performance, Intel says the Gaudi 3 AI accelerator offers up to 9% higher inference throughput on LLaMA 3 8B while delivering 80% better performance per dollar than NVIDIA's H100. On LLaMA 70B, Gaudi 3 offers 19% higher inference throughput and 2x the performance per dollar of the H100.
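The two claims also imply how Gaudi 3 is priced against the H100. Since performance per dollar is performance divided by cost, the cost ratio equals the throughput ratio divided by the performance-per-dollar ratio. This sketch assumes both claims use the same cost basis and reads "9% better" as 1.09x; Intel's actual pricing inputs are not given in the article.

```python
def implied_cost_ratio(perf_ratio: float, perf_per_dollar_ratio: float) -> float:
    """Gaudi 3 cost relative to H100 implied by two quoted ratios.

    perf_per_dollar_ratio = perf_ratio / cost_ratio,
    so cost_ratio = perf_ratio / perf_per_dollar_ratio.
    """
    return perf_ratio / perf_per_dollar_ratio


# (throughput ratio, performance-per-dollar ratio) versus H100, per the article
CLAIMS = {
    "LLaMA 3 8B": (1.09, 1.80),  # +9% throughput, +80% perf per dollar
    "LLaMA 70B": (1.19, 2.00),   # +19% throughput, 2x perf per dollar
}

if __name__ == "__main__":
    for model, (perf, ppd) in CLAIMS.items():
        print(f"{model}: implied cost ratio {implied_cost_ratio(perf, ppd):.1%}")
```

Both claims work out to a Gaudi 3 cost of about 60% of the H100 baseline (roughly 60.6% and 59.5%), which suggests they rest on a similar price assumption.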

The Intel Gaudi 3 reference server (HLS-3) node will come with two Intel Xeon host CPUs, such as the latest Xeon 6900P series, and eight OAM cards, offering a total of 67.2 Tb/s of scale-up and 9.6 Tb/s of scale-out bandwidth. The platform is backed by the Gaudi software suite, which supports the most commonly used GenAI frameworks along with FP16, BF16, and FP8 quantization. Intel is working with various partners on the Gaudi ecosystem, including Dell Technologies and Supermicro as system providers, and IBM, Lumen, Infosys, Naver, and many others as software enablers.
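The HLS-3 bandwidth totals can be reproduced from per-port figures. The sketch assumes each Gaudi 3 exposes 24 ports of 200 GbE, with 21 used for scale-up links inside the node and 3 for scale-out; this split is not stated in the article but is consistent with its totals. Both transmit and receive directions are counted.

```python
OAMS_PER_NODE = 8     # OAM cards per HLS-3 node, per the article
PORT_GBPS = 200       # 200 GbE per port
SCALE_UP_PORTS = 21   # assumption: ports per OAM used for in-node links
SCALE_OUT_PORTS = 3   # assumption: ports per OAM exposed for scale-out


def node_bandwidth_tbps(oams: int, ports_per_oam: int, port_gbps: int,
                        bidirectional: bool = True) -> float:
    """Total Ethernet bandwidth for a node, in Tb/s."""
    total_gbps = oams * ports_per_oam * port_gbps
    if bidirectional:
        total_gbps *= 2  # count transmit and receive separately
    return total_gbps / 1000  # Gb/s -> Tb/s


if __name__ == "__main__":
    up = node_bandwidth_tbps(OAMS_PER_NODE, SCALE_UP_PORTS, PORT_GBPS)
    out = node_bandwidth_tbps(OAMS_PER_NODE, SCALE_OUT_PORTS, PORT_GBPS)
    print(f"Scale-up: {up:.1f} Tb/s, scale-out: {out:.1f} Tb/s")
```

Under these assumptions the script prints 67.2 Tb/s and 9.6 Tb/s, matching the quoted node totals.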

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
