Intel today announced the general availability of its latest Gaudi 3 AI accelerators, which will start shipping next month.
Intel Gaudi 3 Is Heading To The AI Accelerator Segment As Early As October, Delivering Better Value Than The Competition
Intel's Gaudi lineup is well regarded in the AI industry for its cost-effective positioning, and the next iteration, Gaudi 3, will be available as early as next month. Today, Intel is announcing the full Gaudi 3 product stack, which includes the accelerator cards (HL-325L, OAM-compliant), the Universal Baseboard (HLB-325), and the PCIe CEM (HL-338 add-in card).

The Intel Gaudi 3 PCIe CEM, detailed in today's announcement, will bring up to 1835 TFLOPS of peak FP8 compute along with 128 GB of HBM2e memory, a 600W TDP, 8 matrix multiplication engines (MMEs), 64 TPCs, and 22 200 GbE RDMA NICs, all in a dual-slot, full-height 10.5" form factor. The OAM solution will be equipped with 96 MB of SRAM in two 48 MB stacks, with a total HBM bandwidth of 3.67 TB/s and a total on-die SRAM (L2) bandwidth of 19.2 TB/s.
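One way to read these figures together is a simple roofline estimate: dividing peak compute by memory bandwidth gives the arithmetic intensity (FLOPs per byte moved) a workload needs before it stops being bandwidth-bound. The sketch below assumes that the 1835 TFLOPS FP8 peak and the 3.67 TB/s HBM and 19.2 TB/s L2 figures apply to the same silicon; real kernels will land below these peaks.

```python
# Back-of-the-envelope roofline using the peak figures quoted above.
# Assumption: the FP8 peak and the HBM/L2 bandwidths describe the same device;
# these are theoretical peaks, not measured throughput.

PEAK_FP8_TFLOPS = 1835.0
HBM_TBPS = 3.67
SRAM_TBPS = 19.2


def ridge_point(peak_tflops: float, bandwidth_tbps: float) -> float:
    """FLOPs per byte needed before a kernel stops being bandwidth-bound (TFLOPS / TB/s)."""
    return peak_tflops / bandwidth_tbps


def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_tbps: float) -> float:
    """Classic roofline: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_tflops, bandwidth_tbps * intensity)


if __name__ == "__main__":
    print(f"HBM ridge point:  {ridge_point(PEAK_FP8_TFLOPS, HBM_TBPS):.1f} FLOPs/byte")
    print(f"SRAM ridge point: {ridge_point(PEAK_FP8_TFLOPS, SRAM_TBPS):.1f} FLOPs/byte")
    # A kernel doing 100 FLOPs per byte of HBM traffic is memory-bound here.
    print(f"At 100 FLOPs/byte: {attainable_tflops(100, PEAK_FP8_TFLOPS, HBM_TBPS):.0f} TFLOPS")
```

Under these assumptions the HBM ridge point is 500 FLOPs per byte, while data served from on-die SRAM needs only about a fifth of that, which is why keeping working sets in the 96 MB of SRAM matters for low-intensity workloads.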
Each matrix multiplication engine is fully configurable (not programmable) and is built around a 256 x 256 MAC array with FP32 accumulators, delivering 64K MACs/cycle for BF16 and FP8. The TPC (Tensor Processing Core) features a 256-byte-wide SIMD vector processor that is programmable in C with TPC-specific intrinsics, a VLIW design with four separate pipeline slots, and an integrated address generation unit, and it supports the main 1-, 2-, and 4-byte datatypes (floating point and integer).
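The 64K MACs/cycle figure follows directly from the array size, since 256 x 256 = 65,536. The sketch below models the MME as an idealized output-stationary array, where each "cycle" applies one multiply-accumulate to every cell of a 256 x 256 output tile. That dataflow, and the helper names used here, are illustrative assumptions rather than Intel's documented microarchitecture.

```python
# Conceptual model of a 256 x 256 MAC array. Assumption: an output-stationary
# dataflow in which each "cycle" performs one rank-1 (outer-product) update of
# an output tile. Not Intel's documented MME design; helper names are hypothetical.
import math

ARRAY_DIM = 256  # 256 x 256 MAC array, as quoted for each MME


def macs_per_cycle(dim: int = ARRAY_DIM) -> int:
    """MACs performed per cycle when every cell of the array is busy."""
    return dim * dim


def ideal_gemm_cycles(m: int, k: int, n: int, dim: int = ARRAY_DIM) -> int:
    """Cycles for an (m x k) @ (k x n) GEMM on one idealized array."""
    tiles = math.ceil(m / dim) * math.ceil(n / dim)
    return tiles * k  # one outer-product update per k-step per output tile


def tile_matmul(a, b, dim: int):
    """Tiled matmul that counts cycles the same way as ideal_gemm_cycles.

    Python floats stand in for the hardware's FP32 accumulators.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    cycles = 0
    for i0 in range(0, m, dim):
        for j0 in range(0, n, dim):
            for kk in range(k):  # one "cycle": update the whole output tile
                for i in range(i0, min(i0 + dim, m)):
                    for j in range(j0, min(j0 + dim, n)):
                        out[i][j] += a[i][kk] * b[kk][j]
                cycles += 1
    return out, cycles


if __name__ == "__main__":
    print(macs_per_cycle())  # 65536, i.e. the "64K MACs/cycle" figure
    print(ideal_gemm_cycles(4096, 4096, 4096))
    # Sanity check on a toy 2 x 2 array
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[1, 0], [0, 1], [1, 1]]
    out, cycles = tile_matmul(a, b, dim=2)
    print(out, cycles, ideal_gemm_cycles(3, 3, 2, dim=2))
```

The toy run confirms that the cycle count from the tiled loop matches the closed-form estimate; partially filled edge tiles still cost full cycles, which is one reason real utilization falls below the 64K MACs/cycle peak.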

The Universal Baseboard will be equipped with four Gaudi 3 AI accelerators featuring four 200 GbE interconnect links and 400 GbE through QSFP-DD connectors. Each OAM will have an x16 PCIe Gen5 link, with up to 800 GB/s of scale-out and 1800 GB/s of scale-up bandwidth. The system itself will pack 512 GB/s of PCIe bandwidth. This solution is ideally suited for inferencing, fine-tuning, and small-model training.
In terms of performance, the Intel Gaudi 3 AI accelerator will offer up to 9% higher inference throughput on LLaMA 3 8B while delivering 80% better performance per dollar versus the H100. On LLaMA 70B, the Gaudi 3 AI accelerator will offer 19% higher inference throughput and 2x the performance per dollar of the H100.
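Taken together, the throughput and performance-per-dollar claims imply a relative price. Assuming "performance per dollar" simply means throughput divided by system price, and that both ratios come from the same comparison, the implied Gaudi 3 price is the throughput ratio divided by the perf-per-dollar ratio. The sketch below does that arithmetic; the results are derived estimates, not Intel pricing.

```python
# Infers the relative price implied by the headline claims.
# Assumptions: perf per $ = throughput / price, and both ratios describe the
# same Gaudi 3 vs. H100 comparison. Derived numbers, not published pricing.


def implied_price_ratio(perf_ratio: float, perf_per_dollar_ratio: float) -> float:
    """Gaudi 3 price as a fraction of H100 price.

    perf_per_dollar_ratio = (T_g / $_g) / (T_h / $_h)
    => $_g / $_h = (T_g / T_h) / perf_per_dollar_ratio
    """
    return perf_ratio / perf_per_dollar_ratio


CLAIMS = {
    "LLaMA 3 8B": (1.09, 1.80),  # +9% throughput, +80% perf per $
    "LLaMA 70B": (1.19, 2.00),   # +19% throughput, 2x perf per $
}

if __name__ == "__main__":
    for model, (perf, perf_per_dollar) in CLAIMS.items():
        ratio = implied_price_ratio(perf, perf_per_dollar)
        print(f"{model}: implied Gaudi 3 cost ~{ratio:.1%} of the H100 setup")
```

Both claims work out to roughly 60% of the H100 configuration's price, so most of the perf-per-dollar advantage comes from cost rather than raw throughput.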
The Intel Gaudi 3 reference server (HLS-3) node will come with two Intel Xeon host CPUs, such as the latest Xeon 6900P series, and eight OAM cards, offering a total bandwidth of 67.2 Tb/s (scale-up) and 9.6 Tb/s (scale-out). The AI solution will be backed by the Gaudi software suite, which supports the most commonly used GenAI frameworks as well as FP16, BF16, and FP8 quantization. Intel is working with various partners on the Gaudi ecosystem, including Dell Technologies and Supermicro as system providers and IBM, LUMEN, Infosys, Naver, and many others as software enablers.
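The node-level totals can be reproduced from per-port figures. The sketch assumes a port split not stated in this article: each OAM dedicates 21 200 GbE ports to scale-up (three links to each of its seven peers) and 3 to scale-out, and the totals count both directions of each full-duplex link.

```python
# Reproduces the HLS-3 node bandwidth totals from per-port figures.
# Assumptions (not from the article): 21 x 200 GbE scale-up ports per OAM
# (3 links to each of 7 peers), 3 x 200 GbE scale-out ports per OAM, and
# totals counted bidirectionally.

PORT_GBPS = 200
NUM_OAMS = 8
SCALE_UP_PORTS = 3 * (NUM_OAMS - 1)  # 21
SCALE_OUT_PORTS = 3


def node_bandwidth_tbps(ports_per_oam: int, bidirectional: bool = True) -> float:
    """Aggregate node bandwidth in Tb/s for a given number of ports per OAM."""
    total_gbps = NUM_OAMS * ports_per_oam * PORT_GBPS
    if bidirectional:
        total_gbps *= 2  # count both directions of each full-duplex link
    return total_gbps / 1000


if __name__ == "__main__":
    print(f"scale-up:  {node_bandwidth_tbps(SCALE_UP_PORTS):.1f} Tb/s")   # 67.2 Tb/s
    print(f"scale-out: {node_bandwidth_tbps(SCALE_OUT_PORTS):.1f} Tb/s")  # 9.6 Tb/s
```

That both quoted totals fall out of the same assumed port split and bidirectional counting suggests the figures are aggregate full-duplex numbers; counted in one direction they would be half as large.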