Intel Unveils Habana Gaudi2 & Greco 7nm Deep Learning Accelerators: Gaudi2 With 24 TPCs, 96 HBM2e, 600W TDP Offering Faster Training Performance Than NVIDIA Ampere A100

Hassan Mujtaba • May 10, 2022 at 10:54am EDT

Intel has today officially unveiled its 7nm Habana Gaudi2 and Greco Deep Learning accelerators, offering up to 2x the throughput performance versus NVIDIA's Ampere A100 GPU.

Intel Unveils 7nm Habana Gaudi2 & Greco Deep Learning Accelerators, Up To 2x The Throughput Performance Versus NVIDIA's Ampere A100

The latest Deep Learning accelerators for data centers were designed at Intel's Habana Labs. These are dedicated Deep Learning platforms built for DL training and/or inference workloads. Starting with the details, both the Habana Gaudi2 and the Greco are based on a 7nm process node. Unfortunately, this detail doesn't help much because 7nm could refer to TSMC's N7 process, Intel 7 (formerly Intel 10nm), or Intel 4 (formerly Intel 7nm, and the least likely).

The original Habana Gaudi processors were built on TSMC's 16nm process, which makes it more likely for this chip to be on N7 or Intel 7. Whatever the case, the Gaudi2 platform is clearly on a far smaller node than 16nm, which in itself gives a density increase of roughly 50%.

As for the specifications, the Gaudi2 features 24 Tensor Processor Cores (TPCs), up from 8 on the original Gaudi, with support for the FP8 format alongside media decode and processing. The memory configuration includes 96 GB of HBM2e memory, offering 2.45 TB/s of bandwidth, and an additional 48 MB of SRAM. Networking is provided through 24 100GbE ports. Such a big jump in performance also means the TDP rises dramatically: the Gaudi2 operates at a 600W TDP (versus 350W).
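To put the generational figures above side by side, here is a minimal Python sketch that computes the ratios implied by the quoted numbers. The dictionary layout and variable names are illustrative assumptions, and the aggregate networking figure is simple multiplication of the quoted link count and link speed, not a measured value.

```python
# Figures as quoted in the article; names and structure are illustrative only.
GAUDI = {"tpcs": 8, "tdp_watts": 350}
GAUDI2 = {"tpcs": 24, "tdp_watts": 600, "ethernet_links": 24, "link_gbps": 100}

# Generation-over-generation ratios.
tpc_ratio = GAUDI2["tpcs"] / GAUDI["tpcs"]
tdp_ratio = GAUDI2["tdp_watts"] / GAUDI["tdp_watts"]

# Nominal aggregate Ethernet bandwidth: 24 links at 100 Gb/s each.
aggregate_gbps = GAUDI2["ethernet_links"] * GAUDI2["link_gbps"]

print(f"TPC count ratio:      {tpc_ratio:.1f}x")
print(f"TDP ratio:            {tdp_ratio:.2f}x")
print(f"Aggregate networking: {aggregate_gbps} Gb/s nominal")
```

On these numbers, the TPC count triples while the TDP grows by a smaller factor, though TPC count alone is not a measure of delivered performance.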

In terms of performance, ResNet-50 training throughput shows a 1.9x gain for the Intel Habana Gaudi2 accelerator versus a single A100 80 GB GPU. In NLP BERT Phase-1 training, the chip delivers 1.7x the throughput, rising to 2.8x in Phase-2 training. Lastly, Intel also put together a BERT training throughput comparison which shows a 2.0x gain for the Gaudi2 over its competitor, the NVIDIA A100. Overall, Intel says the new accelerator offers training cost savings of up to 75% versus NVIDIA solutions.
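As a rough illustration of what these throughput multipliers mean in practice, the sketch below converts each quoted gain into the relative time needed to process a fixed amount of training data. It assumes the workload is fixed and that the quoted throughput holds for the whole run; the function name and labels are hypothetical, and the result says nothing about cost.

```python
# Throughput gains over a single A100 80 GB, as quoted in the article.
QUOTED_GAINS = {
    "ResNet-50": 1.9,
    "BERT Phase-1": 1.7,
    "BERT Phase-2": 2.8,
    "BERT (overall comparison)": 2.0,
}

def relative_time(throughput_gain: float) -> float:
    """Fraction of the baseline time needed for the same fixed workload."""
    if throughput_gain <= 0:
        raise ValueError("throughput gain must be positive")
    return 1.0 / throughput_gain

for workload, gain in QUOTED_GAINS.items():
    fraction = relative_time(gain)
    print(f"{workload}: {gain:.1f}x throughput -> {fraction:.0%} of baseline time")
```

A 2.0x throughput gain, for example, halves the time for the same workload under these assumptions.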

There's also the Intel Habana Greco, a deep learning inference accelerator designed for peak efficiency and based on the same 7nm process node. The accelerator offers 16 GB of LPDDR5 memory with 240 GB/s of bandwidth and an additional 128 MB of on-chip SRAM. The compute capabilities include BF16, FP16, and INT4 formats, along with media decode and processing.

The TDP is rated at just 75W. Compared to the OAM module that the Gaudi2 comes in, the Greco comes in a single-slot HHHL form factor. Since its TDP is rated at 75W, there's no need for external power connectors on the card.

Intel has also announced that the 7nm Gaudi2 processor is available to customers starting now while the Greco will be sampling to select customers in the second half of 2022.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
