Intel Benchmarks Monster 42 TFLOPs 4 Tile ‘Arctic Sound’ Xe GPU With 16,384 Cores

Aug 13, 2020

Almost six months after I first published my exclusive, Intel has confirmed the existence of an absolutely monstrous MCM-based Arctic Sound GPU with 4 tiles and up to 16,384 cores. The chip is alive and well deep in Intel's labs, and the company showed off the raw power it can wield at its recently held Architecture Day.

Intel confirms monstrous 42 TFLOP 4-tile Xe GPU lurking in its labs, shows off transcoding and FP32 compute scaling benchmarks

Almost six months ago to the day, I was the first to report that Intel's 4-tile Arctic Sound would have 2048 EUs and roughly 36 TFLOPs of compute (based on a clock speed assumption of 1.1 GHz). Since each EU equates roughly to 8 cores, you are essentially looking at an MCM GPU that has 16,384 cores in total! It also looks like my clock speed assumption was too conservative, because Intel benchmarked the same card running at 1.3 GHz with a resounding 42 TFLOPs of FP32 compute and near-perfect (3.993x) scaling from one tile to four.
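The arithmetic behind those figures can be reproduced with a quick back-of-the-envelope sketch. It assumes each core retires one fused multiply-add (2 FLOPs) per clock, which is my assumption and not something Intel stated in the demo; the function and constant names are made up for illustration. Under that assumption, 2048 EUs work out to about 36 TFLOPs at 1.1 GHz and about 42.6 TFLOPs at 1.3 GHz.

```python
# Rough peak-FP32 estimate for a tiled Xe HP part.
CORES_PER_EU = 8           # from the article: each EU equates roughly to 8 cores
FLOPS_PER_CORE_CLOCK = 2   # ASSUMPTION: one fused multiply-add per core per clock


def peak_fp32_tflops(eus: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPs."""
    cores = eus * CORES_PER_EU
    gflops = cores * FLOPS_PER_CORE_CLOCK * clock_ghz
    return gflops / 1000.0


def scaling_efficiency(speedup: float, tiles: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup / tiles


if __name__ == "__main__":
    for clock in (1.1, 1.3):
        tflops = peak_fp32_tflops(2048, clock)
        print(f"2048 EUs @ {clock} GHz: {tflops:.1f} TFLOPs")
    eff = scaling_efficiency(3.993, 4)
    print(f"4-tile scaling efficiency: {eff:.2%}")
```

A 3.993x speedup on four tiles corresponds to roughly 99.8 percent of ideal linear scaling.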

Raja Koduri confirmed that Intel is working on 1 Tile (512 EU, 4096 cores), 2 Tile (1024 EU, 8192 cores) and 4 Tile (2048 EU, 16384 cores) versions of the Arctic Sound GPU based on the Xe HP architecture. While the drivers and revisions of the chip are still in their infancy, the chip was able to run some benchmarks for the audience. A single tile of Xe HP can transcode 10 separate streams of 4K 60 HEVC content, which is no mean feat. Since scaling is almost perfectly linear, the 4-Tile version should be able to transcode roughly 40 different streams!
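For reference, the three configurations and the projected stream counts can be tabulated with a small sketch. The per-tile figures come from the numbers above; the stream projection assumes media throughput scales linearly with tile count, which Intel only demonstrated for a single tile, and the efficiency parameter and function name are hypothetical knobs for illustration.

```python
import math

EUS_PER_TILE = 512       # from the article: 1 Tile = 512 EUs
CORES_PER_EU = 8         # from the article: each EU equates roughly to 8 cores
STREAMS_PER_TILE = 10    # from the demo: 10 streams of 4K 60 HEVC on one tile


def tile_config(tiles: int, efficiency: float = 1.0) -> dict:
    """Describe an N-tile part.

    ASSUMPTION: transcode streams scale linearly with tiles, derated by
    `efficiency` (1.0 means perfectly linear). Only whole streams count.
    """
    eus = tiles * EUS_PER_TILE
    return {
        "tiles": tiles,
        "eus": eus,
        "cores": eus * CORES_PER_EU,
        "streams": math.floor(tiles * STREAMS_PER_TILE * efficiency),
    }


if __name__ == "__main__":
    print(f"{'tiles':>5} {'EUs':>6} {'cores':>7} {'4K60 streams':>13}")
    for n in (1, 2, 4):
        c = tile_config(n)
        print(f"{c['tiles']:>5} {c['eus']:>6} {c['cores']:>7} {c['streams']:>13}")
```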

Unfortunately for gamers, this particular GPU is designed for data centers and likely won't be coming to the consumer market anytime soon. That said, the company did announce a brand new gaming-oriented line of GPUs called Xe HPG (Intel's high-performance Xe architecture for gaming).

Intel 4-tile Xe GPU benchmarks

Here is a transcription of the benchmarking demo shown by Intel at Architecture Day 2020, straight from the horse's mouth:

We've leveraged Intel's unique packaging innovations for an industry-first multi-tiled, highly scalable and high-performance architecture. This is Xe HP. Let's take a look at what it can do. Xe HP was created to be a media supercomputer on a PCIe card. Here you'll see us transcoding a 4K video in real time, up to 60 frames per second, on a single stream, but we didn't stop there.

By utilizing our industry-leading media IP and creating the most dense media architecture on a GPU with ffmpeg, we can transcode up to 10 full streams of high-quality HEVC 4K video at 60 frames per second on a single tile, and you can see the ffmpeg output on screen displaying the progression of the real-time transcode of each frame.

By optimizing for bitrate efficiency and stream density, customers are able to realize real-world TCO improvements for delivery of video content that scale along with media. We place compute throughput in the forefront of Xe architecture, increasing the total number of execution units by over 100x when compared to Xe LP. Viewing this through the lens of FP32 performance, Xe HP covers a dynamic range of compute throughput with near-linear scalability from one tile to four tiles, and tracking the most FP32 peak performance placed onto a single GPU package when measured by the clpeak benchmark.

This unique combination of compute and media performance provides customers the flexibility to design for their most demanding applications, and we've only just begun.

It is clear that Intel has a true mammoth of a chip in its labs, and while I am sure the company will be able to squeeze more performance out of it yet, at 42 TFLOPs it is already the world's fastest single GPU. AMD with its Zen CPUs showed the world how the MCM approach is the way forward for Moore's Law, and while NVIDIA and AMD have yet to introduce this approach for GPUs, it looks like Intel is already quietly working on doing just that. Intel has said that the Xe HP GPUs will be enabled in Intel's Dev Cloud "soon" so software developers can start working on its ecosystem.