AMD Reveals the Monstrous ‘Exascale Heterogeneous Processor’ (EHP) with 32 x86 Zen Cores and Greenland HBM2 Graphics on a 2.5D Interposer


There had been rumors about AMD working on a huge APU with Zen cores and Greenland HBM graphics, something that AMD had hinted at in its official roadmap. It has now (finally) officially revealed details about the upcoming APU in a paper submitted to the IEEE (Institute of Electrical and Electronics Engineers). The APU, dubbed an "Exascale Heterogeneous Processor" or EHP for short, is the mother of all APUs with 32 Zen cores, an absolutely huge Greenland graphics die and up to 32 GB of HBM2 memory - all on a 2.5D interposer.

Exascale Heterogeneous Processor (EHP) is AMD's promised monster APU for the HPC segment

The research in question can be found over here and requires paid access; however, we were able to get the relevant piece courtesy of As you may notice, the diagram is a very simple block diagram that doesn't really reveal much, except the number of CPU cores. Fortunately for us, AMD's roadmap and the relevant knowledge of HBM technology make it almost child's play to identify the exact parts.

AMD EHP APU 32 Zen Cores Greenland HPC

Provided the diagram is drawn accurately, the first thing you will note is that there are exactly 32 "CPU Cores". Since the EHP (APU) is scheduled for 2016-2017, we are most definitely looking at Zen cores (not to mention Excavator cores wouldn't fit). I can count 4 dies per stack, and since we are dealing with HBM2 at the very least (given the timeframe), these constitute 4-Hi stacks. HBM2 offers 8 Gb per die, which equates to 4 GB per 4-Hi stack in this diagram, or a total of 32 GB of HBM2 memory onboard the interposer (a figure that implies eight stacks). That's not all either: memory can be expanded further via the DDR4 channels present on the package.
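The capacity arithmetic above is easy to double-check. The short Python sketch below uses only the figures quoted in this article (8 Gb per HBM2 die, 4 dies per stack, 32 GB total); the variable names are purely illustrative, and the stack count is derived from those figures rather than stated anywhere by AMD.

```python
# Back-of-the-envelope check of the HBM2 capacity figures quoted in the article.
# All inputs come from the article; the stack count is derived, not confirmed.

GBIT_PER_DIE = 8      # HBM2 die density in gigabits
DIES_PER_STACK = 4    # 4-Hi stack, as counted in the block diagram
TOTAL_HBM_GB = 32     # total on-package HBM2 in gigabytes
BITS_PER_BYTE = 8

# 8 Gb x 4 dies = 32 Gb per stack, which is 4 GB per stack.
gb_per_stack = GBIT_PER_DIE * DIES_PER_STACK // BITS_PER_BYTE

# Number of stacks needed to reach the quoted total.
stacks_implied = TOTAL_HBM_GB // gb_per_stack

print(f"{gb_per_stack} GB per stack, {stacks_implied} stacks implied")
```

If the diagram were to show a different number of stacks, either the 4-Hi assumption or the 32 GB total would have to change accordingly.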

As far as the graphics portion of the Exascale Heterogeneous Processor is concerned, what we know for sure is that this will be the next generation Greenland graphics; what we don't know is what the exact core count will be. Since we have no idea how big Zen cores actually are (or if the diagram is even drawn to scale), it would be unwise to try to reverse engineer the die size of the GPU from the picture. We can safely say, however, that this is one of the largest GPU dies we have encountered so far. If I were to make a wild guess (caution: speculation) for the sake of giving a number, I would say the number of stream processors could easily be above 3072, considering we are talking about a smaller process node and a huge die.

This brings us to our third deduction. A 2.5D interposer has been used in the EHP (APU), and the CPU and GPU cores together are too numerous (and huge) to have been manufactured as a single die. Not only would the yields on such a monstrosity be beyond imagining, it would be pretty much impossible to manufacture such a thing in the first place. The likely conclusion, therefore, is that the compute and graphics portions of the APU are manufactured separately and put together on the interposer later on in assembly (possibly at UMC's Fab 12 foundry in Singapore, which is already used to assemble Fiji dies). So basically, AMD appears to be fabricating the compute side of the Exascale Heterogeneous Processor (EHP) in dies with 16 Zen cores each, for a total of 2 compute dies and 1 graphics die assembled on the interposer (ignoring the HBM).

Now there has been word on the rumor mill about AMD's HPC APU for quite a long time, and we are fairly certain there will be a 16 core variant as well. Previous leaks have indicated that the processor will be constructed using AMD's Coherent Fabric, a custom interconnect that lets the cores communicate with the Greenland graphics. Each Zen core will have access to 512 KB of L2 cache, and every 4 Zen cores will share 8 MB of L3 cache in the 'Exascale Heterogeneous Processor'. That equates to a grand total of 16 MB of L2 cache and 64 MB of L3 cache. Each Zen core will be capable of running two threads (thanks to the company's shift to Simultaneous Multi-Threading) for a total of 64 threads in this huge APU. The processor is thought to have 8 DDR4 channels with a capacity of 256 GB per channel.
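For anyone who wants to verify the totals, here is a minimal Python sketch built from the rumored per-core and per-cluster figures quoted above. The variable names are illustrative only, and the 2 TB maximum DDR4 capacity is simply what 8 channels at 256 GB each would add up to, not a number taken from AMD.

```python
# Sanity check of the cache, thread and DDR4 totals, using the rumored figures.
# Every input is from the article; the DDR4 total is derived, not confirmed.

CORES = 32
L2_KB_PER_CORE = 512
CORES_PER_L3_CLUSTER = 4
L3_MB_PER_CLUSTER = 8
THREADS_PER_CORE = 2       # Simultaneous Multi-Threading
DDR4_CHANNELS = 8
DDR4_GB_PER_CHANNEL = 256

total_l2_mb = CORES * L2_KB_PER_CORE // 1024      # 32 x 512 KB
l3_clusters = CORES // CORES_PER_L3_CLUSTER       # groups of 4 cores
total_l3_mb = l3_clusters * L3_MB_PER_CLUSTER
total_threads = CORES * THREADS_PER_CORE
max_ddr4_gb = DDR4_CHANNELS * DDR4_GB_PER_CHANNEL

print(f"L2: {total_l2_mb} MB, L3: {total_l3_mb} MB ({l3_clusters} clusters)")
print(f"Threads: {total_threads}, max DDR4: {max_ddr4_gb} GB")
```

The L2, L3 and thread totals match the 16 MB, 64 MB and 64 threads quoted above.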

Unfortunately for the enthusiasts, there is no guarantee that the EHP will trickle down into consumer variants. In fact, I will be genuinely shocked if it does. Even the 16 core variant that was spotted quite a bit earlier would be hard pressed to enter the mainstream segment. In any case, heterogeneous processing is an applause-worthy approach to the HPC problem. Equipped with Greenland class stream processors for parallel workloads and a small army of Zen cores for the rest, this not-so-tiny APU would be able to handle just about anything. Not to mention, as the name suggests, the Exascale Heterogeneous Processor is built to be scaled to kingdom come, allowing for a truly powerful rival to Intel's Xeon Phi coprocessors and even the general GPGPU market.