AMD’s Vega 10 Based Dual GPU Graphics Card Inbound – Expected Single Precision Compute Performance To Be Greater Than 18 TFLOPs


It looks like the hardware leak season is in full swing, because a report by Fudzilla reveals that AMD is working on a dual chip Vega 10 graphics card for the professional market. This comes a few weeks after Videocardz confirmed the existence of the same, so you now have two independent confirmations of a dual GPU Vega 10. This isn’t really all that surprising and is fairly expected in terms of lineup progression, but it still goes to show that AMD will be leveraging its arsenal of GPUs to their full potential to maximize its return on investment.

Vega 10 Dual GPU expected to land in Q2 2017 - targeted at the professional market

Not many details are known about the Vega 10 based dual GPU at this point, but since AMD has always included full dies in any dual GPU configuration, we can simply multiply the known configuration of Vega 10 by 2. It’s really as simple as that.
Since Vega 10 has 64 compute units, the dual GPU will have 128 CUs for a total of 8192 stream processors (assuming the SP to CU ratio remains the same in Vega architecture as Polaris). You are also looking at a total of 32 GB of HBM2 on the board with an aggregate bandwidth of 1 TB/s (512 GB/s x2). The TBP (Total Board Power) is expected to be around 300W so we can expect it to be clocked lower than its single GPU siblings.
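
As a rough sketch of that doubling, the board totals can be computed from the single GPU figures quoted above. The 64 stream processors per CU is the Polaris ratio the article assumes for Vega, the 16 GB per GPU is inferred from the 32 GB board total, and the dictionary keys and the `dual_gpu` helper are illustrative names only.

```python
# Single Vega 10 figures as quoted in the article (rumored, not confirmed).
SP_PER_CU = 64  # assumption: same SP-to-CU ratio as Polaris

vega10 = {
    "compute_units": 64,
    "hbm2_gb": 16,              # inferred: half of the 32 GB board total
    "bandwidth_gb_per_s": 512,  # per-GPU HBM2 bandwidth
}

def dual_gpu(single, gpus=2):
    """Scale every per-GPU figure by the number of GPUs on the board."""
    return {key: value * gpus for key, value in single.items()}

board = dual_gpu(vega10)
board["stream_processors"] = board["compute_units"] * SP_PER_CU

print(board)
```

Note that the bandwidth figure is an aggregate: each GPU still only sees its own 512 GB/s.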

The Vega 10 GPU has roughly 24 TeraFLOPs of 16-bit compute. 16-bit compute is, of course, half-precision work, and since Vega has native 16-bit compute support, we can find the single precision performance by simply cutting the number in half, which gives us roughly 12 TeraFLOPs of single precision compute. The clock rate required for this amount of performance can also be easily derived from the formula [FLOPs = Stream Processors * Clock Rate * 2 Floating Point Operations per Clock], which puts it somewhere in the vicinity of 1465 MHz.
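
The derivation can be checked with a few lines of Python. This is a minimal sketch that simply inverts the formula above. The function name `required_clock_mhz` is made up for illustration, and the 2:1 ratio between 16-bit and 32-bit throughput is the article's assumption rather than a confirmed spec.

```python
def required_clock_mhz(tflops, stream_processors, ops_per_clock=2):
    """Invert FLOPs = SPs * clock * ops_per_clock to get the clock in MHz."""
    flops = tflops * 1e12
    clock_hz = flops / (stream_processors * ops_per_clock)
    return clock_hz / 1e6

fp16_tflops = 24.0
fp32_tflops = fp16_tflops / 2  # assumption: FP16 runs at twice the FP32 rate
clock = required_clock_mhz(fp32_tflops, stream_processors=4096)

print(f"{fp32_tflops:.0f} TFLOPs FP32 needs about {clock:.0f} MHz")
```

With 4096 stream processors this works out to about 1465 MHz, matching the figure in the text.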

Naturally, a dual chip solution is never clocked as high as the single chip solution, so assuming clock rates of 1100 MHz to 1200 MHz you are looking at an effective computational power of 18 TeraFLOPs to 19.6 TeraFLOPs, and those are single precision figures. If Nvidia does not respond with a dual GPU solution as well (and dual GPUs aren’t usually their style in the professional market), then this graphics card will easily be able to give competition to Nvidia’s P100 and GP102 based offerings.
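
The same formula run forwards gives the dual GPU estimate. The sketch below assumes the full 8192 stream processors and the speculative 1100 MHz to 1200 MHz clock range from the paragraph above; `peak_fp32_tflops` is an illustrative helper, not anything AMD has published.

```python
def peak_fp32_tflops(stream_processors, clock_mhz, ops_per_clock=2):
    """Theoretical peak: SPs * clock * ops per clock, expressed in TFLOPs."""
    return stream_processors * clock_mhz * 1e6 * ops_per_clock / 1e12

DUAL_GPU_SPS = 8192  # 2 x 4096, assuming two full Vega 10 dies

for clock_mhz in (1100, 1200):  # speculative dual GPU clocks
    tflops = peak_fp32_tflops(DUAL_GPU_SPS, clock_mhz)
    print(f"{clock_mhz} MHz -> {tflops:.2f} TFLOPs")
```

These are theoretical peaks; they say nothing about how well a workload scales across two GPUs.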

Both sources have indicated a time span of Q2 2017 for the arrival of this monster, but it is very much possible that we will see a demo of it by the end of the year, around the same time the single chip based version would start shipping. We have also previously posted about all of AMD's upcoming offerings, which include the 7nm based Vega 20 GPU as well as the Navi series. The process node for this particular graphics card will be the 14nm FinFET process from GlobalFoundries/Samsung.

A Vega 10 dual GPU, at either clock rate (1200 MHz or 1100 MHz), easily destroys the P100 on paper, which has a single precision performance of 9.3 TeraFLOPs for PCI-e based cards. It goes without saying that since we are comparing across two completely different architectures here, what is on paper can be different in real life. This also means that even if Nvidia responds with a dual GPU solution of its own, the Vega 10 based offering would remain very competitive. The ball is now undoubtedly in Nvidia’s court and we shall see what its high-end looks like in the coming months.
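
To put a number on the paper advantage, here is a small sketch that divides the estimated dual GPU peak by the 9.3 TeraFLOPs quoted for the PCI-e P100. The clocks are the speculative ones from this article, and the comparison is of theoretical peaks only.

```python
P100_PCIE_TFLOPS = 9.3  # single precision figure quoted in the article
DUAL_GPU_SPS = 8192     # assumption: two full Vega 10 dies

def paper_advantage(clock_mhz):
    """Ratio of the estimated dual Vega 10 peak to the P100 peak."""
    vega_tflops = DUAL_GPU_SPS * clock_mhz * 2 / 1e6  # MHz * SPs * 2 ops -> TFLOPs
    return vega_tflops / P100_PCIE_TFLOPS

for clock_mhz in (1100, 1200):
    print(f"{clock_mhz} MHz: {paper_advantage(clock_mhz):.2f}x the P100")
```

On these assumptions the dual GPU card lands at roughly twice the P100's theoretical throughput.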

AMD Next Generation Vega 10, 11, 20 and Dual GPU Graphics Card Rumored Lineup:

| WCCFTech | Polaris 10 | Vega 11 | Vega 10 | Vega 10 Dual GPU | Vega 20 |
| --- | --- | --- | --- | --- | --- |
| Process | 14nm FinFET | 14nm FinFET | 14nm FinFET | 14nm FinFET | 7nm FinFET |
| Transistors (Billions) | 5.7 | TBA | TBA | TBA | TBA |
| Stream Processors | 2304 | 2304+ (est.) | 4096 | 8192 | 4096 |
| Clock Speed | 1266 MHz | TBA | 1526 MHz | 1100 MHz+ (est.) | 1800 MHz+ (est.) |
| Performance | 5.8 TFLOPs | TBA | 12.5 TFLOPs | 19 - 24 TFLOPs (est.) | 15 TFLOPs+ |
| Memory | 8GB GDDR5 | TBA | 8GB/16GB HBM2 | 16-32GB HBM2 | 16-32GB HBM2 |
| Memory Bus | 256-bit | TBA | 2048-bit (2 Stacks) | 4096-bit (2048-bit x2) | 4096-bit (4 Stacks) |
| PCI Express | 3.0 | TBA | 3.0 | 3.0 | 4.0 |
| Bandwidth | 256 GB/s | TBA | 512 GB/s | 1 TB/s | 1 TB/s |