AMD’s RX Vega GPU Spotted In 3rd Party Benchmarks – Specifications Confirmed at 4096 SP, 8GB of HBM Memory with 2048-bit Bus Width
It looks like the engineers over at AMD's RTG division are putting in a lot of work on the upcoming RX Vega graphics card, because the GPU has been spotted (initially by Videocardz.com) in multiple benchmarks including Compubench and SiSoft Sandra. This leak also completely confirms the specifications of the core at 64 CUs with 64 stream processors each, for a grand total of 4096 SPs. We also have final information about the memory type as well as the bus width.
AMD working on improving RX Vega clocks - currently a 9.8 TFLOPs GPU
AMD's high-end graphics card will contain 64 Compute Units, which (assuming the same ratio of CUs to SPs as the current iteration of GCN) translates to exactly 4096 Stream Processors. The internal codename of this GPU is GFX9 (in this case the device ID is 687F:C3). Remember all our internal nomenclature analysis? Well, it’s the same thing, only in a more appealing format. Hawaii was GFX7, Polaris is GFX8 and RX Vega is GFX9.
The GPU is stated as being manufactured on the 14nm FinFET node which means you are looking at primarily GlobalFoundries based chips here (with Samsung based chips as required under the amended WSA). It will be shipping with 8GB of HBM memory (type unknown at this point, but it should be HBM2), with 2048-bit bus width. The TDP is slated to be around 225W.
Here comes the interesting part, however: the GPU will have roughly 25 TeraFLOPs of 16-bit compute. 16-bit compute is, of course, half-precision work, and if RX Vega has native 16-bit compute support then we can find the single precision performance by simply cutting the number in half: 12.5 TeraFLOPs of compute. A solid, concrete number is more than any tech journo can ask for, and it allows us to easily reverse engineer the clock speed the GPU will run at.
Double precision performance of the RX Vega 10 GPU is listed as 1/16th rate and comes in at 650 GFlops. Scaling proportionally to the other specifications, we should expect this to be around 675 GFlops now.
This particular variant is clocked at 1200 MHz, which translates into 9.8 TFLOPs of single precision compute on the dot. That is quite some distance away from AMD's promised 25 TFLOPs of half-precision compute (12.5 TFLOPs single precision), so it's clear that the engineers still have some work to do refining the clocks. However, we are talking about a completely different architectural revision here, and since according to AMD this will have a higher throughput per clock, comparisons on this basis alone will have to wait until more details arrive.
With a single precision compute of 12.5 TeraFLOPs on a GPU with 4096 cores, and considering TeraFLOPs is a function of Clock Speed * 2 Instructions Per Clock * Cores, you are looking at an RX Vega 10 graphics card that needs to be clocked at roughly 1526 MHz. Considering the fact that the already revealed MI25 is passively cooled, it should be able to achieve this mark in due time and even exceed it. The specifications mention the HBM2 stack with 512 GB/s of bandwidth, but this is listed as 16 GB. We already know from the RX Vega Doom demos that the product will have total vRAM of 8 GB, so this is probably only relevant for the server market. The card will consume 225 Watts of power.
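For anyone who wants to check the math, the clock figure falls straight out of the throughput formula. Here is a rough sketch in Python; the function name `required_clock_mhz` and its parameters are purely our own illustration, and the 2 operations per clock per core is the assumption stated above rather than a confirmed spec.

```python
def required_clock_mhz(tflops: float, cores: int, ops_per_clock: int = 2) -> float:
    """Clock speed (in MHz) needed to reach a target throughput.

    Rearranges: TFLOPs = clock * ops_per_clock * cores.
    ops_per_clock = 2 is an assumption carried over from the article.
    """
    flops = tflops * 1e12                      # TeraFLOPs -> FLOPs
    clock_hz = flops / (ops_per_clock * cores)
    return clock_hz / 1e6                      # Hz -> MHz


fp16_tflops = 25.0                # half-precision figure from the leak
fp32_tflops = fp16_tflops / 2     # assumes native 16-bit support (2:1 rate)
clock = required_clock_mhz(fp32_tflops, cores=4096)
print(f"{clock:.0f} MHz")         # prints "1526 MHz"
```

Swapping in a different core count or target throughput gives the clock any other hypothetical configuration would need.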
Considering the Polaris 10 GPU is clocked at 1266 MHz, however, this is a fairly significant step up from the last iteration on the node and will probably leverage the increasing maturity of the 14nm process over at GlobalFoundries. On the other hand, I can tell you that even if we were assuming a final clock rate similar to Polaris 10 (1266 MHz), you are still looking at a single precision compute of roughly 10.4 TeraFLOPs, which is still a huge performance leap over the mainstream-oriented Polaris 10.
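To put the clock scenarios discussed in this piece side by side, here is a minimal Python sketch that runs the same formula in the other direction. The helper `peak_tflops` and the scenario labels are hypothetical names of ours, and the numbers are theoretical peaks assuming 4096 cores at 2 operations per clock, not measured results.

```python
def peak_tflops(clock_mhz: float, cores: int = 4096, ops_per_clock: int = 2) -> float:
    """Theoretical single precision throughput in TeraFLOPs.

    Uses TFLOPs = clock * ops_per_clock * cores; both defaults are
    assumptions taken from the article, not confirmed specifications.
    """
    return clock_mhz * 1e6 * ops_per_clock * cores / 1e12


# Clock speeds mentioned in the article (labels are ours).
scenarios = [
    ("Leaked sample", 1200),
    ("Polaris 10-like clock", 1266),
    ("Clock needed for 12.5 TFLOPs", 1526),
]

for label, mhz in scenarios:
    print(f"{label:<30} {mhz} MHz -> {peak_tflops(mhz):.2f} TFLOPs")
```

The gap between the first and last rows is the headroom AMD's engineers would still need to find through clock tuning alone.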