AMD just demoed the gaming capability of an upcoming enthusiast Radeon graphics card powered by its next-gen Vega GPU, and the results are in. In a head-to-head performance showdown with an overclocked GTX 1080, an equally specced Vega-equipped system outperformed its counterpart running Doom at 4K in Vulkan.
AMD Showcases Vega In Action - Running Doom In Vulkan At 4K & DeepBench GEMM
At the event, AMD showcased two systems featuring graphics cards powered by its next generation Vega GPU. The first had a Radeon Instinct MI25 graphics accelerator equipped with 16GB of HBM2, second generation high bandwidth memory. This system was used to demonstrate Vega's capabilities in deep learning, which were quite impressive. The MI25 outperformed Nvidia's Tesla P100 accelerator, based on the GP100 GPU, at two key AI workloads.
The second system was configured with the consumer version of Vega, equipped with 8GB of HBM2. However, unlike the MI25, AMD was very secretive about what this consumer card looked like. It was not shown to the press, and to keep its appearance a secret all ventilation and fan inlets were taped shut, which obviously deprived the machine of any airflow. The folks over at PCGamesHardware.de made it a point to note that the graphics card was quite possibly throttling, as AMD's secretive measures made things quite toasty inside.
Additionally, the demo was conducted using an ordinary Fiji (Radeon R9 Fury X, Fury and Nano) driver with an additional debugging layer. No Vega optimized driver was used. Despite this, the consumer Vega graphics card was able to outperform a GTX 1080 running at 1911MHz by 10%. It is worth noting, though, that Doom's Vulkan implementation has been shown to run faster on AMD GPUs.
Even so, with optimized drivers and proper cooling, it's likely that AMD will squeeze more performance out of Vega before launch. The folks over at PCGamesHardware.de have also confirmed that this is the very same graphics card, "687F:C1", that we spotted mingling with GTX 1080s on AOTS's benchmark leaderboard a couple of weeks back.
Vega's Confirmed Specs
Members of the press inside the demo room were able to spot some key specifications pertaining to Vega by taking a look at the expanded statistics in Doom. 8GB of HBM2 memory for the consumer version of Vega was confirmed.
Additionally, an employee let slip a key specification that wasn't supposed to be made public yet: Vega 10 features 512GB/s of memory bandwidth. The memory capacity and bandwidth are clear indications that Vega 10 has a 2048-bit wide memory interface, half that of its older sibling, Fiji. However, because HBM2 is rated at twice the speed of HBM1, Vega 10 is able to achieve the same 512GB/s of memory bandwidth.
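The bus width inference comes down to simple arithmetic: peak bandwidth is the bus width multiplied by the per-pin data rate, divided by eight bits per byte. Here is a minimal sketch, assuming per-pin rates of 1 Gbps for HBM1 and 2 Gbps for HBM2. Those rates were not stated at the demo, but they are consistent with HBM2 being rated at twice the speed. The function name is ours.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits * gbps_per_pin / 8  # 8 bits per byte

# Assumed per-pin data rates: 1 Gbps for HBM1, 2 Gbps for HBM2.
fiji = peak_bandwidth_gbs(4096, 1.0)    # Fiji: 4096-bit HBM1 interface
vega10 = peak_bandwidth_gbs(2048, 2.0)  # Vega 10: 2048-bit HBM2 interface

print(fiji, vega10)  # 512.0 512.0
```

Half the bus width at twice the per-pin rate lands on the same 512GB/s figure.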
In terms of graphics horsepower, the Vega 10 powered MI25 accelerator is rated at a staggering 12.5 TERAFLOPS of single precision floating point compute, and double that in half precision FP16 compute. That's 1.5 TERAFLOPS more than Nvidia's Tesla P100 accelerator, which is powered by the monstrous 610mm² GP100 GPU, and 2.5 TERAFLOPS more than the GTX 1080.
The MI25 is a professional, passively cooled product. The gaming oriented variant of Vega, equipped with more aggressive cooling solutions and running at higher clock speeds, would naturally be expected to achieve an even higher figure.
Vega's Next Generation Compute Unit Architecture
Vega is based on a brand new graphics architecture, the particulars of which we had already detailed briefly in our exclusive piece about Vega 10 and Vega 11. AMD confirmed today in its announcement what we had brought you back in October, which is that Vega makes use of a brand new compute unit design called NCU, short for Next Compute Unit.
AMD hasn't discussed any details pertaining to the new design. However, we're going to give you an exclusive high-level look at NCU. This new architecture holds several key advantages over its predecessor. Chief among them is that each Vega NCU is now capable of simultaneously processing variable length wavefronts. To understand why this is such a big deal, we have to look at AMD's current GCN implementation.
In AMD's current GCN implementation, each compute unit has four 16-wide vector SIMD units, capable of executing four 16-wide wavefronts (groups of threads) over four cycles, in addition to one scalar unit capable of executing one instruction per cycle. The scalar unit is delegated time-critical tasks, where the four-cycle turnaround of the SIMD units isn't sufficient.
Unfortunately, these 16-wide SIMD units work exactly the same no matter how small a wavefront they're fed. Executing a 4-wide wavefront takes just as long as executing a 16-wide wavefront, leaving the other 12 ALUs inside the SIMD completely idle. And as graphics workloads are inherently non-uniform, it's effectively impossible to find any scenario where all 16-wide SIMD units are fully occupied at any given time.
Variable Width SIMDs, Getting More Performance Out Of Fewer Cycles
This is no longer the case in AMD's new GCN implementation inside Vega. The V9 architecture includes new, incredibly clever schedulers and coherency subsystems that allow several smaller wavefronts to be executed simultaneously inside any SIMD that's able to accommodate the workload. This in effect allows each NCU to finish considerably more work in the same amount of time compared to its predecessor, in addition to freeing up valuable cache and memory resources for other compute units.
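To make the idea concrete, here is a toy model comparing how many SIMD passes a batch of wavefronts needs when each wavefront occupies a 16-lane SIMD on its own versus when smaller wavefronts can share one. This is not AMD's actual scheduler, whose details are not public. The lane count follows the description above, while the first-fit packing policy, the function names and the sample batch are all assumptions made purely for illustration.

```python
SIMD_WIDTH = 16  # lanes per SIMD, as described above

def passes_fixed(wavefront_widths):
    """Every wavefront occupies a whole SIMD pass, however narrow it is."""
    return len(wavefront_widths)

def passes_packed(wavefront_widths, simd_width=SIMD_WIDTH):
    """First-fit-decreasing packing of wavefronts into shared SIMD passes.

    Hypothetical policy for illustration only.
    """
    free_lanes = []  # free lanes remaining in each pass opened so far
    for width in sorted(wavefront_widths, reverse=True):
        if not 1 <= width <= simd_width:
            raise ValueError(f"wavefront width {width} does not fit a SIMD")
        for i, free in enumerate(free_lanes):
            if width <= free:
                free_lanes[i] -= width  # share an existing pass
                break
        else:
            free_lanes.append(simd_width - width)  # open a new pass
    return len(free_lanes)

def utilization(wavefront_widths, passes, simd_width=SIMD_WIDTH):
    """Fraction of available lane slots doing useful work."""
    return sum(wavefront_widths) / (passes * simd_width)

batch = [16, 4, 4, 8, 12, 4]  # made-up mix of wavefront widths
fixed = passes_fixed(batch)
packed = passes_packed(batch)
print(fixed, utilization(batch, fixed))    # 6 0.5
print(packed, utilization(batch, packed))  # 3 1.0
```

In this contrived batch, sharing SIMDs halves the number of passes. Real workloads would not pack this neatly, and the model ignores scheduling overhead entirely.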
It's very hard to predict how much of a difference such a big improvement in resource utilization and CU occupancy will make, given how unpredictable and inherently variable graphics workloads are. That brings us neatly to Vega's rumored specs.
Vega, The Rumored Specs
One of the few things that AMD has not talked about regarding Vega's specifications to date is the number of GCN stream processors it actually has. Vega 10 is believed to have 4096 GCN stream processors, according to the LinkedIn page of a leading engineer, which leaked earlier this year.
Assuming that this figure is accurate, Vega 10 would have to operate at a frequency 20% higher than Polaris 10 to achieve the 12.5 TFLOPS of the Radeon Instinct MI25. We're talking 1520MHz+ on a passively cooled enterprise GPU, a clock speed that only a few, mostly liquid cooled, overclocked RX 480 cards can achieve. None of AMD's current or past professional grade graphics cards or accelerators come close to that. We've also never seen such a large hike in clock speeds from one graphics generation to the next on the same process node generation.
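The clock speed estimate can be reproduced with a quick back-of-the-envelope calculation. This is a rough sketch, assuming each stream processor retires one fused multiply-add (2 FLOPs) per cycle and taking 1266MHz as the RX 480's reference boost clock. Neither figure is stated above, and the names are ours.

```python
def required_clock_mhz(tflops: float, stream_processors: int) -> float:
    """Clock needed to hit a peak FP32 rate at 2 FLOPs (one FMA) per SP per cycle."""
    flops = tflops * 1e12
    return flops / (2 * stream_processors) / 1e6

RX480_BOOST_MHZ = 1266  # assumed reference boost clock of the Polaris 10 based RX 480

clock = required_clock_mhz(12.5, 4096)
print(round(clock))                            # 1526
print(round(clock / RX480_BOOST_MHZ - 1, 2))   # 0.21
```

Under these assumptions, 4096 stream processors need roughly 1526MHz to reach 12.5 TFLOPS, about a fifth higher than the RX 480's boost clock.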
AMD Vega Lineup
|Graphics Card||Radeon R9 Fury X||Radeon RX 480||Radeon RX Vega Frontier Edition||Radeon Vega Pro||Radeon RX Vega (Gaming)||Radeon RX Vega Pro Duo|
|GPU||Fiji XT||Polaris 10||Vega 10||Vega 10||Vega 10||2x Vega 10|
|Process Node||28nm||14nm FinFET||FinFET||FinFET||FinFET||FinFET|
|Stream Processors||4096||2304||4096||3584||4096 (?)||Up to 8192|
|Compute (FP16)||8.6 TFLOPS||5.8 TFLOPS||~25 TFLOPS||22 TFLOPS||>25 TFLOPS||(not listed)|
|Memory||4GB HBM||8GB GDDR5||16GB HBM2||TBA||TBA||TBA|
|Launch||2015||2016||June 2017||June 2017||July 2017||TBA|
It's more plausible that this 20% improvement actually comes from the IPC (instructions per clock) improvement of the new architecture. In fact, it's quite possible that the MI25 runs at an even lower frequency than the RX 480, especially considering it's a 300W, passively cooled enterprise part. That would indicate that 20%+ of the chip's performance stems directly from architecture-based enhancements.
Whether that's actually the case or not remains to be seen. A combination of IPC uplift and higher clock speeds is probably the most plausible scenario.