
AMD Vega and NVIDIA Pascal GP100 GPU Specifications Comparison – The Compute Powerhouses For Next Generation HPC Accelerators


Specifications of the AMD Vega GPU have just been teased, giving us an early glimpse of AMD's next-generation compute powerhouse. Built on a FinFET process, the chip is expected to take on its green competitor with brute force. Today, we compare the specifications and performance figures of AMD's and NVIDIA's first-generation FinFET flagships, based on the Vega and Pascal architectures.

The Vega and GP100 Specifications Comparison - The First Gen, FinFET Based Compute Titans From AMD and NVIDIA

NVIDIA and AMD have launched a range of new products built on the new FinFET process nodes. The GPUs compared here are flagship products aimed at the HPC market. The Pascal GP100 GPU, for instance, has been on the market since June 2016 but has no consumer variant. The upcoming AMD Vega GPU is likewise being shown first in an HPC accelerator, although it should reach consumers in some form during Q1 2017.

Let's take a quick look at the specifications of NVIDIA's Pascal GP100 GPU and AMD's Vega GPU.

NVIDIA Pascal GP100 - Powering The Tesla P100 Accelerator - Up To 12 TFLOPs

The Tesla P100 is based on the Pascal GP100 GPU. It is by far the largest FinFET GPU built to date, measuring just over 600 mm² (610 mm² to be precise). The GP100 GPU was introduced at GTC 2016 and started shipping to customers in June 2016. Since then, NVIDIA has shipped several DGX-1 systems, which use Tesla P100 accelerators, to HPC and datacenter customers. Tesla P100 accelerators also power NVIDIA's DGX SaturnV supercomputer, which the company uses to help design smarter cars and next-generation GPUs. SaturnV ranks as the most power-efficient supercomputer on the Top500 list.

Like previous Tesla GPUs, GP100 is composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The chip achieves its colossal throughput by providing six GPCs, up to 60 SMs, and eight 512-bit memory controllers (4096 bits total).

The Pascal architecture's computational prowess is more than brute force: it raises performance not only by adding more SMs than previous GPUs, but also by making each SM more efficient. Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units across 60 SMs. The SMs are arranged into 30 TPCs, each comprising two SMs.
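The full-chip totals follow directly from the per-SM figures. The sketch below assumes only the numbers quoted above (60 SMs, 64 CUDA cores and four texture units per SM, two SMs per TPC); the helper name `gp100_totals` is hypothetical. The 56-SM configuration used for Tesla P100 is not stated in this article and is included only as an illustrative cut-down case.

```python
def gp100_totals(sms: int = 60, cores_per_sm: int = 64,
                 tex_per_sm: int = 4, sms_per_tpc: int = 2) -> dict:
    """Derive full-chip unit counts from per-SM figures (hypothetical helper)."""
    return {
        "cuda_cores": sms * cores_per_sm,
        "texture_units": sms * tex_per_sm,
        "tpcs": sms // sms_per_tpc,  # SMs are paired into TPCs
    }

print(gp100_totals())         # {'cuda_cores': 3840, 'texture_units': 240, 'tpcs': 30}
print(gp100_totals(sms=56))   # assumed Tesla P100 configuration (56 SMs enabled)
```

Pairing 60 SMs two per TPC gives 30 TPCs; the cut-down 56-SM part works out to 3584 CUDA cores.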

NVIDIA Pascal GP100 Inside Tesla P100 Is Just The Start

Because high-precision computation is important for technical computing and HPC codes, a key design goal for Tesla P100 was strong double-precision performance. Each GP100 SM has 32 FP64 units, giving a 2:1 ratio of single- to double-precision throughput. Compared with the 3:1 ratio of Kepler GK110 GPUs, this lets Tesla P100 process FP64 workloads more efficiently. GP100 can also run half-precision (FP16) math at twice the single-precision rate, while other Pascal chips such as GP102 and GP104 can run INT8 operations at up to four times the single-precision rate.
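The 2:1 and 3:1 ratios are simply FP32 lanes divided by FP64 lanes per multiprocessor at the same clock. A minimal sketch: the GP100 figures (64 FP32 cores, 32 FP64 units per SM) come from the text, while the GK110 SMX figures (192 FP32 cores, 64 FP64 units) are an assumption taken from NVIDIA's Kepler documentation rather than this article.

```python
def sp_to_dp_ratio(fp32_units_per_sm: int, fp64_units_per_sm: int) -> float:
    """Single- to double-precision throughput ratio for one SM at a fixed clock."""
    return fp32_units_per_sm / fp64_units_per_sm

# GP100 SM: 64 FP32 CUDA cores and 32 FP64 units (figures from the text).
gp100 = sp_to_dp_ratio(64, 32)
# Kepler GK110 SMX: 192 FP32 cores and 64 FP64 units (not stated in this article).
gk110 = sp_to_dp_ratio(192, 64)

# Share of the FP32 peak that each chip can sustain in FP64.
print(f"GP100 {gp100:.0f}:1 -> FP64 at {1 / gp100:.0%} of FP32")  # 50%
print(f"GK110 {gk110:.0f}:1 -> FP64 at {1 / gk110:.0%} of FP32")  # 33%
```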

The GPU also packs four stacks of HBM2 memory. The chip currently carries 16 GB of VRAM, which is expected to grow to 32 GB once HBM2 reaches volume production in 2017, and provides 720 GB/s of memory bandwidth. Tesla P100 is currently based on a cut-down GP100, and a full-chip variant is expected as AMD enters the market with Vega.

Tesla P100 is rated at 5.3 TFLOPs FP64, 10.6 TFLOPs FP32 and 21.2 TFLOPs FP16. A full GP100 variant could exceed 6.0 TFLOPs FP64, 12 TFLOPs FP32 and 24 TFLOPs FP16, depending on clock speeds. NVIDIA's GP102, a GP100 derivative that lacks NVLink and dedicated double-precision hardware, is rated at 12 TFLOPs FP32 on the Quadro P6000.
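These ratings follow the usual peak-throughput formula: lanes × 2 FLOPs per clock (one fused multiply-add) × clock speed. The sketch below reproduces the Tesla P100 figures under stated assumptions that are not in this article: 3584 FP32 lanes (56 SMs), half as many FP64 lanes, and a boost clock of roughly 1.48 GHz.

```python
def peak_tflops(lanes: int, clock_ghz: float, flops_per_lane_per_clock: int = 2) -> float:
    """Theoretical peak: each lane retires one fused multiply-add (2 FLOPs) per clock."""
    return lanes * flops_per_lane_per_clock * clock_ghz / 1000.0  # GFLOPs -> TFLOPs

# Assumed Tesla P100 figures, not given in this article:
# 56 SMs x 64 = 3584 FP32 lanes, half as many FP64 lanes, ~1.48 GHz boost clock.
fp32 = peak_tflops(3584, 1.48)
fp64 = peak_tflops(3584 // 2, 1.48)
fp16 = 2 * fp32  # GP100 runs packed FP16 at twice the FP32 rate
print(f"FP64 {fp64:.1f} | FP32 {fp32:.1f} | FP16 {fp16:.1f} TFLOPs")  # 5.3 | 10.6 | 21.2
```

With all 3840 lanes of a full GP100 enabled, the same formula needs about 1.56 GHz to reach 12 TFLOPs FP32 (`peak_tflops(3840, 1.5625)` returns exactly 12.0), which is why the full-chip figure is hedged on clock speed.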

The GP102 GPU is specifically designed with professional and enthusiast solutions in mind rather than AI / Deep learning / Data Center focused needs. The GP100 serves that market and hence we should only see a faster GP100 solution designed for such products instead of GP102 or GP104 entering the HPC market.

AMD Vega 10 Specifications - Powering The Instinct MI25 Accelerator - Up To 12.5 TFLOPs

The AMD Vega 10 GPU has a peak compute performance of 12.5 TFLOPs in single precision. With packed math (mixed precision), its half-precision throughput doubles to 25 TFLOPs. If the chip has a 2:1 ratio of single- to double-precision throughput like its competitor, its double-precision performance would be 6.25 TFLOPs.
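The Vega figures are derived from the 12.5 TFLOPs FP32 rating: packed math doubles it for FP16, and FP64 depends on the unknown single- to double-precision ratio. A small sketch with a hypothetical helper; both ratios below are assumptions. The 1:16 case is shown because it lands close to the 0.75 TFLOPs FP64 entry listed for Vega in the comparison table.

```python
def vega_rates(fp32_tflops: float, sp_per_dp: float) -> dict:
    """Derive FP16 (packed math, 2x) and FP64 (FP32 / ratio) from an FP32 peak."""
    return {
        "FP16": fp32_tflops * 2,
        "FP32": fp32_tflops,
        "FP64": fp32_tflops / sp_per_dp,
    }

# 12.5 TFLOPs FP32 is AMD's figure; both FP64 ratios are assumptions.
print(vega_rates(12.5, 2))   # {'FP16': 25.0, 'FP32': 12.5, 'FP64': 6.25}
print(vega_rates(12.5, 16))  # {'FP16': 25.0, 'FP32': 12.5, 'FP64': 0.78125}
```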

The first product to use this chip will be the Instinct MI25 HPC accelerator. AMD adds a few new technologies, such as the NCU, which is believed to stand for "Next Compute Unit", along with a high-bandwidth cache and controller. The whole card is rated at under 300 W, which would put it below the P100.

The card offers 512 GB/s of memory bandwidth over an HBM2 interface. That implies either a 2048-bit memory bus (two HBM2 stacks) at 1000 MHz or a 4096-bit memory bus (four HBM2 stacks) at 500 MHz. The card carries 16 GB of this second-generation high-bandwidth memory.
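Both bus configurations give the same result because HBM2 bandwidth is bus width × clock × 2 (double data rate) ÷ 8 bits per byte, and each HBM2 stack contributes a 1024-bit interface. A minimal sketch with a hypothetical helper; the ~703 MHz memory clock used for the P100 line is an assumption drawn from NVIDIA's Tesla P100 documentation, not from this article.

```python
def hbm_bandwidth_gbs(stacks: int, mem_clock_mhz: float, bits_per_stack: int = 1024) -> float:
    """Peak bandwidth in GB/s: bus width x clock x 2 (double data rate) / 8 bits per byte."""
    bus_bits = stacks * bits_per_stack
    transfers_per_s = mem_clock_mhz * 1e6 * 2  # DDR: two transfers per clock
    return bus_bits * transfers_per_s / 8 / 1e9

print(hbm_bandwidth_gbs(2, 1000))  # 2048-bit bus -> 512.0
print(hbm_bandwidth_gbs(4, 500))   # 4096-bit bus -> 512.0
print(hbm_bandwidth_gbs(4, 703))   # assumed P100 memory clock -> ~720 GB/s
print(hbm_bandwidth_gbs(4, 1000))  # four stacks at full speed -> 1024.0 (the "1 TB/s" target)
```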

Rumors vary on the core count and clock speed of the Vega GPU in the MI25. Videocardz has a useful table of several possibilities: a Vega GPU with 4096 stream processors (SPs) would need to run at roughly 1.5 GHz (about 1.53 GHz) to reach the rated compute performance, while 5000 cores at 1.25 GHz or 6250 cores at 1.00 GHz would yield the same figure. The card is passively cooled and has two power inputs (8+6 pin?).
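Each rumored configuration solves the same equation for the clock: SPs × 2 FLOPs per clock (one fused multiply-add) × clock = 12.5 TFLOPs. The sketch below assumes that 2-FLOPs-per-SP convention; the core counts are the rumored ones listed above, and the helper name is hypothetical.

```python
TARGET_TFLOPS = 12.5  # AMD's FP32 rating for the Instinct MI25

def required_clock_ghz(stream_processors: int, target_tflops: float = TARGET_TFLOPS) -> float:
    """Clock needed so that SPs x 2 FLOPs (one FMA) x clock reaches the target."""
    return target_tflops * 1000 / (stream_processors * 2)

for sps in (4096, 5000, 6250):
    print(f"{sps} SPs -> {required_clock_ghz(sps):.3f} GHz")
# 4096 SPs -> 1.526 GHz
# 5000 SPs -> 1.250 GHz
# 6250 SPs -> 1.000 GHz
```

The 4096-SP case lands slightly above 1.5 GHz, so "1.5 GHz" in the rumor tables is a rounded figure.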

AMD Vega GPU and NVIDIA Pascal GP100 GPU Comparison

The comparison between these chips is an interesting one because it is not only about the hardware but also about the software, which matters just as much in the deep learning market. NVIDIA has cuDNN (libraries and framework support) for deep neural networks, and AMD will have its own solution, MIOpen, part of its ROCm software platform. AMD showed performance numbers in which the MI25 is twice as fast as the Titan X (Maxwell) and much faster than the new Titan X (Pascal).

These libraries are understood to be fine-tuned to work best with Instinct cards, just as cuDNN performs best on NVIDIA cards. We are not discussing gaming performance because these specific chips and models are designed for the HPC market. Also note that AMD is not yet comparing performance against NVIDIA's GP100, which is Vega's direct competitor.

As stated, GP102 is GP100's derivative designed for consumers. AMD also showcased an 8 GB Vega-powered graphics card at its Tech Summit 2016 event, which we covered in detail separately. Those are very different products from the two discussed here. The specifications of the NVIDIA Pascal GP100 and the Vega 10 GPU are compared below:

AMD Vega 10 and NVIDIA Pascal GP100 GPU Specs Comparison:

GPU Architecture | NVIDIA Pascal | AMD Vega
Product Market | Tesla (P100.2) | Instinct MI25
GPU Process | 16nm FinFET | 14nm FinFET
Flagship Chip | GP100 GPU | Vega 10 GPU
GPU Design | SMP (Streaming Multiprocessor Pascal) | NCU (Next Compute Unit)
Maximum Transistors | 15.3 Billion | TBD
Maximum Die Size | 610 mm² | 500-540 mm²
Maximum Cores | 3840 CUDA Cores | 4096 Stream Processors
FP16 Compute | ~24.0 TFLOPs | 25.0 TFLOPs
FP32 Compute | ~12.0 TFLOPs | 12.5 TFLOPs
FP64 Compute | ~6.00 TFLOPs | 0.75 TFLOPs
Maximum VRAM | 16 GB HBM2 | 16 GB HBM2 (High Bandwidth Cache and Controller)
Maximum Bandwidth | 720 GB/s | 512 GB/s
Maximum TDP | 300W | 300W
Launch Year | Q2 2016 | 1H 2017

AMD Vega Instinct Graphics Cards Are The New FirePro S Series

The AMD Vega GPU showcased inside the Instinct MI25 is an incredibly fast graphics chip. Unlike NVIDIA, which split its lineup into GP100 and GP102, AMD will use the same Vega chip for both HPC and consumer products. AMD lacks the resources to design a high-end chip exclusively for the HPC market, whereas NVIDIA, with its healthy business, can design and produce such chips.

The GP102-based Quadro P6000, with all 3840 CUDA cores enabled, hints at what a fully unlocked GP100 could do in FP32. It can theoretically reach the same compute throughput as AMD's Vega GPU, but we don't know whether the MI25 uses the full Vega 10 chip or a cut-down one like Tesla P100. If it is cut down, the full Vega 10 chip could be considerably faster. Pascal has been available since Q2 2016, and Vega's arrival in HPC in 1H 2017 will spark a battle between the two chip giants in this space after quite some time. AMD could also price Vega 10 more competitively against GP100 in the HPC market, as it did with Polaris in the mainstream consumer market.

AMD faces the same HBM2 availability and yield issues as NVIDIA. NVIDIA's cards deliver 720 GB/s and AMD's deliver 512 GB/s over HBM2, both below the promised 1 TB/s. NVIDIA uses Samsung HBM2 chips, while AMD will use chips produced by SK Hynix. Low production of these chips was one of the major reasons HBM2 was delayed in the consumer market, but overall production has ramped up this quarter, which should allow AMD to offer Vega with HBM2 memory to consumers during the first half of 2017.

AMD Radeon Pro SSG With Vega GPU To Be Available Next Year

Besides the Instinct MI25 and the consumer Vega graphics card, AMD also showcased a demo PC with a Radeon Pro SSG. The last time AMD showed the Radeon Pro SSG, it was based on a Fiji GPU; the latest build uses a Vega GPU, which confirms that AMD has a long roadmap for Pro SSG cards.

“One of the most challenging constraints faced by GPU computing applications is the inability to access terabytes of data,” said Raja Koduri, senior vice president and chief architect, Radeon Technologies Group, AMD. “Radeon Pro SSG is poised to not only speed-up processing for many applications with very large datasets, but also to enable new application experiences by utilizing data persistence of non-volatile memory. This will be a disruptive advancement for many graphics and compute applications.”

“AMD has a long history of memory technology innovations, and Radeon Pro SSG is the latest example,” said Patrick Moorhead, Founder and Principal Analyst, Moor Insights & Strategy. “Larger local memory on the graphics card can speed-up processing for many applications with very large datasets, and should also allow for results with finer granularity and resolution. This will be a notable advancement for many graphics and compute applications.” via AMD

One of the surprises is that AMD is already showcasing partner-designed server racks that use the Instinct MI25. These racks range from 100 TFLOPs (four Instinct MI25 cards) up to 3 PFLOPs (120 Instinct MI25 cards) of peak FP16 compute.
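The rack figures line up with the MI25's FP16 (packed math) rating: four cards at 25 TFLOPs each give 100 TFLOPs, and 120 cards give 3,000 TFLOPs. The sketch below assumes perfect linear scaling across cards, which real systems will not fully achieve; the per-card numbers are AMD's peak ratings and the helper is hypothetical.

```python
MI25_TFLOPS = {"FP16": 25.0, "FP32": 12.5}  # AMD's per-card peak ratings

def rack_tflops(cards: int, precision: str = "FP16") -> float:
    """Aggregate peak throughput, assuming perfect linear scaling across cards."""
    return cards * MI25_TFLOPS[precision]

for cards in (4, 120):
    print(f"{cards:>3} x MI25: {rack_tflops(cards):,.0f} TFLOPs FP16, "
          f"{rack_tflops(cards, 'FP32'):,.0f} TFLOPs FP32")
#   4 x MI25: 100 TFLOPs FP16, 50 TFLOPs FP32
# 120 x MI25: 3,000 TFLOPs FP16, 1,500 TFLOPs FP32
```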

AMD Vega 10 With 8 GB HBM2 - Device ID "687F:C1" Confirmed

Also, for those wondering about the mysterious 687F:C1 product that appeared in the Ashes of the Singularity (AOTS) database a few weeks back: it is a graphics card based on the Vega 10 GPU, the same card seen in AMD's demo PC at the Tech Summit. This variant packs 8 GB of HBM2 and is aimed at desktop PCs rather than the HPC market. Its performance was on par with the GeForce GTX 1080, even though its drivers have not yet been optimized.

We expect to learn more about AMD's Zen CPUs and Vega GPUs, especially consumer-oriented desktop products, at tomorrow's "New Horizon" event.
