
Nvidia Pascal Architecture Detailed Technical Analysis – Stacked DRAM and NV Link

Usman Pirzada
Posted Apr 12, 2014

[Update]: A good many people have been confused by the 128 GB/s speed of the stacked DRAM listed in our table, and have mixed it up with the 1 terabyte per second figure mentioned by Nvidia’s CEO. The speed listed here is not the total memory bandwidth of a GPU once all memory chips are in place; it is the bandwidth of a single chip (stack) only. Notice that the table lists the GDDR5 maximum as 28 GB/s, when GPUs with GDDR5 clearly deliver far more than 28 GB/s in total.

[Editorial] Before I begin, a humble warning that this post might get a little technical. This generation of graphics cards is not about brute power but about efficiency and intelligent design: achieving maximum throughput while maintaining a very small footprint. In other words, true progress, and not just a matter of adding more transistors to a die. Nvidia demoed two critical technologies at GTC this year, namely NV Link and stacked DRAM, aka ‘3D Memory’. Understandably, they did not give many technical details since the demo was aimed at a general audience, so I will try to take care of that today, albeit slightly late.

Nvidia Pascal: Using CoW (Chip-on-Wafer) based 3D Memory to Achieve the Next Gen GPU

Let’s begin with 3D Memory. Most of you know what SoC (System-on-Chip) means, but here we have a slightly less common term which I will take the opportunity to explain. CoW (cue mundane bovine jokes), or Chip-on-Wafer, is a technique used to place a single logic circuit directly over or under a stack of wafers. The chips are stacked and the silicon is punched through with vertical pillars called TSVs (Through-Silicon Vias) that run down to the control die. In this case, it means that the stacked DRAMs will be controlled by a single logic circuit, hence the name ‘Chip-on-Wafer’ design. In all probability Nvidia’s 3D RAM will use the JEDEC HBM standard, which funnily enough was developed by JEDEC and AMD. Actual production, however, will most likely be carried out by SK Hynix. Pascal’s stacked DRAM will probably come in two configurations, since they mentioned the 1 Tb/s mark:

Configuration 1: 2x Stack (512 Gb/s*) + 1 (Control Die). This is called 2-Hi HBM.
Configuration 2: 4x Stack (1024 Gb/s*) + 1 (Control Die). This is called 4-Hi HBM.

Nvidia might even introduce a configuration in its Pascal architecture that sits between these 2 ‘traditional’ configs (3 stacks), but that is unlikely. It could theoretically reach speeds of 2 – 4 terabytes per second by ramping up to 16-Hi or 32-Hi HBM with multiple chips on the GPU. So here we have an interesting question. Team Green has promised us speeds of up to 1 terabyte per second, so there is more or less no question that the high-end GPUs will ship with tall stacks. But what about the mid-range and low-end parts? Will they ship with the same number of layers or with a lesser configuration? If I were to make an educated guess, I would put my money on multiple configurations scaled across the spectrum of GPUs. That is, the lower tier would have 2 + 1 layers, while the top tier could have 8 + 1 layers (or more). Continuing the same speculation, HBM uses a low operating frequency and has a low power requirement. Nvidia’s stacked DRAM will therefore most probably operate at around 1.2 V with a frequency of around 1 GHz. Here is a comparison chart between traditional GDDR5 RAM and 2x and 4x stacks of DRAM with their control dies.

 Specification                   GDDR5         2-Hi HBM ‘Stacked DRAM’*   4-Hi HBM ‘Stacked DRAM’*
 I/O                             32            512                        1024
 Max Bandwidth Per Pin           7 Gbps        1 Gbps                     1 Gbps
 Max Bandwidth Per Chip/Stack    28 GB/s       64 GB/s                    128 GB/s
 Voltage (V)                     1.35 – 1.65   ~1.2                       ~1.2
 Command Input                   Single        Dual                       Dual
 Layers                          1             2 + 1                      4 + 1
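
As a rough sketch of where the per-chip numbers in the table come from, and why they are not the total bandwidth of a card, the arithmetic is simply I/O width times per-pin rate, divided by 8 bits per byte. The 12-chip GDDR5 card and the 1000 GB/s reading of "1 terabyte per second" are my own illustrative assumptions, not figures from Nvidia.

```python
import math

def bandwidth_gbytes_per_s(io_pins: int, gbps_per_pin: float) -> float:
    """Peak bandwidth of one chip/stack: pins * Gbit/s per pin / 8 bits per byte."""
    return io_pins * gbps_per_pin / 8

# (I/O width, Gbps per pin) taken from the table above.
devices = {
    "GDDR5 chip": (32, 7.0),
    "2-Hi HBM stack": (512, 1.0),
    "4-Hi HBM stack": (1024, 1.0),
}
per_device = {name: bandwidth_gbytes_per_s(*spec) for name, spec in devices.items()}

# Assumption for illustration: a GDDR5 card with a 384-bit bus, i.e. 12 chips x 32 bits.
GDDR5_CHIPS = 12
gddr5_card_total = GDDR5_CHIPS * per_device["GDDR5 chip"]

# Assumption: the promised "1 terabyte per second" is read as 1000 GB/s.
TARGET_GBYTES_PER_S = 1000
stacks_for_target = math.ceil(TARGET_GBYTES_PER_S / per_device["4-Hi HBM stack"])

for name, gbs in per_device.items():
    print(f"{name}: {gbs:.0f} GB/s")
print(f"12-chip GDDR5 card: {gddr5_card_total:.0f} GB/s")
print(f"4-Hi stacks needed for ~1 TB/s: {stacks_for_target}")
```

Under these assumptions, a single GDDR5 chip gives 28 GB/s while a full card gives 336 GB/s, and it would take several 4-Hi stacks side by side to approach the 1 TB/s figure.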

[*Calculation for the table is given in the comments] You might have noticed that the 3D Memory has a Dual Command input feature. The reason for this is that a single layer of 3D Memory holds two RAM modules, i.e. an 8 GB 4-Hi HBM stack would be divided into 4 layers, with each layer having a 1 + 1 GB configuration. This is what enables the Dual Command feature. Of course, if we increase every layer to 4 GB (2 + 2 GB), then a 4-Hi HBM stack reaches 16 GB, and vice versa if we want to make it smaller. In this generation, however, don’t expect anything above the 8 GB 4-Hi HBM configuration. We could also scale to 8-Hi HBM with a maximum capacity of 32 GB, but that kind of memory in desktop GPUs is unlikely.
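
The capacity figures above can be sketched as layers times modules per layer times module size. This is only my reading of the numbers: the function name is made up for illustration, and I am assuming the 16 GB and 32 GB cases mean 4 GB per layer (2 + 2 GB).

```python
def stack_capacity_gb(layers: int, gb_per_module: int, modules_per_layer: int = 2) -> int:
    """Capacity of one HBM stack, not counting the control die.

    Hypothetical helper: each DRAM layer is assumed to hold two modules,
    which is what the Dual Command input is attributed to above.
    """
    return layers * modules_per_layer * gb_per_module

# 4-Hi with 1 + 1 GB per layer.
print(stack_capacity_gb(layers=4, gb_per_module=1))
# 4-Hi with 2 + 2 GB per layer (assumed reading of the 16 GB case).
print(stack_capacity_gb(layers=4, gb_per_module=2))
# 8-Hi with 2 + 2 GB per layer (assumed reading of the 32 GB case).
print(stack_capacity_gb(layers=8, gb_per_module=2))
```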

Nvidia Pascal: NV Link – A Very High Speed Interconnect

I would be very surprised if Nvidia’s ‘3D memory’ is not utilized in AMD’s next gen too, considering that they are the ones who actually came up with HBM, not to mention that the standard is open. It is a slightly different story with NV Link, however, which appears to be more or less proprietary. NV Link is going to come in 3 different layouts in the upcoming Pascal architecture. The first one is irrelevant to us, but I am going to touch upon it briefly anyway:

1. NV Link designed for the IBM Power CPUs
2. NV Link designed for the GPU – CPU connection via your normal PCI Express slot
3. NV Link designed for the Onboard ARM – GPU Connection of future Nvidia GPUs.


Since IBM does not have an HPC/server GPU solution, it has decided to pursue a very promising partnership with Nvidia. Pascal’s NV Link would see Nvidia getting out of its x86 comfort zone and making its GPUs interface with IBM’s Power CPUs. In PCI Express mode, NV Link, which is basically a super high speed serial interconnect, uses embedded-clock differential signaling. This allows it to achieve nearly 5 – 10 times the speed of PCI Express 3.0 running in x16 mode. The actual speed is thought to be along the lines of 80 GB/s to 230 GB/s.
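
To put the 5 – 10 times claim in perspective, here is a back-of-the-envelope sketch of PCI Express 3.0 x16 bandwidth and its multiples. It uses the published PCIe 3.0 signaling rate (8 GT/s per lane with 128b/130b encoding) and counts one direction only, which is my assumption; the article’s sources may have counted differently.

```python
PCIE3_GT_PER_S = 8.0        # PCIe 3.0 raw rate per lane, in gigatransfers per second
PCIE3_ENCODING = 128 / 130  # 128b/130b line encoding overhead
LANES = 16                  # x16 slot

def pcie3_gbytes_per_s(lanes: int = LANES) -> float:
    """Usable PCIe 3.0 bandwidth in GB/s, one direction only (assumption)."""
    return PCIE3_GT_PER_S * PCIE3_ENCODING * lanes / 8

baseline = pcie3_gbytes_per_s()
low, high = 5 * baseline, 10 * baseline

print(f"PCIe 3.0 x16: {baseline:.2f} GB/s per direction")
print(f"5x to 10x:    {low:.0f} to {high:.0f} GB/s")
```

On this basis the 5x case lands right around the 80 GB/s figure. The upper figure quoted above is the rumoured one and is not derived by this sketch.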

What Nvidia is going for is basically a complete point-to-point design where the processors are connected directly to each other without going through a third-party channel. This means, however, that the current PCI-E slot is no good, so NV Link will have to be physically included on the motherboards of the future. Rumors describe NV Link as a glorified mezzanine connector, which would allow, bluntly put, a socketed GPU. Since Pascal already has on-package memory, Nvidia’s custom ‘NV Link’ bus, with the help of a mezzanine interface, will allow never-before-seen speeds in GPU data transfer. Not only that, but a custom mezzanine connector will be able to supply far more than the 75 W available from our PCI-E slots today, allowing GPUs to be powered entirely through NV Link. Anandtech has raised some very valid criticism, however: NV Link is in no position to replace PCI-E anytime soon. The best-case scenario is GPUs with dual NV Link and PCI-E connectivity, with the server market (IBM) adopting NV Link completely. I won’t go into much more detail on how NV Link functions via blocks, since that has already been covered multiple times. So umm, yeah, that’s all folks.

 
