[Update]: A good many people have been confused by the 128 GB/s speed of the stacked DRAM mentioned in our table, and some have conflated it with the 1 terabyte per second figure mentioned by Nvidia’s CEO. The speed listed here is the memory bandwidth of a single chip, not the total memory bandwidth of a GPU once all the memory chips are in place. Notice that the table lists the GDDR5 maximum as 28 GB/s, when GDDR5 GPUs clearly deliver far more than 28 GB/s in total.
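To make the per-chip versus total distinction concrete, here is a minimal Python sketch. The per-chip figures (28 GB/s and 128 GB/s) are the ones quoted in the table. The chip counts are purely illustrative assumptions, not announced specifications, and the sketch assumes bandwidth scales linearly with the number of chips.

```python
import math

# Per-chip peak bandwidth in GB/s, as quoted in the article's table.
PER_CHIP_GB_PER_S = {"gddr5": 28, "stacked_dram": 128}


def total_bandwidth(memory_type: str, chip_count: int) -> int:
    """Aggregate peak bandwidth in GB/s, assuming every chip contributes fully."""
    return PER_CHIP_GB_PER_S[memory_type] * chip_count


def chips_needed(memory_type: str, target_gb_per_s: float) -> int:
    """Smallest number of chips whose combined bandwidth reaches the target."""
    return math.ceil(target_gb_per_s / PER_CHIP_GB_PER_S[memory_type])


# Chip counts below are hypothetical examples, not real product specs.
print(total_bandwidth("gddr5", 12))         # 336 GB/s from twelve GDDR5 chips
print(total_bandwidth("stacked_dram", 8))   # 1024 GB/s from eight DRAM stacks

# How many 128 GB/s stacks would it take to reach roughly 1 TB/s (1000 GB/s)?
print(chips_needed("stacked_dram", 1000))   # 8
```

Under these assumptions, the 1 terabyte per second figure corresponds to about eight stacks at 128 GB/s each, which is why the per-chip number in the table looks so much smaller than the headline number.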
[Editorial] Before I begin, a humble warning that this post might get a little technical. This generation of graphics cards is not about brute power but about efficiency and intelligent design: achieving maximum throughput while maintaining a very small footprint. Basically, true progress, and it’s not just about adding more transistors to a die. Nvidia demoed two critical technologies at GTC this year, namely NVLink and stacked DRAM, aka ‘3D Memory’. However, they understandably did not give many technical details, since the demo was for a general audience. I will try to take care of that today, albeit slightly late.
Nvidia Pascal: Using CoW (Chip-on-Wafer) based 3D Memory to Achieve the Next Gen GPU
Let’s begin with 3D Memory. Most of you know what SoC (System-on-Chip) means, but here we have a slightly less common term, which I will take the opportunity to explain. CoW (cue mundane bovine jokes), or Chip-on-Wafer, is a technique for placing a single logic circuit directly over or under a stack of wafers. The chips are stacked, and vertical pillars called TSVs (Through-Silicon Vias) are punched through the silicon down to the control die. In this case, it means that the stacked DRAMs are controlled by a single logic circuit, hence the name ‘Chip-on-Wafer’ design. In all probability, Nvidia’s 3D RAM will use the JEDEC HBM standard, which, funnily enough, was developed by JEDEC and AMD. Actual production, however, will most likely be carried out by SK Hynix.
Pascal’s stacked DRAM design comes in two configurations:
Configuration 1: 2x Stack (512 Gb/s) + 1 (Control Die). This is called 2-Hi HBM.
Configuration 2: 4x Stack (1024 Gb/s) + 1 (Control Die). This is called 4-Hi HBM.
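The two configurations are quoted in Gb/s, while the table and the update note use GB/s. Assuming Gb here means gigabits (an assumption, though it lines up with the 128 GB/s figure quoted for a single chip), a rough conversion sketch looks like this:

```python
BITS_PER_BYTE = 8


def gbit_to_gbyte(gigabits_per_second: float) -> float:
    """Convert a rate in gigabits per second to gigabytes per second."""
    return gigabits_per_second / BITS_PER_BYTE


# Rates as listed for the two configurations; "Gb" is assumed to mean gigabits.
configs = {"2-Hi HBM": 512, "4-Hi HBM": 1024}

for name, gbit in configs.items():
    print(f"{name}: {gbit} Gb/s = {gbit_to_gbyte(gbit):.0f} GB/s")
```

On that reading, the 2-Hi configuration works out to 64 GB/s and the 4-Hi configuration to 128 GB/s per chip.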
Nvidia might even introduce an intermediate configuration between these two ‘traditional’ configs in its Pascal architecture (3 stacks), but that is unlikely. It could theoretically reach speeds of 2 to 4 terabytes per second by ramping up to 16-Hi or 32-Hi HBM with multiple chips on the GPU. So here we have an interesting question. Green has promised us speeds of up to 1 terabyte per second, so there is more or less no question that the high-end GPUs will ship with very tall stacks. But what about the mid-range and low-end parts? Will they ship with the same number of layers, or with a lesser configuration? If I were to make an educated guess, I would put my money on multiple configurations scaled across the spectrum of GPUs: the low end might have 2 + 1 layers, whereas the top end could have 8 + 1 layers (or more). Continuing the same speculation, HBM uses a low operating frequency and has a low power requirement, so Nvidia’s stacked DRAM will most probably operate at around 1.2 V with a frequency of around 1 GHz. Here is a comparison chart between our traditional GDDR5 RAM and the x2 and x4 stacks of DRAM with their control dies.