Nvidia Pascal Architecture Detailed Technical Analysis – Stacked DRAM and NV Link

Usman Pirzada • Apr 12, 2014 at 12:09am EDT

[Update]: A good many people have been confused by the 128 GB/s speed of the stacked DRAM mentioned in our table, and some have conflated it with the 1 terabyte per second figure mentioned by Nvidia's CEO. The speed listed here is not the total memory bandwidth of a GPU once all memory chips are in place; it is the bandwidth of a single chip (or stack) only. Notice that the table lists the GDDR5 maximum as 28 GB/s, when GDDR5 GPUs clearly deliver far more than 28 GB/s in total.
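To make the per-chip versus per-GPU distinction concrete, here is a minimal Python sketch. The per-device figures come from the article's table; the twelve-chip GDDR5 card and the reading of "1 terabyte per second" as 1000 GB/s are assumptions chosen purely for illustration, not confirmed product details.

```python
import math

GDDR5_CHIP_GBPS = 28      # GB/s per GDDR5 chip, from the article's table
HBM_4HI_STACK_GBPS = 128  # GB/s per 4-Hi stack, from the article's table


def total_bandwidth(per_device_gbps, device_count):
    """Aggregate bandwidth when every memory device has its own interface."""
    return per_device_gbps * device_count


def devices_needed(target_gbps, per_device_gbps):
    """Smallest number of devices whose combined bandwidth meets the target."""
    return math.ceil(target_gbps / per_device_gbps)


# Assumption: a card with a 384-bit bus built from twelve 32-bit GDDR5 chips.
gddr5_total = total_bandwidth(GDDR5_CHIP_GBPS, 12)

# Assumption: "1 terabyte per second" is read as 1000 GB/s.
stacks_for_1tb = devices_needed(1000, HBM_4HI_STACK_GBPS)

print(gddr5_total)     # 336
print(stacks_for_1tb)  # 8
```

Under these assumptions, roughly eight 4-Hi stacks at 128 GB/s each would be needed to pass the 1 TB/s mark, which is why the per-stack number should not be read as the GPU's total.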

[Editorial] Before I begin, a humble warning that this post might get a little technical. This generation of graphics cards is not about brute power but about efficiency and intelligent design: achieving maximum throughput while maintaining a very small footprint. In other words, true progress, and it's not just about adding more transistors to a die. Nvidia demoed two critical technologies at GTC this year, namely NV Link and Stacked DRAM, aka '3D Memory'. Understandably, it did not give many technical details, since the demo was aimed at a general audience, but I will try to take care of that today, albeit slightly late.

Nvidia Pascal: Using CoW (Chip-on-Wafer) based 3D Memory to Achieve the Next Gen GPU

Let's begin with 3D Memory. Most of you know what SoC (System-on-Chip) means, but here we have a slightly less common term, which I will take the opportunity to explain. CoW (cue mundane bovine jokes), or Chip-on-Wafer, is a technique used to place a single logic circuit directly over or under a stack of wafers. The chips are stacked and connected by vertical pillars punched through the silicon, called TSVs (Through-Silicon Vias), which run all the way to the control die. In this case, it means that the stacked DRAMs will be controlled by a single logic circuit, hence the name 'Chip-on-Wafer' design. In all probability Nvidia's 3D RAM will use the JEDEC HBM standard, which, funnily enough, was developed by JEDEC together with AMD. The actual production, however, will most likely be carried out by SK Hynix. Pascal's stacked DRAM will probably come in 2 configurations, since Nvidia mentioned the 1 TB/s mark:

Configuration 1: 2x Stack (512 Gb/s*) + 1 (Control Die). This is called 2-Hi HBM.
Configuration 2: 4x Stack (1024 Gb/s*) + 1 (Control Die). This is called 4-Hi HBM.

Nvidia might even introduce a configuration in between these 2 'traditional' configs (3 stacks) in its Pascal architecture, but that is unlikely. It could theoretically reach speeds of 2 - 4 terabytes per second by ramping up to 16-Hi or 32-Hi HBM with multiple chips on the GPU. So here we have an interesting question. Nvidia has promised us speeds of up to 1 terabyte per second, so there is more or less no question that the high-end GPUs will ship with tall stacks. But what about the mid-range and low-end parts? Will they ship with the same number of layers or with a lesser configuration? If I were to make an educated guess, I would put my money on multiple configurations scaled across the spectrum of GPUs: the low end would have 2 + 1 layers, whereas the top end could have 8 + 1 layers (or more). Continuing the same speculation, HBM uses a low operating frequency and has low power requirements, so Nvidia's stacked DRAM will most probably operate at around 1.2 V with a frequency of around 1 GHz. Here is a comparison chart between traditional GDDR5 RAM and 2-Hi and 4-Hi stacks of DRAM with their control dies.

Specification	GDDR5	2-Hi HBM 'Stacked DRAM'*	4-Hi HBM 'Stacked DRAM'*
I/O (pins)	32	512	1024
Max Bandwidth Per Pin	7 Gbps	1 Gbps	1 Gbps
Max Bandwidth Per Chip/Stack	28 GB/s	64 GB/s	128 GB/s
Voltage (V)	1.35 - 1.65	~1.2	~1.2
Command Input	Single	Dual	Dual
Layers	1	2 + 1	4 + 1
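The bandwidth row of the table can be reproduced from the I/O and per-pin rows. The short Python sketch below is a reconstruction rather than the author's own calculation (which is said to be in the comments); it assumes every I/O pin carries data at the listed per-pin rate and that 8 bits make one byte. The function name is illustrative.

```python
def per_device_bandwidth_gbytes(io_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s: pins times per-pin rate, divided by 8 bits per byte."""
    return io_width_bits * gbps_per_pin / 8


# (I/O width, per-pin rate in Gbps) taken from the table above.
rows = {
    "GDDR5": (32, 7),
    "2-Hi HBM": (512, 1),
    "4-Hi HBM": (1024, 1),
}

for name, (width, rate) in rows.items():
    print(name, per_device_bandwidth_gbytes(width, rate))
```

The point the numbers illustrate is that HBM gets its bandwidth from a very wide, slow interface, whereas GDDR5 relies on a narrow, fast one.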

[*Calculation for the table is given in the comments] You might have noticed that the 3D Memory has a dual command input feature. The reason is that a single layer of 3D Memory holds two RAM modules; i.e. an 8 GB 4-Hi HBM stack would be divided into 4 layers, with each layer having a 1 + 1 GB configuration. This is what enables the dual command feature. Of course, if we increase every layer to 4 GB (2 + 2 GB), then a 4-Hi HBM stack reaches 16 GB, and the reverse applies if we want to make it smaller. However, in this generation, don't expect anything above the 8 GB 4-Hi HBM configuration. We could also scale to 8-Hi HBM with a maximum capacity of 32 GB, but that kind of memory in desktop GPUs is unlikely.
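The capacity figures in this paragraph follow from a simple product, sketched below in Python. It assumes two modules per layer, as the "1 + 1" split describes, and it reads the 4 GB figure as the capacity of one layer (two 2 GB modules), since that is the reading under which the 16 GB and 32 GB totals both work out. The function and constant names are illustrative.

```python
MODULES_PER_LAYER = 2  # the "1 + 1" split that enables dual command input


def stack_capacity_gb(layers, gb_per_module):
    """Capacity of one stack: DRAM layers x modules per layer x module size."""
    return layers * MODULES_PER_LAYER * gb_per_module


print(stack_capacity_gb(4, 1))  # 4-Hi, 1 + 1 GB per layer -> 8
print(stack_capacity_gb(4, 2))  # 4-Hi, 2 + 2 GB per layer -> 16
print(stack_capacity_gb(8, 2))  # 8-Hi, 2 + 2 GB per layer -> 32
```

The control die is not counted as a layer here because it holds logic rather than DRAM.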

Nvidia Pascal: NV Link - A Very High Speed Interconnect

I would be very surprised if the same '3D memory' is not used in AMD's next generation too, considering that AMD actually came up with HBM, not to mention that the standard is open. It is a slightly different story with NV Link, however, which appears to be more or less proprietary. NV Link is going to come in 3 different layouts in the upcoming Pascal architecture. The first one is irrelevant to us, but I am going to touch upon it briefly anyway:

1. NV Link designed for the IBM Power CPUs
2. NV Link designed for the GPU - CPU connection via your normal PCI Express slot
3. NV Link designed for the Onboard ARM - GPU Connection of future Nvidia GPUs.

Since IBM does not have an HPC/server GPU solution of its own, it has decided to pursue a very promising partnership with Nvidia. Pascal's NV Link would see Nvidia getting out of its x86 comfort zone and making its GPUs interface with IBM's Power CPUs. In PCI Express mode, NV Link, which is basically a very high speed serial interconnect, uses differential signaling with an embedded clock. This allows it to achieve nearly 5 - 10 times the speed of PCI Express 3.0 running in x16 mode. The actual speed is thought to be along the lines of 80 GB/s to 230 GB/s.
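For a sense of scale, the following Python sketch works out the PCI Express 3.0 x16 baseline and the multipliers implied by the quoted speeds. It assumes the standard PCIe 3.0 parameters (8 GT/s per lane, 128b/130b encoding) and counts one direction only; the NV Link figures are simply the ones quoted in the text, not measured values.

```python
def pcie3_bandwidth_gbytes(lanes):
    """Per-direction PCIe 3.0 bandwidth in GB/s: 8 GT/s per lane, 128b/130b encoding."""
    transfers_per_second = 8e9
    encoding_efficiency = 128 / 130
    bits_per_second = transfers_per_second * encoding_efficiency * lanes
    return bits_per_second / 8 / 1e9


x16 = pcie3_bandwidth_gbytes(16)
print(round(x16, 2))                            # 15.75
print(round(5 * x16, 1), round(10 * x16, 1))    # 78.8 157.5
print(round(80 / x16, 1), round(230 / x16, 1))  # 5.1 14.6
```

On these assumptions the 80 GB/s figure lines up with the 5x end of the range, while 230 GB/s would imply a multiplier closer to 15x than 10x, so both ranges are best read as rough estimates.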

What Nvidia is going for is basically a complete point-to-point design, in which the processors are connected directly to each other without going through a third-party channel. This means, however, that the current PCI-E slot is no good, so NV Link will have to be physically included in the motherboards of the future. Rumors describe NV Link as a glorified mezzanine connector, which will allow, bluntly put, a socketed GPU. Since Pascal already has on-package memory, Nvidia's custom 'NV Link' bus, with the help of a mezzanine interface, will allow never-before-seen speeds in GPU data transfer. Not only that, but a custom mezzanine connector will be able to supply far more than the 75 W available from today's PCI-E slots, allowing GPUs to be powered entirely through NV Link. Anandtech has raised some very valid criticism, however: NV Link is in no position to replace PCI-E anytime soon. The best-case scenario is GPUs with dual NV Link / PCI-E connectivity, with the server market (IBM) adopting NV Link completely. I won't go into much more detail on how NV Link functions via blocks, since that has already been covered multiple times. So, umm, yeah, that's all folks.


About the author: PC Hardware and Technology Enthusiast, Blood of Silicon (1 nm).
