NVIDIA Working To Embed Multiple GPUs and Multiple Layers of Stacked DRAM on A Single MCM Package – Up To 256 SMs and Multiple Terabytes of Bandwidth

Hassan Mujtaba • Jul 4, 2017 at 11:12pm EDT

NVIDIA recently announced its Volta GV100 GPU, the biggest graphics chip ever made. The company ran into the practical limits of the latest process node when designing the monolithic GPU, which is aimed at the compute-intensive market.

NVIDIA Plans To Cram Several GPUs and Several Stacked DRAM Dies on a Single Package In Future

NVIDIA currently has the two fastest GPU accelerators for the compute market: last year's Tesla P100, based on Pascal, and this year's Tesla V100, based on Volta. Both chips have one thing in common: they are as big as a chip can get on their particular process node. The Pascal GP100 GPU has a die size of 610mm², while the Volta GV100 GPU, despite being built on a 12nm process from TSMC, is roughly 33.6% larger at 815mm². NVIDIA's CEO Jen-Hsun Huang revealed at GTC that this is the practical limit of what's possible with today's technology and that a chip cannot currently be made any denser or bigger than GV100.
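As a quick sanity check on the die-size comparison, the relative growth can be worked out from the two areas quoted above. This is only a minimal Python sketch; the constant and function names are illustrative and not taken from any NVIDIA material.

```python
# Die areas as quoted in this article, in square millimetres.
GP100_AREA_MM2 = 610.0  # Pascal GP100
GV100_AREA_MM2 = 815.0  # Volta GV100


def percent_larger(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


growth = percent_larger(GV100_AREA_MM2, GP100_AREA_MM2)
print(f"GV100 is {growth:.1f}% larger than GP100")
```

In other words, the Volta die is about a third larger than the Pascal die even though it is built on a newer process.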

NVIDIA's CEO - Jen-Hsun Huang: GTC 2017 - "The part that is really shocking is this is reticle limits. Reticle limits basically means that it is at the limit of photolithography, meaning you can't make a chip any bigger."

The NVIDIA Volta GV100 GPU is the biggest FinFET GPU ever designed.

NVIDIA is one of the biggest players in the GPU industry and has held a very tight grip on the AI and deep learning market in recent years. The launch of Volta just a year after Pascal confirms that there is huge demand for its GPU-based accelerators from corporations hungry for compute acceleration products. But Jen-Hsun's quote from GTC 2017 may give us a hint at where the company is headed after Volta. It is possible that NVIDIA sees the limits of process nodes as a bottleneck: it wants to offer even more performance in a short span of time, but the process only allows chips to grow so far.

This year, the Volta die size increased vastly over Pascal. There are a few reasons for that. Unlike AMD, NVIDIA focuses on implementing different cores for specialized tasks. Volta and Pascal have dedicated FP32 and FP64 cores. Volta goes further and adds dedicated Tensor cores for mixed-precision matrix operations (FP16 inputs with FP32 accumulation) to accelerate neural network and deep learning performance. The more cores you add, the larger the die gets, and cores can only be added up to a limit. So what's the solution? MCM.

NVIDIA's MCM GPU Package Design Featured in Research Publication - A Glimpse of NVIDIA's Next GPU Accelerator For HPC?

Just recently, NVIDIA posted a research publication (via The Tech Report) that discusses building an MCM, or Multi-Chip-Module, package. An MCM places several chips (GPU/CPU/memory/controllers) on the same package substrate or interposer, interconnected via fast I/O lanes.

Some examples of MCM packages are the Volta V100 and Pascal P100 GPUs from NVIDIA, the Fiji and Vega GPUs from AMD, and even AMD's new server-oriented EPYC processors. The NVIDIA and AMD GPUs carry only one GPU die each, but they feature multiple DRAM dies on the same package, which makes them MCM designs. The EPYC processors house four individual dies with 8 cores per die, interconnected via AMD's Infinity Fabric link. AMD is also working on a similar approach with its Navi GPUs.

There are multiple options when designing an MCM package. NVIDIA proposes that its MCM solution depart from the traditional design of a monolithic GPU with a few DRAM dies on the package, in favor of a design that uses multiple smaller GPU chips with significantly more DRAM dies. The GPU and DRAM dies would be connected to a separate I/O and controller chip on the package rather than having that logic on the GPU die itself. The building blocks here are GPMs (GPU Modules): smaller, easier to produce and less expensive chips that are interconnected on the package.

NVIDIA simulated the performance of a 256 SM MCM GPU with 64 SMs per GPU module (four modules in total). The top Volta chip currently features 84 SMs that contain 5376 cores. The proposed 256 SM MCM package would house 16,384 cores.
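The core counts above follow from simple arithmetic, sketched below in Python. The sketch assumes the simulated MCM design keeps the same number of cores per SM as the GV100 figures quoted here; that ratio is an assumption used to reproduce the article's numbers, and the variable names are illustrative only.

```python
# Figures quoted in this article.
GV100_SMS = 84        # SMs on the full Volta GV100 die
GV100_CORES = 5376    # cores on the full Volta GV100 die
MCM_TOTAL_SMS = 256   # SMs in the simulated MCM GPU
SMS_PER_GPM = 64      # SMs in each GPU module (GPM)

# Assumption: the MCM design keeps GV100's cores-per-SM ratio.
cores_per_sm = GV100_CORES // GV100_SMS

# Number of GPU modules needed to reach the total SM count.
gpm_count = MCM_TOTAL_SMS // SMS_PER_GPM

# Total cores across the whole MCM package.
mcm_cores = MCM_TOTAL_SMS * cores_per_sm

print(f"{cores_per_sm} cores per SM, {gpm_count} GPMs, {mcm_cores} cores in total")
```

Under that assumption the package works out to four GPMs, each smaller than a single GV100 in SM count, with roughly three times the cores of the full Volta die in aggregate.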

Over the baseline MCM-GPU, the optimized design delivered a performance increase of 22.8%. Compared to the largest theoretically buildable monolithic GPU, which is assumed to house 128 SMs on a single die, the 256 SM MCM-GPU performed 45.5% better according to the paper, and it comes within 10% of an unbuildable yet similarly sized monolithic GPU (256 SMs). There are also architectural upgrades and better interconnects to take into consideration. NVIDIA points out that each GPM is expected to be 40-60% smaller than today's biggest GPU, assuming it is built on a new 10nm or 7nm process node. A very basic GPU MCM diagram is illustrated below:

There's still a long way to go before we see more details on an MCM GPU from NVIDIA, but I think that after Volta, NVIDIA has its R&D team focused on such projects. We will likely hear more about MCM designs featuring several GPMs at next year's GTC (GPU Technology Conference) or whenever NVIDIA reveals its new graphics roadmap.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
