Taiwan's leading technology publication dropped quite the scoop on NVIDIA today: it reports that the company has pre-booked 5nm capacity from TSMC for its next-generation Hopper GPUs. According to a report we already covered, AMD's aggressive approach with 7nm caught NVIDIA by surprise, and to rectify that problem, the company has already pre-booked TSMC 5nm for Hopper GPUs. While the TSMC order is confirmed, we have also heard rumors that NVIDIA is courting Samsung for its 5nm process as well, so maybe we will see the orders split.
NVIDIA prebooked TSMC's 5nm capacity for Hopper GPUs in 2021
This is huge news because with the current roadmap, it would imply that NVIDIA is spending just one generation on the 7nm process (unless Hopper gets delayed and an Ampere-refresh architecture is slotted in, which I must admit is quite possible). As per the report we already covered, AMD took NVIDIA by surprise with their 7nm move and the company is getting extremely aggressive to protect its future growth. Doubling down on TSMC's leading 5nm process is part and parcel of this strategy.
The relevant portion of the Digitimes report, via @chiakohua (and reproduced with permission):
For next-generation GPUs based on the Hopper architecture, Nvidia has already pre-booked TSMC's 5nm production capacity in 2021, and is also in discussion with Samsung for smaller volume orders...
...In order to prevent AMD from getting any bigger, Nvidia has decided to catch up, even leapfrog AMD, in adopting TSMC's 7nm and 5nm EUV nodes. (DigiTimes)
Keep in mind Hopper is just a name right now and NVIDIA could decide to call pretty much anything Hopper going forward, but we do have confirmation of the company booking 5nm capacity for its upcoming next-generation architecture. Here is the thing though: while NVIDIA has previously spent at least a few years on a single node, AMD getting aggressive means that the company might not actually have much of a choice this time.
It cannot stay on the 7nm process if AMD decides to move to 5nm (which it will). Doing so would diminish its brand value and make it harder to position itself as a leader in GPU technology. The solution, as it transpires, is to throw money at the problem by pre-booking 5nm capacity in advance - just so AMD can't. Considering NVIDIA has access to a much larger cash pile and TSMC won't refuse money, we expect this strategy to be largely successful in restoring NVIDIA's process lead in the GPU industry. If you wish to remain in the world of facts and educated speculation isn't your cup of tea, stop reading now.
Recap: Exploring NVIDIA's Hopper GPU architecture and MCM philosophy
Warning: the use of MCM in NVIDIA's Hopper architecture is not confirmed. Grain of salt and all that!
NVIDIA's architectures are always named after pioneers of science and computing, and this one appears to be no different. NVIDIA's Hopper architecture is named after Grace Hopper, who was one of the pioneers of computer science, one of the first programmers of the Harvard Mark I and the inventor of one of the first linkers. She also popularized the idea of machine-independent programming languages, which led to the development of COBOL - an early high-level programming language still in use today. She joined the Navy and contributed to the American war effort during World War II.
An MCM-based design is arguably the next step in GPU evolution considering we are now being limited by the reticle size of most EUV scanners. Architectural improvements and MCM design are the next logical frontier, and since AMD has already executed on it on the CPU front, it makes sense that GPUs would be the next step in their grand plan - which would explain why NVIDIA would want to get a head start on it all and beat them to the punch. The leak came from a well-known Twitter account; the tweets have since been deleted, but not before the Twitterati caught them and posted about them (over at 3DCenter.org).
AMD has already proven itself to be exceptionally good at creating MCM-based products. The Threadripper and Ryzen series have been absolutely disruptive to the HEDT market space. They single-handedly turned what was usually a 6-core and very expensive affair into an affordable 16-core combo using an MCM package. The power of servers and Xeons was finally in the hands of average consumers, so why can't the same philosophy work for GPUs as well? I am sure you already know that NVIDIA can use the MCM philosophy to beat the reticle size of scanners and build truly monstrous GPUs exceeding a net surface area of 1000mm², but are there other advantages as well?
Well, theoretically speaking, it should work better in all regards for GPUs, which are inherently parallel devices, than for CPUs, which are largely serial devices. Not only that, but you are looking at massive yield gains from just shifting to an MCM-based approach instead of a monolithic die. A single huge die has abysmal yields, is expensive to produce and usually has high wastage. Multiple chips totaling the same die size would offer yield increases straight off the bat. This is a great argument in favor of an MCM-based NVIDIA Hopper GPU.
I took the liberty to do some rough approximations using the lovely Silicon Edge tool and was not surprised to see instant yield gains. Take a die measuring 484mm² (eg: Vega 64), which equates to a die measuring 22mm by 22mm. Splitting this monolithic die into 4x 11mm by 11mm dies gives you the same net surface area (484mm²) and will also result in yield gains. How much? Let's see. According to the approximation, a 300mm wafer should be able to produce 114 monolithic dies (22x22) or 491 smaller dies (11x11). Since we need 4 smaller dies to equal 1 monolithic part, we end up with 122 complete 484mm² MCM packages (491/4 = 122.75, rounded down). That's a gain of roughly 7% right there.
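The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the die counts (114 and 491) are taken directly from the approximation quoted above rather than computed, and the function and variable names are hypothetical. Note that counting only whole packages gives a gain of about 7.0%, while allowing the fractional 122.75 packages gives about 7.7%.

```python
def mcm_packages(small_dies: int, chiplets_per_package: int = 4) -> int:
    """Whole MCM packages that can be assembled from a wafer of small dies."""
    return small_dies // chiplets_per_package


def gain_percent(monolithic_dies: float, packages: float) -> float:
    """Percentage increase in parts per wafer versus the monolithic design."""
    return (packages / monolithic_dies - 1) * 100


# Die counts per 300mm wafer, as quoted in the article (not computed here).
MONOLITHIC_484 = 114   # 22mm x 22mm dies
SMALL_121 = 491        # 11mm x 11mm dies

packages_484 = mcm_packages(SMALL_121)
gain_484 = gain_percent(MONOLITHIC_484, packages_484)

# Same comparison without rounding down to whole packages.
fractional_gain_484 = gain_percent(MONOLITHIC_484, SMALL_121 / 4)

print(f"MCM packages per wafer: {packages_484}")
print(f"Gain (whole packages): {gain_484:.1f}%")
print(f"Gain (fractional packages): {fractional_gain_484:.1f}%")
```

Like the estimate itself, this only counts candidate dies per wafer; it says nothing about how many of those dies are defect-free.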
The yield gains are even larger for bigger chips. The upper limit of lithographic techniques (with reasonable yields) is roughly 815mm². On a single 300mm wafer, we can get about 64 of these (28.55x28.55) or 285 smaller dies (14.27x14.27). That gives us a total of 71 MCM-based parts for a yield increase of roughly 11%. Now full disclosure, this is a very rough approximation and does not take into account several factors such as packaging yields, rectangular dies and other shape-based optimizations of the wafer, but the basic idea holds well. Conversely, it also does not take into account the additional gains from lowered wastage - a faulty 815mm² monolithic die is much more wasteful than a single faulty 203mm² one! This means the approach has the added benefit of minimizing the impact of defective dies - which will add to these yield numbers once you factor in unusable dies.
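To put a rough number on the wastage argument, here is a minimal sketch using the simple Poisson yield model, where the fraction of defect-free dies is exp(-D0 x A). This model and the defect density D0 are my own assumptions for illustration, not figures from the report or from any foundry; the die counts (64 and 285) are the ones quoted above, and packaging yield is ignored.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a Poisson defect model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)


# ASSUMPTION: hypothetical defect density, chosen only for illustration.
D0 = 0.1  # defects per cm^2

MONO_AREA = 815.0             # mm^2, monolithic die from the article
CHIPLET_AREA = MONO_AREA / 4  # mm^2, one of four chiplets

# Candidate dies per 300mm wafer, as quoted in the article.
MONO_DIES = 64
CHIPLET_DIES = 285

mono_yield = poisson_yield(MONO_AREA, D0)
chiplet_yield = poisson_yield(CHIPLET_AREA, D0)

# Expected good parts per wafer, assuming only good chiplets get packaged.
good_mono = MONO_DIES * mono_yield
good_packages = CHIPLET_DIES * chiplet_yield / 4

print(f"Monolithic die yield: {mono_yield:.1%}")
print(f"Chiplet die yield:    {chiplet_yield:.1%}")
print(f"Good monolithic GPUs per wafer: {good_mono:.1f}")
print(f"Good MCM GPUs per wafer:        {good_packages:.1f}")
```

With any positive defect density, the smaller die always yields better in this model, because a single defect only scraps a quarter of the silicon. The exact size of the advantage depends entirely on the assumed D0.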
Long story short, NVIDIA is perfectly capable of creating an MCM-based GPU and would even get some serious yield benefits out of it if it chooses to go this route for Hopper GPUs. Considering the 7nm node is now entering a mature stage with EUV, patterning should be clean enough to support a concept like this with ease, but monolithic designs are still limited by the reticle size. Switching to an MCM-based design would allow NVIDIA to build monstrous GPUs with a net die size of more than 815mm² (the sky is the limit with MCM - as AMD has proven)! So if it wants to continue the non-linear performance increase trend that has made it so successful, it might not have any other option but to adopt this.