NVIDIA’s Next-Gen GPU Specifications And Performance Leak Out – Massive Die With 7936 CUDA Cores (8192 Full Die), Up To 48 GB HBM2e Memory
NVIDIA's next-gen GPU is set to be unveiled very soon, and while GTC has moved to an online-only event, that isn't holding the green giant back from announcing its biggest GPU to date. Leaked specifications of two unreleased GPUs surfaced a few days ago, and now a new SKU has been spotted by Twitter user W_At_Ar_U. The latest chip is a beast of its own, with a total of almost 8K cores.
NVIDIA's Next-Gen GPU Performance & Specifications Leaked - The Ultimate HPC Powerhouse With Up To 8K Cores & 48 GB HBM2e Memory
NVIDIA's next-generation GPU architecture, reportedly codenamed Ampere, has been rumored for some time. It will power the company's next Tesla GPUs, which will be used by the top HPC and cloud datacenter organizations.
According to Indiana University's Vice President for Information Technology and Chief Information Officer, whose team will deploy the Big Red supercomputer this summer, NVIDIA's next-generation GPUs offer a massive 75% performance uplift over existing Volta-based GPUs. Earlier reports pointed to up to a 50% performance increase at twice the efficiency, which would be an incredible feat to pull off.
Coming to the specifications of the latest GPU spotted in Geekbench, I will also compare it to the previously leaked parts to see what kind of performance uplift we should expect from each variant. Note that these GPUs were tested back in October and November 2019, so they have been sitting in the Geekbench database for a few months now. Since these are early samples, the final specifications could differ significantly. The low clock speeds also point to early silicon.
NVIDIA's Next-Gen GPU #1 Specifications & Performance
The first GPU is the one that was just spotted. It features 124 streaming multiprocessors (SMs), which equals 7936 CUDA cores, since NVIDIA's professional GPU architecture uses 64 CUDA cores per SM. That is a 55% increase over the Tesla V100's 5120 CUDA cores. The GPU has a maximum clock speed of 1.1 GHz, and at this unfinalized clock it should deliver around 17.5 - 18 TFLOPs of FP32 compute.
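The core-count and FP32 figures above come from straightforward arithmetic. The Python sketch below reproduces them, assuming 64 FP32 CUDA cores per SM (as stated above) and 2 FLOPs per core per clock (one fused multiply-add), which is the convention behind NVIDIA's usual peak FP32 numbers. The helper names are hypothetical, and the result is a theoretical peak, not a measured score.

```python
# Hypothetical helpers for peak-throughput arithmetic (not an NVIDIA tool).
CORES_PER_SM = 64        # FP32 CUDA cores per SM, per the article
FLOPS_PER_CORE_CLK = 2   # assumption: one fused multiply-add = 2 FLOPs per clock

def cuda_cores(sm_count: int) -> int:
    """Total CUDA cores for a given number of SMs."""
    return sm_count * CORES_PER_SM

def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPs."""
    # cores * FLOPs per clock * clock in GHz gives GFLOPs; divide by 1000 for TFLOPs
    return cores * FLOPS_PER_CORE_CLK * clock_ghz / 1000

def pct_increase(new: float, old: float) -> float:
    """Percentage increase of new over old."""
    return (new / old - 1) * 100

cores = cuda_cores(124)
print(cores)                                   # 7936
print(round(pct_increase(cores, 5120), 1))     # 55.0 (vs Tesla V100)
print(round(peak_fp32_tflops(cores, 1.1), 2))  # 17.46
```

At the 1.11 GHz listed in the comparison table, `peak_fp32_tflops(7936, 1.11)` gives about 17.6 TFLOPs, which is where the upper end of the estimate comes from.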
It carries 32 GB of HBM2e memory clocked at 1200 MHz across a 4096-bit bus. I mention HBM2e because it is the latest memory standard, and NVIDIA has a track record of using the most advanced memory available at launch on its HPC parts.
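The roughly 1.2 TB/s bandwidth figure for this configuration follows from the bus width and memory clock. A minimal sketch, assuming the reported 1200 MHz is the HBM2e clock and data moves on both clock edges (double data rate), so each pin carries 2.4 Gbps; the helper name is hypothetical.

```python
def hbm_bandwidth_gbs(bus_width_bits: int, mem_clock_mhz: float) -> float:
    """Peak memory bandwidth in GB/s for a double-data-rate interface such as HBM2/HBM2e."""
    per_pin_gbps = mem_clock_mhz * 2 / 1000    # assumption: two transfers per clock
    return bus_width_bits * per_pin_gbps / 8   # bits per second -> bytes per second

print(hbm_bandwidth_gbs(4096, 1200))  # 1228.8 GB/s, i.e. ~1.2 TB/s
```

The same formula with the Tesla V100's 878 MHz on a 4096-bit bus gives about 899 GB/s, matching its quoted ~900 GB/s.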
In addition to the core and memory specifications, the GPU packs a 32 MB L2 cache, a 5.33x increase over the 6 MB L2 cache of the Volta GV100 GPU. A cache increase of that size hints at a major architectural change in NVIDIA's next-generation GPU, which has been years in development, and could translate into large performance gains.
In terms of performance, the GPU scores 222377 points in the Geekbench 5 OpenCL/CUDA compute benchmark. The platform is running CUDA 8.0, so it is highly likely that the GPU was not fully optimized for the benchmark at the time of testing. Even so, the specifications of this card look remarkable, so let's move on to the other two variants.
NVIDIA's Next-Gen GPU #2 Specifications & Performance
The second GPU features 118 SMs, or 7552 CUDA cores. That is a 47.5% increase over the Tesla V100, which has 5120 CUDA cores across 80 SMs. This chip carries a 24 MB L2 cache, is also clocked at a maximum of 1.10 GHz, and features 24 GB of HBM2e memory running on a 3072-bit bus at 1200 MHz. At these speeds, it should deliver a theoretical FP32 throughput of around 16.6 TFLOPs, but once again, the clock speeds don't look final and could be higher.
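The derived numbers for this part can be checked the same way. The sketch below assumes 64 CUDA cores per SM, 2 FLOPs per core per clock (one fused multiply-add), and a double-data-rate HBM2e interface; the variable names are illustrative only. Note that the narrower 3072-bit bus works out to roughly 922 GB/s at 1200 MHz, not 1.2 TB/s.

```python
# Illustrative arithmetic for the 118 SM part; inputs are the leaked figures.
CORES_PER_SM = 64          # per the article
sm_count = 118
clock_ghz = 1.10           # preliminary maximum clock from the Geekbench listing
bus_width_bits = 3072      # reported bus width
mem_clock_mhz = 1200       # reported HBM2e clock, assumed double data rate

cores = sm_count * CORES_PER_SM
increase_pct = (cores / 5120 - 1) * 100          # vs Tesla V100's 5120 cores
fp32_tflops = cores * 2 * clock_ghz / 1000       # 2 FLOPs (one FMA) per core per clock
bandwidth_gbs = bus_width_bits * (mem_clock_mhz * 2 / 1000) / 8

print(cores, round(increase_pct, 1), round(fp32_tflops, 1), round(bandwidth_gbs, 1))
# 7552 47.5 16.6 921.6
```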
For some context:
GV100: 142837 (OpenCL)
Tesla V100: 154606 (OpenCL)
Titan RTX: 132804 (OpenCL)
— _rogame (@_rogame) February 28, 2020
This GPU was tested in both the OpenCL and CUDA compute benchmarks, scoring 184096 points in OpenCL and 169368 points in CUDA. Both the 124 and 118 SM parts were running on CUDA 8.0, which again suggests that these GPUs aren't yet fully optimized for Geekbench 5. The score gap between the two parts is large despite only a 5% difference in core count.
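To put numbers on that gap, the sketch below compares the 124 SM score against both results for the 118 SM part, since the API behind the 124 SM figure is not clearly labeled. It uses only the scores and core counts quoted in this article; `pct_gap` is a hypothetical helper.

```python
# Geekbench 5 compute scores quoted in the article
score_124sm = 222377          # 124 SM part
score_118sm_opencl = 184096   # 118 SM part, OpenCL
score_118sm_cuda = 169368     # 118 SM part, CUDA
cores_124sm, cores_118sm = 124 * 64, 118 * 64

def pct_gap(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1) * 100

print(f"core gap: {pct_gap(cores_124sm, cores_118sm):.1f}%")                  # 5.1%
print(f"score gap vs OpenCL: {pct_gap(score_124sm, score_118sm_opencl):.1f}%")  # 20.8%
print(f"score gap vs CUDA: {pct_gap(score_124sm, score_118sm_cuda):.1f}%")      # 31.3%
```

A score gap of roughly 21 to 31% against a 5% core-count difference is far more than core count alone would explain, which is consistent with these being unoptimized early samples.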
NVIDIA's Next-Gen GPU #3 Specifications & Performance
Lastly, we have the 108 SM (6912 CUDA core) variant, which has a reported clock speed of 1.01 GHz, the slowest of the three GPUs. It offers a 35% increase in CUDA core count over the Tesla V100 and is listed with 46.8 GB of HBM2e memory. This could be an error in how Geekbench reads the total memory, and the actual capacity is more likely 48 GB. The GPU scores 141654 points in the Geekbench 5 (CUDA) benchmark, which, given the low clock speed, is unlikely to be its final score.
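The same peak-FP32 arithmetic applies to this variant, assuming 64 CUDA cores per SM and 2 FLOPs per core per clock (one fused multiply-add). The sketch evaluates both the 1.01 GHz figure quoted here and the 1.08 GHz figure in the comparison table; `describe` is a hypothetical helper, and both results are theoretical peaks.

```python
CORES_PER_SM = 64   # per the article
V100_CORES = 5120   # Tesla V100

def describe(sm_count: int, clock_ghz: float) -> tuple[int, float, float]:
    """Return (CUDA cores, % increase over V100, peak FP32 TFLOPs)."""
    cores = sm_count * CORES_PER_SM
    increase_pct = (cores / V100_CORES - 1) * 100
    tflops = cores * 2 * clock_ghz / 1000   # 2 FLOPs (one FMA) per core per clock
    return cores, increase_pct, tflops

for clk in (1.01, 1.08):
    cores, inc, tf = describe(108, clk)
    print(f"{cores} cores, +{inc:.0f}% vs V100, {tf:.2f} TFLOPs @ {clk} GHz")
```

At 1.01 GHz this comes to roughly 14 TFLOPs, while the table's ~15 TFLOPs corresponds to its 1.08 GHz clock.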
NVIDIA Tesla Graphics Cards Comparison
| Tesla Graphics Card Name | NVIDIA Tesla M2090 | NVIDIA Tesla K40 | NVIDIA Tesla K80 | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA Tesla Next-Gen #3 | NVIDIA Tesla Next-Gen #2 | NVIDIA Tesla Next-Gen #1 |
|---|---|---|---|---|---|---|---|---|
| GPU Name | GF110 | GK110 | GK210 x 2 | GP100 | GV100 | GA100? | GA100? | GA100? |
| Transistor Count | 3.00 Billion | 7.08 Billion | 7.08 Billion x 2 | 15 Billion | 21.1 Billion | TBD | TBD | TBD |
| CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 2496 CCs (13 CUs) x 2 | 3584 CCs | 5120 CCs | 6912 CCs (108 SMs) | 7552 CCs (118 SMs) | 7936 CCs (124 SMs) |
| Core Clock | Up To 650 MHz | Up To 875 MHz | Up To 875 MHz | Up To 1480 MHz | Up To 1455 MHz | 1.08 GHz (Preliminary) | 1.11 GHz (Preliminary) | 1.11 GHz (Preliminary) |
| FP32 Compute | 1.33 TFLOPs | 4.29 TFLOPs | 8.74 TFLOPs | 10.6 TFLOPs | 15.0 TFLOPs | ~15 TFLOPs (Preliminary) | ~17 TFLOPs (Preliminary) | ~18 TFLOPs (Preliminary) |
| FP64 Compute | 0.66 TFLOPs | 1.43 TFLOPs | 2.91 TFLOPs | 5.30 TFLOPs | 7.50 TFLOPs | TBD | TBD | TBD |
| VRAM Size | 6 GB | 12 GB | 12 GB x 2 | 16 GB | 16 GB | 48 GB | 24 GB | 32 GB |
| VRAM Bus | 384-bit | 384-bit | 384-bit x 2 | 4096-bit | 4096-bit | 4096-bit? | 3072-bit? | 4096-bit? |
| VRAM Speed | 3.7 GHz | 6 GHz | 5 GHz | 737 MHz | 878 MHz | 1200 MHz | 1200 MHz | 1200 MHz |
| Memory Bandwidth | 177.6 GB/s | 288 GB/s | 240 GB/s x 2 | 720 GB/s | 900 GB/s | 1.2 TB/s? | ~922 GB/s? | 1.2 TB/s? |
Interestingly, the lower-end GPU features more memory capacity, which could mean one of two things: either NVIDIA will offer lower-end GPUs with higher memory capacities for specific workloads, or each GPU will come in several memory configurations, with 48 GB of HBM2e being the highest configuration for this particular SKU. The other notable takeaway from this leak is that while the next-gen Tesla lineup will include several GPU SKUs, the full GPU likely tops out at 8192 CUDA cores across 128 SMs.
Just like the Volta GV100 GPU, the full-fat next-gen GPU may never be available to the public: the Tesla V100 peaked at 5120 CUDA cores (80 SMs) even though the full chip contains 5376 CUDA cores (84 SMs). In a previous interview, NVIDIA's CEO, Jensen Huang, confirmed that the majority of orders for the company's next-generation 7nm GPU will be handled by TSMC, while a small portion will go to Samsung for production.
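To see how cut-down these parts are relative to a full die, the sketch below computes the share of SMs enabled on each. It assumes 64 CUDA cores per SM and takes the 128 SM full next-gen die from the inference above, which NVIDIA has not confirmed; `enabled_share` is a hypothetical helper.

```python
CORES_PER_SM = 64

def enabled_share(enabled_sms: int, full_sms: int) -> float:
    """Fraction of the full die's SMs that are enabled on a given part."""
    return enabled_sms / full_sms

# (enabled SMs, full-die SMs); the 128 SM next-gen full die is an assumption
parts = {
    "Tesla V100 (GV100)": (80, 84),
    "Next-gen 124 SM": (124, 128),
    "Next-gen 118 SM": (118, 128),
    "Next-gen 108 SM": (108, 128),
}

for name, (enabled, full) in parts.items():
    share = enabled_share(enabled, full)
    print(f"{name}: {enabled * CORES_PER_SM} of {full * CORES_PER_SM} cores ({share:.1%} enabled)")
```

The 124 SM part would enable a slightly larger share of its die (about 97%) than the Tesla V100 does (about 95%), so a salvaged configuration like this is plausible for a flagship Tesla SKU.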
Finally, when asked about the launch timeframe for the next-generation 7nm GPU, Jensen simply replied that it wasn't a convenient time to disclose a date. We know from a recent interview with NVIDIA's CFO, Colette Kress, that the company wants to surprise everyone with its 7nm GPU announcement and is waiting for the right time to do so.
AMD, on the other hand, is also expected to announce its Radeon Instinct MI100 HPC accelerator soon. It is based on the Arcturus GPU, which reportedly packs 8192 stream processors and is built on a 7nm process. However, NVIDIA has proven in the past that it can optimize its architecture to the point where it is highly efficient and competitive against rival GPUs built on more advanced nodes (16nm vs 12nm & 12nm vs 7nm).
Given that NVIDIA will reach process parity with AMD with its next-generation GPU, and with a brand new architecture too, we could see some truly disruptive performance. These are huge specifications for NVIDIA's next-generation GPUs, and we can expect a full-blown announcement from NVIDIA at its GTC 2020 online keynote on the 22nd of March.