AMD's Radeon Instinct Arcturus GPU which will feature the CDNA architecture and aim the server market has been spotted by Rogame. Featured inside the next-generation Radeon Instinct graphics cards, the CDNA architecture will leverage its compute-optimized GPU design to deliver the highest performance Compute capabilities for data centers.
AMD's CDNA Architecture Based Arcturus GPU Test Board Leaks Out - Next-Gen Radeon Instinct With 120 CUs For A Total of 7680 Cores
The AMD Arcturus GPU leaked out all the way back in 2018 which was before AMD has introduced any 7nm GPU. The Radeon VII and Navi lineup launched in 2019 and featured 7nm GPUs with Navi being aimed at the mass consumer market.
It was later revealed that AMD's next-generation HPC & AI GPUs would be designed separately from the consumer-end chips. This meant that the Arcturus GPU would be kept exclusive to the datacenter market. AMD just recently confirmed in its Radeon CDNA architecture roadmap that all CDNA based GPUs would be exclusively designed for the HPC & data center markets while Radeon RDNA GPUs will power the consumer segment.
Coming to the specifications, it was previously unveiled that AMD's Arcturus GPU would feature an increased cache and double the CUs as Vega. That along with a list of data center specific features such as XDLOPs, Rapid Packed Math, New Vector ALU & BFloat16 are to be expected in the Radeon Instinct cards that feature the new CDNA architecture. The previous Radeon Instinct MI100 proto-type 'D34303' board featured the Arcturus-XL die with a rated TDP of 200W and 32 GB HBM2 VRAM clocked at around 1000-1200 MHz.
The information for this part is based on a prototype so it is likely that final specifications would not be the same but here are the key points:
- Based on Arcturus XL GPU
- Test Board has a TDP of 200W
- Up To 32 GB HBM2 Memory
- HBM2 Memory Clocks Reported Between 1000-1200 MHz
Once again, a test board has been spotted by Rogame which is based on the Arcturus CDNA GPU and from the looks of it, this variant offers 120 CUs for a total of 7680 stream processors & a GPU clock speed of 878 MHz (750 MHz SOC clock). This variant also features an undefined amount of HBM2 memory clocked at 1200 MHz so if we are looking at a 4096-bit bus, we should get around 1.2 TB/s bandwidth which is what Aquabolt is able to offer. But it is very likely that both NVIDIA & AMD would end up utilizing the faster HBM2E 'Flashbolt' standard which goes into production this year and will be capable of delivering up to 1.8 TB/s bandwidth.
Talking about the clock speeds, the 878 MHz for the test board are rather slow as we have seen variants going up to 1334 MHz in the past. At the mentioned speeds, the chip would boast around 13.5 TFLOPs of FP32 compute power which is lower than the Radeon Instinct MI60 and also the 21 TFLOPs that we got on the previous prototype sample. It is likely that the first iteration of CDNA GPUs would end up somewhere around 25 TFLOPs FP32.
I had to forget one important detail 😅
Arcturus (Test board)
> 878MHz Core clock
> 750Mhz SOC clock
> 1200MHz Memory clock
— _rogame (@_rogame) April 21, 2020
Based on leaks of Ampere GPUs which are also expected to be announced later this year, it looks like NVIDIA might hold the upper hand in terms of Compute performance as they are speculated to hit almost 36 TFLOPs of FP32 and 18 TFLOPs of FP64 Compute power with next-generation Tesla 7nm GPU lineup.
AMD Radeon Instinct Accelerators 2020
|Accelerator Name||AMD Instinct MI300||AMD Instinct MI250X||AMD Instinct MI250||AMD Instinct MI210||AMD Instinct MI100||AMD Radeon Instinct MI60||AMD Radeon Instinct MI50||AMD Radeon Instinct MI25||AMD Radeon Instinct MI8||AMD Radeon Instinct MI6|
|CPU Architecture||Zen 4 (Exascale APU)||N/A||N/A||N/A||N/A||N/A||N/A||N/A||N/A||N/A|
|GPU Architecture||TBA (CDNA 3)||Aldebaran (CDNA 2)||Aldebaran (CDNA 2)||Aldebaran (CDNA 2)||Arcturus (CDNA 1)||Vega 20||Vega 20||Vega 10||Fiji XT||Polaris 10|
|GPU Process Node||5nm+6nm||6nm||6nm||6nm||7nm FinFET||7nm FinFET||7nm FinFET||14nm FinFET||28nm||14nm FinFET|
|GPU Chiplets||4 (MCM / 3D Stacked)|
1 (Per Die)
1 (Per Die)
1 (Per Die)
1 (Per Die)
|1 (Monolithic)||1 (Monolithic)||1 (Monolithic)||1 (Monolithic)||1 (Monolithic)||1 (Monolithic)|
|GPU Clock Speed||TBA||1700 MHz||1700 MHz||1700 MHz||1500 MHz||1800 MHz||1725 MHz||1500 MHz||1000 MHz||1237 MHz|
|FP16 Compute||TBA||383 TOPs||362 TOPs||181 TOPs||185 TFLOPs||29.5 TFLOPs||26.5 TFLOPs||24.6 TFLOPs||8.2 TFLOPs||5.7 TFLOPs|
|FP32 Compute||TBA||95.7 TFLOPs||90.5 TFLOPs||45.3 TFLOPs||23.1 TFLOPs||14.7 TFLOPs||13.3 TFLOPs||12.3 TFLOPs||8.2 TFLOPs||5.7 TFLOPs|
|FP64 Compute||TBA||47.9 TFLOPs||45.3 TFLOPs||22.6 TFLOPs||11.5 TFLOPs||7.4 TFLOPs||6.6 TFLOPs||768 GFLOPs||512 GFLOPs||384 GFLOPs|
|VRAM||192 GB HBM3?||128 GB HBM2e||128 GB HBM2e||64 GB HBM2e||32 GB HBM2||32 GB HBM2||16 GB HBM2||16 GB HBM2||4 GB HBM1||16 GB GDDR5|
|Memory Clock||TBA||3.2 Gbps||3.2 Gbps||3.2 Gbps||1200 MHz||1000 MHz||1000 MHz||945 MHz||500 MHz||1750 MHz|
|Memory Bus||8192-bit||8192-bit||8192-bit||4096-bit||4096-bit bus||4096-bit bus||4096-bit bus||2048-bit bus||4096-bit bus||256-bit bus|
|Memory Bandwidth||TBA||3.2 TB/s||3.2 TB/s||1.6 TB/s||1.23 TB/s||1 TB/s||1 TB/s||484 GB/s||512 GB/s||224 GB/s|
|Form Factor||OAM||OAM||OAM||Dual Slot Card||Dual Slot, Full Length||Dual Slot, Full Length||Dual Slot, Full Length||Dual Slot, Full Length||Dual Slot, Half Length||Single Slot, Full Length|
|Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling|
So far, AMD has unveiled that the main focus of CDNA would be performance, efficiency, features, scalability in the data center market. Currently, AMD's GCN architecture has served this segment but with CDNA, AMD will be creating GPUs specifically optimized for high-performance Compute, Machine Learning, and HPC. The 1st Gen CDNA GPUs will feature a 2nd Gen Infinity Architecture and would utilize the ROCm (Radeon Open Compute Platform) to power the data center with key optimizations and enhanced scalability. The 2nd Gen Infinity Architecture will allow for 4-8 Way GPU connectivity in a singular node, allowing the new Radeon Instinct boards to run in harmony.
AMD has proved that they can offer more FLOPs at a competitive price so maybe that is where Arcturus would be targetting. There's no word on when Arcturus would land, but AMD has hinted at a Radeon Instinct product later this year which will feature its 1st Gen CDNA architecture.