AMD has launched its latest Instinct MI325X AI GPU accelerator which comes packed with 256 GB HBM3e memory while next year's MI355X gets 288 GB.
AMD Goes All Out With HBM3e Memory Capacities: 256 GB on MI325X "CDNA 3" This Year & 288 GB on MI355X "CDNA 4" Next Year
As part of today's "Advancing AI" event, AMD is rolling out its brand new Instinct MI325X AI GPU Accelerator which improves upon the MI300X with brand new capabilities.
But before we get into the details, we have to talk about AMD's Instinct platform as a whole which has garnered support from the world's top AI companies and is being used by some of the biggest brands such as Meta, OpenAI and Microsoft.
AMD's commitment towards performance leadership, easy migration, an open ecosystem, and customer-focused portfolio has led to huge support from leading OEMs and cloud partners, and as such, the company has fast-tracked the launch of its next-solution as the AI demands in the industry grow to unparalleled heights.
AMD MI325X With 256 GB Memory & CDNA 3 Architecture
Currently, AMD's MI300X is said to offer up to 30% higher performance across a range of AI-specific workloads against the NVIDIA H100. AMD's added work to their ROCm suite is helping extract more performance out of the flagship accelerator but now's the time to build even better hardware with the same robust software support.
Meet the AMD Instinct MI325X, this brand-new accelerator is built upon the same fundamental design and architecture as the MI300X. Using the CDNA 3 GPU architecture, the MI325X can be seen as a mid-cycle upgrade, offering 256 GB of HBM3e memory made using 16-Hi stacks with up to 6 TB/s of memory bandwidth, 2.6 PFLOPs of FP8, 1.3 PFLOPs of FP16 performance, all packed within a chip with 153 Billion transistors.
AMD expects the first production of Instinct MI325X AI GPUs starting in Q4 2024 along with the availability of respective server solutions starting in Q1 2025 through leading partners. The AI Instinct servers will be featuring up to 8 MI325X configurations with up to 2 TB of HBM3e memory, 896 GB/s of infinity fabric bandwidth, 48 TB/s of memory bandwidth, 20.8 PFLOPs of FP8 and 10.4 PFLOPs of FP16 performance. Each GPU is also configured at 1000W which is a big uptick over the 750-700W configurations of the MI300X.
Drilling down the numbers, AMD claims that the Instinct MI325X AI GPU accelerator should be 40% faster than the NVIDIA H200 in Mixtral 8x7B, 30% faster in Mistral 7B, and 20% faster in Meta Llama 3.1 70B LLMs. An 8x MI325X platform will also offer 40% faster performance versus an H200 HGX AI platform in Llama 3.1 405B and 20% faster in the 70B inference test. In terms of AI training, MI325X will offer similar or 10% better performance than the H200 platforms.
AMD MI355X With 288 GB Memory & CDNA 4 Architecture
Next year, AMD plans to launch a brand new Instinct MI355X GPU accelerator which will target AI workloads and this will be built using a 3nm process node. The GPU will incorporate the CDNA 4 architecture. In terms of specs, the memory will be upgraded to even higher capacities, up to 288 GB HBM3e while offering support for FP4/FP6 Data types.
AMD says that the CDNA 4 architecture delivers a 35x performance leap over CDNA 3 plus a 7x increase in AI compute, 50% increase in memory capacity/bandwidth, and also comes with the latest networking efficiency advancements.
In terms of performance, the AMD Instinct MI355X AI GPU will offer up to 2.3 PFLOPs of FP16 performance, an 80% increase over the MI325X while the FP8 figures also see an 80% increase to 4.6 PFLOPs versus the MI325X. The new FP6 and FP4 compute performance is rated at 9.2 PFLOPs.
The MI355X will mark a 50% increase in both memory capacities and memory bandwidth, with up to 8 TB/s speeds over the current-gen MI300X. The first platforms featuring eight of these MI355X GPUs will be available in the second half of 2025 and offer up to 2.3 TB of HBM3E memory capacity with 64 TB/s bandwidth, 18.5 PFLOPs of FP16, 37 PFLOPs of FP8, & 74 PFLOPs of FP6/FP4 compute.
ROCm 6.2 Continues To Dial Up AI Performance For Instinct
Moving back to the software front, AMD is announcing its latest ROCm 6.2 ecosystem which brings an average performance improvement of 2.4x and up to 2.8x across a range of AI workloads within Inferencing and an average 2.4x improvement in Training performance.
Lastly, AMD is still confirming its Instinct MI400 which was released in 2026 as a "CDNA Next" part and not using the recently disclosed UDNA architecture name. Maybe it's a bit too early to go with the UDNA naming since it hasn't been made official by AMD despite one of their top representatives confirming it so we will see how that goes in the future.
With that said, AMD looks to be going all in on the AI craze with the future Instinct offerings, bringing heated competition against the likes of NVIDIA and also tackling Intel who have been struggling to catch up with the rest.
AMD Instinct MI325X AI GPU Accelerator Gallery:
AMD Instinct AI Accelerators:
| Accelerator Name | AMD Instinct MI500 | AMD Instinct MI400 | AMD Instinct MI350X | AMD Instinct MI325X | AMD Instinct MI300X | AMD Instinct MI250X |
|---|---|---|---|---|---|---|
| GPU Architecture | CDNA 6 | CDNA 5 | CDNA 4 | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) |
| GPU Process Node | 2nm | 2nm+3nm | 3nm | 5nm+6nm | 5nm+6nm | 6nm |
| XCDs (Chiplets) | TBD | 8 (MCM) | 8 (MCM) | 8 (MCM) | 8 (MCM) | 2 (MCM) 1 (Per Die) |
| GPU Cores | TBD | TBD | 16,384 | 19,456 | 19,456 | 14,080 |
| GPU Clock Speed (Max) | TBD | TBD | 2400 MHz | 2100 MHz | 2100 MHz | 1700 MHz |
| INT8 Compute | TBD | TBD | 5200 TOPS | 2614 TOPS | 2614 TOPS | 383 TOPs |
| FP6/FP4 Matrix | TBD | 40 PFLOPs | 20 PFLOPs | N/A | N/A | N/A |
| FP8 Matrix | TBD | 20 PFLOPs | 5 PFLOPs | 2.6 PFLOPs | 2.6 PFLOPs | N/A |
| FP16 Matrix | TBD | 10 PFLOPs | 2.5 PFLOPs | 1.3 PFLOPs | 1.3 PFLOPs | 383 TFLOPs |
| FP32 Vector | TBD | TBD | 157.3 TFLOPs | 163.4 TFLOPs | 163.4 TFLOPs | 95.7 TFLOPs |
| FP64 Vector | TBD | TBD | 78.6 TFLOPs | 81.7 TFLOPs | 81.7 TFLOPs | 47.9 TFLOPs |
| VRAM | HBM4E | 432 GB HBM4 | 288 GB HBM3e | 256 GB HBM3e | 192 GB HBM3 | 128 GB HBM2e |
| Infinity Cache | TBD | TBD | 256 MB | 256 MB | 256 MB | N/A |
| Memory Clock | TBD | 19.6 TB/s | 8.0 Gbps | 5.9 Gbps | 5.2 Gbps | 3.2 Gbps |
| Memory Bus | TBD | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit |
| Memory Bandwidth | TBD | TBD | 8 TB/s | 6.0 TB/s | 5.3 TB/s | 3.2 TB/s |
| Form Factor | TBD | TBD | OAM | OAM | OAM | OAM |
| Cooling | TBD | Passive / Liquid | Passive / Liquid | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP (Max) | TBD | TBD | 1400W (355X) | 1000W | 750W | 560W |
Follow Wccftech on Google to get more of our news coverage in your feeds.
