AMD Instinct MI350 Launched: 3nm, 185 Billion Transistors, 288 GB HBM3E Memory, FP4 & FP6 Support, MI355X 35x Faster Than MI300 & 2.2x Faster Than Blackwell B200

Hassan Mujtaba

AMD has officially launched its next-gen Instinct MI350 series, which includes the MI350X and the flagship MI355X, each packing 185 billion transistors.

AMD Instinct MI350 Powers Next-Gen AI With Brand New 3nm Process Node, 20 PFLOPs of AI Compute & New Format Support

Today, AMD has officially launched its Instinct MI350 series of HPC / AI GPUs, built on the brand-new CDNA 4 architecture and fabricated on TSMC's 3nm process node.


The chip itself features 185 billion transistors and comes in two flavors, the MI350X and the faster MI355X, offered in both air-cooled and liquid-cooled configurations. The new chips support the latest FP6 and FP4 AI data types and are equipped with massive HBM3E memory capacities. For comparison, NVIDIA's B300 chips, based on TSMC's 4nm process node, offer up to 208 billion transistors.

The MI350 series chips pack a total of 256 compute units with 64 stream processors each, for a total of 16,384 cores. That's fewer cores than the MI325 and MI300 series, which packed 304 compute units for a maximum core count of 19,456. The compute units are arranged across eight XCDs (accelerator compute dies), each packing 32 compute units. The XCDs are based on TSMC's N3P node, while the dual IO dies are based on TSMC's N6 node. The IOD includes 128 HBM3E channels, the Infinity Cache, and 4th Gen Infinity Fabric links.
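The quoted core count falls straight out of the chiplet layout. As a rough sketch (assuming 64 stream processors per compute unit, as on prior CDNA parts):

```python
# Back-of-the-envelope core count for the MI350 series die layout.
# Assumption: 64 stream processors per CU, as on earlier CDNA designs.
xcds = 8              # accelerator compute dies (TSMC N3P)
cus_per_xcd = 32      # compute units per XCD
sps_per_cu = 64       # stream processors per compute unit

total_cus = xcds * cus_per_xcd        # 256 compute units
total_cores = total_cus * sps_per_cu  # 16,384 stream processors

print(total_cus, total_cores)  # 256 16384
```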

Starting with the AI compute uplift, AMD claims that the Instinct MI350 series offers 20 PFLOPs of FP4/FP6 compute, a 4x gen-on-gen performance uplift. HBM3E brings faster data transfer speeds along with a massive 288 GB capacity on both variants. There's also 256 MB of new Infinity Cache on the chips.

The memory is arranged in eight 12-Hi stacks of 36 GB each. The chips also ship on the UBB8 baseboard standard for rapid AI infrastructure deployment, allowing faster rollout of air-cooled and liquid-cooled nodes.
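The capacity figure checks out from the stack layout. A quick sketch (assuming 3 GB per DRAM die, which is what 36 GB across a 12-Hi stack implies):

```python
# HBM3E capacity check for the MI350 series.
# Assumption: 3 GB per DRAM die (36 GB / 12-Hi stack).
stacks = 8
dies_per_stack = 12   # 12-Hi stacks
gb_per_die = 3

capacity_gb = stacks * dies_per_stack * gb_per_die
print(capacity_gb)  # 288
```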

Coming to the competitive metrics shared by AMD for its MI355X, the chip offers 8 TB/s of aggregate memory bandwidth, 79 TFLOPs of FP64, 5 PFLOPs of FP16, 10 PFLOPs of FP8, and 20 PFLOPs of FP6/FP4 compute. These numbers are for the flagship 1400W configuration of the Instinct MI355X chip. Note that both the MI350X and MI355X utilize the same die, but the 355X comes with a higher TDP rating.
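The quoted peak figures follow the usual pattern where each halving of precision below FP16 doubles matrix throughput (with FP6 running at the FP4 rate). A minimal sketch of that scaling:

```python
# MI355X quoted peak matrix throughput by format: throughput doubles
# with each halving of precision below FP16 (FP6 runs at the FP4 rate).
fp16_pflops = 5
fp8_pflops = fp16_pflops * 2   # 10 PFLOPs
fp4_pflops = fp8_pflops * 2    # 20 PFLOPs (also the FP6 figure)

print(fp8_pflops, fp4_pflops)  # 10 20
```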

The following are the numbers compared against the competition:

MI355X vs B200:

  • Memory: 1.6x Higher
  • Bandwidth: 1.0x Higher
  • FP64: 2.1x Higher
  • FP16: 1.1x Higher
  • FP8: 1.1x Higher
  • FP6: 2.2x Higher
  • FP4: 1.1x Higher

MI355X vs GB200:

  • Memory: 1.6x Higher
  • Bandwidth: 1.0x Higher
  • FP64: 2.0x Higher
  • FP16: 1.0x Higher
  • FP8: 1.0x Higher
  • FP6: 2.0x Higher
  • FP4: 1.0x Higher

But how does the Instinct MI355X compare to the last-gen MI300 series? AMD showed a massive 35x leap in inference performance using Llama 3.1 405B (throughput).

For the full MI350 series platform, the new Instinct ecosystem will offer up to eight MI350 series GPUs with 2.3 TB of HBM3E memory, 64 TB/s of total bandwidth, 0.63 PFLOPs of FP64, 81 PFLOPs of FP8, and 161 PFLOPs of FP6/FP4 compute performance.
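These platform aggregates are essentially the single-GPU figures scaled by eight, with some rounding in AMD's slides. A quick check:

```python
# 8-GPU UBB platform aggregates, scaled from the single-GPU figures
# quoted above (AMD's slides round some of these slightly upward).
gpus = 8

hbm_tb = gpus * 288 / 1000      # 2.304 TB, quoted as 2.3 TB
bandwidth_tbs = gpus * 8        # 64 TB/s aggregate
fp64_pflops = gpus * 79 / 1000  # 0.632, quoted as 0.63 PFLOPs
fp8_pflops = gpus * 10          # 80, quoted as 81 PFLOPs
fp4_pflops = gpus * 20          # 160, quoted as 161 PFLOPs

print(hbm_tb, bandwidth_tbs, fp4_pflops)
```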

A full rack with liquid cooling will house 96 to 128 Instinct MI350 series GPUs with up to 36 TB of HBM3E memory, 2.6 Exaflops of FP4 compute, and 1.3 Exaflops of FP8 compute, and will utilize the company's Turin EPYC CPUs, based on the Zen 5 core architecture, alongside the Pollara 400 interconnect solution.
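The rack-level figures again scale linearly from the per-GPU numbers at the full 128-GPU configuration, with rounding:

```python
# Liquid-cooled rack scaling at 128 MI350 series GPUs per rack.
gpus = 128

hbm_tb = gpus * 288 / 1000  # 36.864 TB, quoted as "up to 36 TB"
fp4_ef = gpus * 20 / 1000   # 2.56 EF, quoted as 2.6 Exaflops of FP4
fp8_ef = gpus * 10 / 1000   # 1.28 EF, quoted as 1.3 Exaflops of FP8

print(fp4_ef, fp8_ef)
```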

With the official metrics out of the way, we can talk about the actual performance figures in a range of AI tests presented by AMD. Once again, we start with the MI355X vs MI300X comparisons, where the new chip offers anywhere from a 2.8x to 4.2x increase in AI performance:

There's also another metric that compares the MI355X with the MI300X across various popular AI workloads, such as DeepSeek R1, Llama 4, and Llama 3.1, and the new chip comfortably outpaces its predecessor:

The Instinct MI355X is also compared to the competition's B200 and GB200 servers and shows a 1.2x to 1.3x uplift. Running Llama 3.1 405B in FP4 mode, the new Instinct AI chips offer the same performance as the much more expensive Blackwell GB200 server from NVIDIA, which adds to AMD's perf/$ goals.


AMD also showed how the Instinct MI350 series GPUs can generate up to 40% more tokens per dollar compared to NVIDIA's B200 solution.

AMD also confirmed that while the Instinct MI350 series launches today with availability through various partners starting in Q3 2025, the next-generation MI400 series is already in the works and is planned for launch in 2026.


AMD Instinct AI Accelerators:

| Accelerator Name | AMD Instinct MI500 | AMD Instinct MI400 | AMD Instinct MI350X | AMD Instinct MI325X | AMD Instinct MI300X | AMD Instinct MI250X |
|---|---|---|---|---|---|---|
| GPU Architecture | CDNA Next / UDNA | CDNA 5 | CDNA 4 | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) |
| GPU Process Node | TBD | TBD | 3nm | 5nm+6nm | 5nm+6nm | 6nm |
| XCDs (Chiplets) | TBD | 8 (MCM) | 8 (MCM) | 8 (MCM) | 8 (MCM) | 2 (MCM), 1 (Per Die) |
| GPU Cores | TBD | TBD | 16,384 | 19,456 | 19,456 | 14,080 |
| GPU Clock Speed (Max) | TBD | TBD | 2400 MHz | 2100 MHz | 2100 MHz | 1700 MHz |
| INT8 Compute | TBD | TBD | 5200 TOPS | 2614 TOPS | 2614 TOPS | 383 TOPS |
| FP6/FP4 Matrix | TBD | 40 PFLOPs | 20 PFLOPs | N/A | N/A | N/A |
| FP8 Matrix | TBD | 20 PFLOPs | 5 PFLOPs | 2.6 PFLOPs | 2.6 PFLOPs | N/A |
| FP16 Matrix | TBD | 10 PFLOPs | 2.5 PFLOPs | 1.3 PFLOPs | 1.3 PFLOPs | 383 TFLOPs |
| FP32 Vector | TBD | TBD | 157.3 TFLOPs | 163.4 TFLOPs | 163.4 TFLOPs | 95.7 TFLOPs |
| FP64 Vector | TBD | TBD | 78.6 TFLOPs | 81.7 TFLOPs | 81.7 TFLOPs | 47.9 TFLOPs |
| VRAM | TBD | 432 GB HBM4 | 288 GB HBM3E | 256 GB HBM3E | 192 GB HBM3 | 128 GB HBM2e |
| Infinity Cache | TBD | TBD | 256 MB | 256 MB | 256 MB | N/A |
| Memory Clock | TBD | 19.6 TB/s | 8.0 Gbps | 5.9 Gbps | 5.2 Gbps | 3.2 Gbps |
| Memory Bus | TBD | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit |
| Memory Bandwidth | TBD | TBD | 8 TB/s | 6.0 TB/s | 5.3 TB/s | 3.2 TB/s |
| Form Factor | TBD | TBD | OAM | OAM | OAM | OAM |
| Cooling | TBD | TBD | Passive / Liquid | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP (Max) | TBD | TBD | 1400W (355X) | 1000W | 750W | 560W |
