Next-generation HBM standards, including HBM4, HBM5, HBM6, HBM7 & HBM8, have been detailed, with each generation offering major advances to address growing data center & AI needs.
HBM Memory To Scale Up Significantly In The Coming Decade As AI & Data Center Demand Increases: HBM4, HBM5, HBM6, HBM7 & HBM8 With Up To 64 TB/s & 24-Hi Stacks In Research Phase
In a recent presentation, KAIST (Korea Advanced Institute of Science & Technology) and TERA (Terabyte Interconnection and Package Laboratory) outlined the HBM roadmap & detailed what we can expect from the next-generation standards. The presentation covers several new and upcoming HBM standards, such as HBM4, HBM5, HBM6, HBM7, and HBM8, so there's a lot to dissect here.
Starting with HBM4: this will be the go-to standard for next-gen data center and AI GPUs launching in 2026. Both AMD and NVIDIA have confirmed the use of HBM4 for their MI400 and Rubin offerings.
The research group also gives us some insight into NVIDIA's roadmap, which might hold some weight considering TERA is responsible for handling interconnection & packaging methodologies for HBM memory.
HBM4 Memory For NVIDIA's Rubin & AMD's MI400 GPUs
First up, we are looking at NVIDIA's Rubin and Rubin Ultra, which will leverage HBM4 and HBM4e memory, respectively. In the official roadmap, NVIDIA shows Rubin with 8 HBM4 sites and Rubin Ultra with 16 HBM4 sites. Each variant is shown with two GPU die cross-sections, but the Ultra uses the larger cross-section, packing double the compute density of the standard Rubin option.
As per the research firm, Rubin will have a GPU die size of 728 mm2 and will consume 800W of power per die. This is only for the standard Rubin offering. The interposer will measure 2194 mm2 (46.2mm x 48.5mm), and will host a total of 288 to 384 GB of VRAM capacity with 16-32 TB/s of total bandwidth. The total chip power is disclosed at 2200W, nearly double that of Blackwell B200 GPUs.
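Those package-level figures line up with the per-stack HBM4 numbers listed below. Here's a minimal back-of-the-envelope sketch of that arithmetic, assuming the 16-32 TB/s spread reflects the 2048-bit versus 4096-bit I/O options in the spec list (the helper function is ours, purely for illustration):

```python
# Rough sanity check of the quoted Rubin package figures, using the per-stack
# HBM4 numbers from the spec list below: 36-48 GB per stack, 2 TB/s per stack
# at 2048-bit I/O, roughly 4 TB/s at the wider 4096-bit option.
def package_totals(stacks: int, bw_per_stack_tbs: float, cap_per_stack_gb: int):
    """Return (aggregate bandwidth in TB/s, aggregate capacity in GB)."""
    return stacks * bw_per_stack_tbs, stacks * cap_per_stack_gb

# Standard Rubin with 8 HBM4 sites:
print(package_totals(8, 2.0, 36))  # (16.0, 288) -> low end of the quoted ranges
print(package_totals(8, 4.0, 48))  # (32.0, 384) -> high end of the quoted ranges
```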
Main features of the HBM4 memory standard include:
- Data Rate: ~ 8 Gbps
- Number of I/Os: 2048 (4096)
- Total Bandwidth: 2.0 TB/s
- Number of die stacks: 12/16-Hi
- Capacity per die: 24 Gb
- Capacity per HBM: 36/48 GB
- Package Power per HBM: 75W
- Packaging Method: Microbump (MR-MUF)
- Cooling Method: Direct-To-Chip (D2C) Liquid Cooling
- Custom HBM Base Die Architecture
- NMC Processor + LPDDR in Base Die
- NVIDIA Rubin & Instinct MI400 Platforms
Interestingly, AMD's Instinct MI400, which also launches next year, takes things up a notch versus Rubin and is aiming to offer 432 GB of HBM4 capacity with memory bandwidth of up to 19.6 TB/s.
Looking at the details for HBM4, the memory aims to offer a data rate of 8 Gbps with a 2048-bit I/O, 2.0 TB/s of memory bandwidth per stack, 24 Gb capacity per die for up to 36-48 GB of memory capacity, and a per-stack package power of 75W. HBM4 goes with standard direct-to-chip (D2C) liquid cooling and employs a custom HBM base die architecture with an NMC processor and LPDDR in the base die.
HBM4e takes things up a notch with a 10 Gbps data rate, 2.5 TB/s of bandwidth per stack, up to 32 Gb capacity per die for up to 48/64 GB memory capacities based on 12-Hi and 16-Hi stacks, and a per-HBM package power of up to 80W.
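For reference, the per-stack figures for both HBM4 and HBM4e fall straight out of the data rate, I/O width, die density, and stack height. A quick sketch of that arithmetic (the function names are ours, for illustration only):

```python
def stack_bandwidth_tbs(data_rate_gbps: float, io_width_bits: int) -> float:
    """Per-stack bandwidth in TB/s: pin data rate (Gb/s) x I/O width, converted to bytes."""
    return data_rate_gbps * io_width_bits / 8 / 1000

def stack_capacity_gb(die_capacity_gbit: int, stack_height: int) -> float:
    """Per-stack capacity in GB: die density (Gbit) x number of stacked dies, converted to bytes."""
    return die_capacity_gbit * stack_height / 8

# HBM4: 8 Gbps pins, 2048-bit I/O, 24 Gb dies in 12-Hi / 16-Hi stacks.
print(stack_bandwidth_tbs(8, 2048))                          # ~2.05 TB/s (quoted as 2.0 TB/s)
print(stack_capacity_gb(24, 12), stack_capacity_gb(24, 16))  # 36.0 GB, 48.0 GB

# HBM4e: 10 Gbps pins, 32 Gb dies.
print(stack_bandwidth_tbs(10, 2048))                         # ~2.56 TB/s (quoted as 2.5 TB/s)
print(stack_capacity_gb(32, 12), stack_capacity_gb(32, 16))  # 48.0 GB, 64.0 GB
```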
HBM5 Targets NVIDIA Feynman With An On-Shelf Release Scheduled For 2029
HBM5 seems to stick to the 8 Gbps data rate for the Non-e variant, but drives up the IO lanes to 4096 bits. The bandwidth also increases to 4 TB/s per stack and will come with 16-Hi stacks as the baseline. With 40 Gb DRAM dies, HBM5 will scale to 80 GB capacity per stack, and the per-stack power is expected to hit 100W.
Main features of the HBM5 memory standard include:
- Data Rate: 8 Gbps
- Number of I/Os: 4096
- Total Bandwidth: 4.0 TB/s
- Number of die stacks: 16-Hi
- Capacity per die: 40 Gb
- Capacity per HBM: 80 GB
- Package Power per HBM: 100W
- Packaging Method: Microbump (MR-MUF)
- Cooling Method: Immersion Cooling, Thermal Via (TTV), Thermal Bonding
- Dedicated decoupling capacitor chip die stack
- Custom HBM Base Die w/ 3D NMC-HBM & Stacked Cache
- LPDDR+CXL in Base Die
- NVIDIA Feynman & Instinct MI500 Platforms
NVIDIA's Feynman is expected to be the first GPU to utilize the HBM5 memory standard, and while NVIDIA has listed a 2028 release schedule, it looks like the research firm is going for a more realistic 2029 launch window for this next-gen solution, based on production and supply cycles.
Feynman's numbers are also highlighted: it is projected as a 750 mm2 GPU die with a per-die power of 900W, and the flagship chip is expected to be called the F400. NVIDIA hasn't shown any concrete illustration of the chip itself, but the research firm believes it to be a four-GPU-die package with 8 HBM5 sites. This package is said to measure 4788 mm2 (85.22mm x 56.2mm). The entire GPU should pack 400-500 GB of HBM5 capacity and will offer a TDP of 4400W.
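As a quick back-of-the-envelope check, the 4,400W figure is consistent with the research firm's four-die reading of the package and the 100W-per-stack figure in the HBM5 spec list above (the breakdown below is our reconstruction, not an NVIDIA disclosure):

```python
# Rough power-budget reconstruction for the projected Feynman F400 package,
# assuming four 900W GPU dies plus eight HBM5 stacks at 100W each.
gpu_dies, gpu_die_power_w = 4, 900
hbm_stacks, hbm_stack_power_w = 8, 100

package_power_w = gpu_dies * gpu_die_power_w + hbm_stacks * hbm_stack_power_w
print(package_power_w)  # 4400 -> matches the quoted 4,400W TDP
```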
HBM6 For Post-Feynman GPU Architecture - Massive Power, Massive Capacities, Lots of Bandwidth
Now, NVIDIA might go bigger with a Feynman Ultra offering, but that isn't listed. What is listed is a post-Feynman design with HBM6, so let's start there. With HBM6, we are expecting to see double the data rate at 16 Gbps while using 4096-bit I/O lanes.
Main features of the HBM6 memory standard include:
- Data Rate: 16 Gbps
- Number of I/Os: 4096
- Total Bandwidth: 8.0 TB/s
- Number of die stacks: 16/20-Hi
- Capacity per die: 48 Gb
- Capacity per HBM: 96/120 GB
- Package Power per HBM: 120W
- Packaging Method: Bump-less Cu-Cu Direct Bonding
- Cooling Method: Immersion Cooling
- Custom Multi-tower HBMs
- Active/Hybrid (Silicon+Glass) interposer
- Network Switch + Bridge Die
The bandwidth doubles to 8 TB/s, and we're pushing for 48 Gb capacities per DRAM die. Another big change over HBM5 is that this should be the first time HBM stacking goes beyond 16-Hi to 20-Hi, increasing memory capacities to 96-120 GB per stack with a per-stack power of 120W. Both HBM5 and HBM6 are expected to feature immersion cooling solutions, with the latter going for custom multi-tower HBMs on an active/hybrid (silicon+glass) interposer, plus additional features such as an onboard network switch, bridge die, and asymmetric TSVs that are currently in the research phase.
According to the research firm, a next-gen GPU with a per-GPU die size of 700 mm2 and a power of 1000W per chip is expected to use this memory type. The package will house 16 HBM6 sites in a package area measuring 6014 mm2 (102.8mm x 58.5mm) and will offer 128-256 TB/s of bandwidth with 1536-1920 GB of memory capacity per chip and a total power of 5920W. This technology has an expected arrival timeframe of 2032.
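The package-level capacity and power totals again follow from the per-stack HBM6 figures. A rough reconstruction is sketched below; note that the four-GPU-die count is inferred from the quoted totals rather than stated outright, and the 256 TB/s upper end would require roughly double the baseline per-stack rate, presumably an enhanced HBM6 variant:

```python
# Rough check of the quoted HBM6-era package figures against the per-stack
# spec list above (8 TB/s, 96-120 GB, and 120W per stack; 1000W per GPU die).
# The four-GPU-die count is our inference from the totals, not a stated figure.
hbm_stacks = 16
print(hbm_stacks * 8.0)                   # 128.0 TB/s -> low end of the quoted 128-256 TB/s
print(hbm_stacks * 96, hbm_stacks * 120)  # 1536 GB, 1920 GB -> matches the quoted capacity range
print(4 * 1000 + hbm_stacks * 120)        # 5920 W -> matches the quoted total package power
```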
HBM7 & HBM8 - Maxing Memory Out For The Next Decade
With HBM6 being the highlight at the start of the next decade, HBM7 and HBM8 will be the big guns that take the standard to a whole new level. HBM7 will offer 24 Gbps pin speeds and 8192 I/O lanes, double that of HBM6. The increased data rate and I/O capability will drive up the bandwidth to 24 TB/s per stack, 3x that of HBM6, and with 64 Gb capacity per DRAM die, you will see up to 160-192 GB capacity per stack, thanks to 20-Hi and 24-Hi memory stacks. Each stack will have a package power of 160W.
Main features of the HBM7 memory standard include:
- Data Rate: 24 Gbps
- Number of I/Os: 8192
- Total Bandwidth: 24.0 TB/s
- Number of die stacks: 20/24-Hi
- Capacity per die: 64 Gb
- Capacity per HBM: 160/192 GB
- Package Power per HBM: 160W
- Packaging Method: Bump-Less Cu-Cu Direct Bonding
- Cooling Method: Embedded Cooling
- Hybrid HBM Architecture
- HBM-HBF
- HBM-LPDDR
- Buffer dies in the HBM stack
Die stacking for HBM6, HBM7, and HBM8 will be achieved using bump-less Cu-Cu direct bonding, and HBM7/HBM8 will go for embedded cooling solutions. HBM7 will also introduce the brand-new HBM-HBF and HBM-3D LPDDR architecture.
Next-gen solutions based on HBM7 memory are expected to go super-big and multi-chiplet: one package offers 8 GPU sites, with each GPU said to measure 600 mm2 and consume up to 1200W of power, while 32 HBM7 sites offer 1024 TB/s of bandwidth, making it the first petabyte-class bandwidth solution. The chip should also pack a massive memory capacity of up to 5120-6144 GB, depending on the HBM stack height, and a total power of 15,360W is expected, almost 3x that of HBM6-based solutions.
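The quoted 5,120-6,144 GB range likewise follows directly from the 160/192 GB per-stack figures; a short sketch is below. Note that the petabyte-class 1,024 TB/s number implies a per-stack rate above the baseline 24 TB/s, presumably an enhanced HBM7 variant not broken out in the slides:

```python
# Capacity check for the projected 32-site HBM7 package, using the
# 160/192 GB per-stack figures from the spec list above.
hbm_stacks = 32
print(hbm_stacks * 160, hbm_stacks * 192)  # 5120 GB, 6144 GB -> matches the quoted range
# At the baseline 24 TB/s per stack, 32 stacks yield 768 TB/s; hitting the
# quoted 1,024 TB/s implies roughly 32 TB/s per site.
print(hbm_stacks * 24.0)                   # 768.0 TB/s
```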
Main features of the HBM8 memory standard include:
- Data Rate: 32 Gbps
- Number of I/Os: 16384
- Total Bandwidth: 64 TB/s
- Number of die stacks: 20/24-Hi
- Capacity per die: 80 Gb
- Capacity per HBM: 200/240 GB
- Package Power per HBM: 180W
- Packaging Method: Bump-Less Cu-Cu Direct Bonding
- Cooling Method: Embedded Cooling
- Coaxial TSV / Full-3D GPU-HBM
- HBM Centric Computing
- Full Memory Network
- Double-sided interposer
For HBM8, the memory standard won't arrive until 2038, so there's a long road ahead of us until we get there, but expected specifications include a 32 Gbps data rate and doubling the I/O lanes to 16,384. The memory solution will offer 64 TB/s of bandwidth per stack, and with 80 Gb capacities per DRAM die, we will see up to 200/240 GB memory capacities per stack and a per-HBM package power of up to 180W.
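The same per-stack arithmetic used for the earlier generations checks out for HBM8 as well:

```python
# HBM8 per-stack sanity check: 32 Gbps pins x 16,384-bit I/O, 80 Gb dies
# in 20-Hi / 24-Hi stacks (figures from the spec list above).
print(32 * 16384 / 8 / 1000)     # ~65.5 TB/s, quoted (rounded) as 64 TB/s per stack
print(80 * 20 / 8, 80 * 24 / 8)  # 200.0 GB, 240.0 GB -> matches the quoted capacities
```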
HBF Architecture For Memory-Intensive LLM Inference, Innovative Cooling Methods
One of the things I mentioned above is the HBF (High-Bandwidth Flash) architecture, which is designed to meet the demands of memory-intensive LLM inference. With HBF, instead of using standard DRAM, manufacturers utilize NAND with up to 128 layers, offering higher capacities in a 16-Hi stack that is interconnected using an HBF TSV solution.
Each HBF stack sits in parallel with an HBM stack and adds up to 1 TB of capacity on top of it, using a fast 2 TB/s HBM-to-HBF interconnect; it communicates with other components on the mainboard through a 128 GB/s bidirectional interconnect running via a memory network switch.
With HBM7, this NAND-based stack is further upgraded and connected to a glass interposer using an interconnect that offers 4096 GB/s transfer speeds. An LPDDR solution of up to 384 GB capacity runs in conjunction with an HBM stack offering 192 GB of capacity.
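To make the tiering easier to follow, here is an illustrative summary of the described hierarchy as a plain data structure, using only the capacities and link speeds quoted above (the structure and field names are ours, not from the presentation):

```python
# Illustrative summary of the HBM7-era HBM/HBF/LPDDR tiering described above.
# Capacities and link bandwidths are the quoted figures; the layout is ours.
tiers = {
    "HBM7 stack (DRAM)": {"capacity_gb": 192,  "notes": "per-stack bandwidth covered above"},
    "HBF stack (NAND)":  {"capacity_gb": 1024, "notes": "2 TB/s HBM-to-HBF link; 4096 GB/s link to the glass interposer with HBM7"},
    "LPDDR (base die)":  {"capacity_gb": 384,  "notes": "runs alongside the HBM stack; bandwidth not quoted"},
}
switch_link_gb_s = 128  # bidirectional link to the rest of the board via the memory network switch

for name, tier in tiers.items():
    print(f"{name}: {tier['capacity_gb']} GB ({tier['notes']})")
print(f"Memory network switch link: {switch_link_gb_s} GB/s bidirectional")
```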
HBM stacks will also scale up to 24-Hi designs with HBM7 and HBM8 in a twin-tower high-bandwidth package.
With the arrival of glass-based silicon interposers, the research firm highlights the use of embedded cooling as the standard approach, which will run through the interposer and provide direct cooling to the HBM, HBF, and GPU IPs. There's definitely a lot of detail unpacked here for future generations of HBM, and we can't wait to see these in action in the coming years.
