Is AMD’s Fiji XT Limited to 4GB of Memory? Let’s Find Out


A recent article alleged that AMD's Fiji XT will be limited to 4GB of memory due to the size of the interposer used for the Fiji XT GPU. In this editorial piece I'll be investigating this claim to see if it holds any validity.

HBM Memory
First, let's take a brief look at Fiji XT's memory system, details of which leaked a while back. The leak suggests that the Fiji XT powered card will use four High Bandwidth Memory modules. High Bandwidth Memory, or HBM for short, is a new high-performance JEDEC memory spec with a very wide I/O interface, co-developed by SK Hynix and AMD.

The first generation of the memory standard allows for a 4-Hi stack. Each of the four stacked memory dies has a capacity of 256 Megabytes / 2 Gigabits, amounting to a total of 1 Gigabyte per stack. This 1GB memory module runs at a clock speed of just 1GHz (1Gbps per pin) yet is capable of 128GB/s of bandwidth. This is achieved by using Through Silicon Vias, or TSVs for short, which simply put allow for an extremely wide memory interface and therefore a very high amount of bandwidth.

So even though the memory only runs at 1GHz / 1Gbps, which is significantly lower than what GDDR5 is capable of today, HBM still delivers significantly more bandwidth, up to 9 times more. This is because each GDDR5 memory module uses only a 32-bit wide interface, while each HBM module uses a 1024-bit wide interface to transfer data.
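The per-stack figures above follow from simple arithmetic, sketched here in Python. The function and constant names are illustrative assumptions rather than anything from the HBM specification, and the inputs are just the numbers quoted in this article (four 2Gbit dies, a 1024-bit interface, 1Gbps per pin).

```python
# Back-of-the-envelope check of the per-stack HBM figures quoted in the text.
# All names are made up for illustration; the inputs are the article's numbers.

DIES_PER_STACK = 4        # 4-Hi stack
GBIT_PER_DIE = 2          # 2 Gigabit (256 Megabyte) per die
HBM_WIDTH_BITS = 1024     # interface width of one HBM stack
HBM_GBPS_PER_PIN = 1.0    # 1Gbps per pin
GDDR5_WIDTH_BITS = 32     # interface width of one GDDR5 chip


def capacity_gb(dies: int, gbit_per_die: float) -> float:
    """Capacity in Gigabytes: dies * Gigabits per die, 8 bits per byte."""
    return dies * gbit_per_die / 8


def bandwidth_gbs(width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: pins * per-pin rate, 8 bits per byte."""
    return width_bits * gbps_per_pin / 8


def pin_rate_to_match(target_gbs: float, width_bits: int) -> float:
    """Per-pin rate (Gbps) a narrower device would need to reach target_gbs."""
    return target_gbs * 8 / width_bits


stack_gb = capacity_gb(DIES_PER_STACK, GBIT_PER_DIE)
stack_gbs = bandwidth_gbs(HBM_WIDTH_BITS, HBM_GBPS_PER_PIN)
gddr5_rate_needed = pin_rate_to_match(stack_gbs, GDDR5_WIDTH_BITS)

print(stack_gb)           # 1.0
print(stack_gbs)          # 128.0
print(gddr5_rate_needed)  # 32.0
```

Put differently, a single 32-bit GDDR5 chip would need 32Gbps per pin to match one HBM stack, which is why the width of the interface matters far more here than the clock speed.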

HBM can be packaged with another processor in a 3D fashion or a 2.5D fashion. In 3D stacking, the memory is stacked directly on top of the logic chip (CPU/GPU/SoC) that it feeds. In 2.5D stacking, the memory is placed on an interposer that it shares with the logic chip, so the two essentially sit side by side. Due to the heat output and memory capacity requirements that come with high-performance chips, 2.5D stacking is preferred for them, while 3D stacking is ideal for low-power applications where the thermal output is minimal and the smaller footprint is advantageous.

Back to the original question: is Fiji XT limited to 4GB of memory? The original claim states that AMD can't fit 8GB of HBM on an interposer along with a Fiji XT GPU because the die is too big, leaving no space for additional memory modules. Not only that, but the article also claims that Pascal will be able to accommodate more than 4GB of memory because Pascal will supposedly use 3D stacking rather than 2.5D stacking.

Both claims, however, are wrong. The Pascal test vehicle that Nvidia showcased at GTC last year clearly shows Pascal paired with HBM in a 2.5D package. The memory sits on the interposer next to the Pascal GPU instead of being stacked on top of it. So Pascal will utilize 2.5D stacking rather than 3D stacking.

And as I mentioned above, 2.5D stacking on an interposer is the ideal and perhaps only way to pair a high-performance chip such as a high-end GPU with stacked (3D) memory, because it allows for higher capacities and better thermal management.

AMD's prototypes show the same thing: HBM sits on the same interposer as the processor. Which brings us to the other false claim, that more than 4GB of HBM won't fit on an interposer with a large chip such as Fiji XT. Bryan Black, head of AMD's die stacking program, gave a talk titled "Die Stacking and The System" at Hot Chips 2012. In answer to a question, Black stated that engineers can use interposers of any size to meet their needs. They can be as large as 1600mm² or even 2000mm² to fit as many components as needed. So the article's claim that interposers can only be as large as the maximum reticle size is also inaccurate.

AMD's own illustrations show eight individual HBM cubes stacked on the same interposer as the processor. In Nvidia's Pascal test vehicle we can also see that there's still room left even with a large GPU to add at least four more memory cubes. So the area of the interposer is certainly not a technical limitation.

The technical limitation would have to lie somewhere else. As discussed above, each 1GB cube of HBM delivers 128GB/s of bandwidth through a 1024-bit wide interface. So AMD would need to design Fiji XT with an 8192-bit memory interface to leverage the bandwidth of 8 HBM cubes. The issue here, believe it or not, is that 8GB of HBM equates to roughly 1 Terabyte per second of bandwidth (8 x 128GB/s = 1024GB/s). For a GPU like Fiji XT with 4096 stream processors, half of that bandwidth would go to waste. So from a logical standpoint it would make no sense for AMD to give the GPU a memory system with twice the bandwidth it can actually use.
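To make the scaling concrete, here is a minimal Python sketch of how interface width, bandwidth and capacity grow with the number of cubes when every cube gets its own 1024-bit interface. The function name and defaults are assumptions for illustration; the per-cube numbers are the first-generation figures quoted above.

```python
# Aggregate figures when each HBM cube has a dedicated 1024-bit interface.
# Hypothetical helper; defaults are the first-generation numbers from the text.

def aggregate(cubes: int,
              width_bits_per_cube: int = 1024,
              gbps_per_pin: float = 1.0,
              gb_per_cube: int = 1) -> dict:
    """Return total bus width (bits), peak bandwidth (GB/s) and capacity (GB)."""
    bus_width_bits = cubes * width_bits_per_cube
    return {
        "bus_width_bits": bus_width_bits,
        "bandwidth_gbs": bus_width_bits * gbps_per_pin / 8,  # 8 bits per byte
        "capacity_gb": cubes * gb_per_cube,
    }


four_cubes = aggregate(4)
eight_cubes = aggregate(8)

print(four_cubes)   # {'bus_width_bits': 4096, 'bandwidth_gbs': 512.0, 'capacity_gb': 4}
print(eight_cubes)  # {'bus_width_bits': 8192, 'bandwidth_gbs': 1024.0, 'capacity_gb': 8}
```

Going from four to eight dedicated cubes doubles the bus width and the bandwidth along with the capacity, which is exactly the over-provisioning described above.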

The leaked specifications suggest that Fiji XT actually has a more reasonable 4096-bit memory interface and 4GB of HBM running at 1.25Gbps, which would deliver 640GB/s of bandwidth, more than enough to feed a 4096SP GPU. There are, however, two viable methods AMD can use to double the memory for Fiji XT from 4GB to 8GB. The first would be to simply double the number of HBM cubes from 4 to 8. If Fiji XT does indeed have a 4096-bit interface, this would double the capacity at a constant 640GB/s of bandwidth, because every pair of HBM cubes would have to share one 1024-bit wide slice of the memory interface.

The other method would be to use HBM cubes with double the density, i.e. 2GB cubes instead of 1GB cubes. Currently, first-generation HBM only comes in 4-Hi 1GB cubes. However, Hynix plans to introduce new iterations with larger capacities per die as well as more stacked dies per cube. Both of those improvements would enable AMD to introduce GPUs with more VRAM, even up to 16GB, while maintaining that same 4096-bit interface at 640GB/s.
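The two routes to 8GB can be compared with a short Python sketch. It assumes the leaked 4096-bit interface at 1.25Gbps stays fixed, so bandwidth depends only on the interface while capacity depends only on the cubes attached to it. The class and field names are hypothetical, and the 16GB case is simply both methods combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HbmConfig:
    """Hypothetical model of a GPU memory setup with a fixed-width interface."""
    cubes: int
    gb_per_cube: int
    bus_width_bits: int = 4096   # leaked Fiji XT interface width (unconfirmed)
    gbps_per_pin: float = 1.25   # leaked per-pin data rate (unconfirmed)

    @property
    def capacity_gb(self) -> int:
        return self.cubes * self.gb_per_cube

    @property
    def bandwidth_gbs(self) -> float:
        # Bandwidth is set by the interface, not by how many cubes share it.
        return self.bus_width_bits * self.gbps_per_pin / 8


leaked = HbmConfig(cubes=4, gb_per_cube=1)        # 4 x 1GB
more_cubes = HbmConfig(cubes=8, gb_per_cube=1)    # 8 x 1GB, two cubes per 1024-bit slice
denser_cubes = HbmConfig(cubes=4, gb_per_cube=2)  # 4 x 2GB
both = HbmConfig(cubes=8, gb_per_cube=2)          # 8 x 2GB

for name, cfg in [("leaked", leaked), ("more cubes", more_cubes),
                  ("denser cubes", denser_cubes), ("both", both)]:
    print(f"{name}: {cfg.capacity_gb}GB at {cfg.bandwidth_gbs}GB/s")
# leaked: 4GB at 640.0GB/s
# more cubes: 8GB at 640.0GB/s
# denser cubes: 8GB at 640.0GB/s
# both: 16GB at 640.0GB/s
```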

However, vendors usually introduce cards with more memory not by increasing the number of memory chips but by using memory chips with larger capacities, because this lets the vendor avoid any alterations to the PCB layout. The 8GB R9 290X cards, for example, come with sixteen 512MB GDDR5 memory chips, while the regular 4GB versions are equipped with sixteen 256MB GDDR5 chips.

There are a few exceptions to this, such as the 6GB GTX Titan compared to the GTX 780 and 780 Ti. The GTX Titan and Titan Black come with double the VRAM of the GTX 780 and 780 Ti, and Nvidia chose to achieve that by doubling the number of GDDR5 memory chips from 12 to 24 instead of keeping the count at 12 and using higher-capacity GDDR5 chips.
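The two strategies are easy to tell apart by working out chip count times chip capacity, as in this small Python sketch. The chip counts and R9 290X chip sizes are the ones given above; the 256MB per-chip figure for the Nvidia cards is not stated in the article and is derived here from 6GB spread over 24 chips, so treat it as an assumption.

```python
# Total VRAM = number of GDDR5 chips * capacity per chip.
# Names are illustrative; the Nvidia per-chip size is derived, not quoted.

def total_gb(chips: int, mb_per_chip: int) -> float:
    """Total capacity in GB (1GB = 1024MB)."""
    return chips * mb_per_chip / 1024


# AMD route: same chip count, denser chips.
r9_290x_4gb = total_gb(16, 256)
r9_290x_8gb = total_gb(16, 512)

# Nvidia route: same chips, twice as many of them.
titan_mb_per_chip = 6 * 1024 // 24           # derived from 6GB over 24 chips
gtx_titan = total_gb(24, titan_mb_per_chip)
gtx_780 = total_gb(12, titan_mb_per_chip)

print(r9_290x_4gb, r9_290x_8gb)   # 4.0 8.0
print(titan_mb_per_chip)          # 256
print(gtx_titan, gtx_780)         # 6.0 3.0
```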

But for Fiji XT we're betting on the usual approach of higher-capacity memory: once SK Hynix rolls out 2GB cubes, AMD will be able to make 8GB versions of Fiji without much trouble.