In a new patent application, AMD is exploring ways to stack L2 cache on its future chips while offering similar or better latency.
3D V-Cache But For L2: AMD Exploring Integrating Stacked L2 Caches Besides L3 For Future Chips
AMD has published an interesting patent application titled "Balanced Latency Stacked Cache", with the application number "US20260003794A1". In this filing, AMD discloses techniques for a balanced latency stacked cache, where a stacked cache system includes a first cache die and at least a second cache die in a stacked orientation with the first cache die.
We know that AMD already offers stacked cache in the form of 3D V-Cache, which adds an extra L3 cache layer either on top of or underneath its core compute chiplets. The first generation of 3D V-Cache was stacked on top of the Zen compute chiplets, while the second generation placed the stack below the compute chiplet. The two approaches are largely similar in principle, as both use a stacked cache layer.
AMD's 3D V-Cache or X3D solution has been used on chips ranging from the client "Ryzen" series to the top-tier datacenter powerhouses such as the "EPYC" lineup. While AMD continues to develop its L3 3D V-Cache technologies, the company is exploring more ways to stack further caches. The patent points to L2 stacks being the red team's next venture.
For its stacked L2 cache design, AMD uses an illustrative example showing a base die attached to a compute die and a cache die, with a further compute die and cache die added on top. This example uses a cache module with four 512 KB regions for a total of 2 MB of L2 cache, plus Cache Control Circuitry (CCC). The L2 cache complex can be expanded as needed, with up to 4 MB shown in the block diagram.
The stacking approach uses the same principle as 3D V-Cache, attaching the L2 / L3 stacks to the base die and the compute complexes using silicon vias. These vias are arranged vertically in the center of the stacked cache system, which comprises a first cache die and a second cache die. The CCC controls data inputs and outputs.
In the filing, AMD uses planar 1 MB and 2 MB L2 cache configurations as examples. It states that a 1 MB L2M cache has a typical latency of 14 cycles in a planar configuration, while a stacked 1 MB L2M cache has a latency of 12 cycles. This suggests that stacked L2 cache can not only offer higher capacities but also achieve similar or better cycle latency than typical planar approaches.
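To put the 14-cycle and 12-cycle figures in perspective, here is a minimal Python sketch of the arithmetic. The two cycle counts come from AMD's example; the 5.0 GHz clock is purely an assumption for illustration and is not stated in the filing, and the function names are hypothetical.

```python
# Cycle counts quoted in AMD's example for a 1 MB L2 cache.
PLANAR_CYCLES = 14
STACKED_CYCLES = 12

# Assumption for illustration only: the filing does not give a clock speed.
ASSUMED_CLOCK_GHZ = 5.0


def cycle_savings(planar: int, stacked: int) -> tuple[int, float]:
    """Return (cycles saved, fraction of planar latency saved)."""
    saved = planar - stacked
    return saved, saved / planar


def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_ghz


saved, fraction = cycle_savings(PLANAR_CYCLES, STACKED_CYCLES)
print(f"Cycles saved: {saved} ({fraction:.1%} of the planar latency)")
print(f"Planar:  {cycles_to_ns(PLANAR_CYCLES, ASSUMED_CLOCK_GHZ):.2f} ns at the assumed clock")
print(f"Stacked: {cycles_to_ns(STACKED_CYCLES, ASSUMED_CLOCK_GHZ):.2f} ns at the assumed clock")
```

The two-cycle difference works out to roughly a 14% reduction in L2 access latency, regardless of the clock speed assumed.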
In aspects of the described techniques, the configuration of the stacked cache system reduces response latency when accessing the stacked cache, and also provides a power savings feature. The stacked cache system improves data transfer performance, and has a lower latency than a conventional planar cache built on a single die. Notably, the connection vias are routed into and out of the center of the stacked cache system. This avoids adding wire stages (also referred to herein as pipe stages), as in a conventional planar cache, to route data over one part of the cache to reach a portion of the cache that is further away from the data I/Os.
In the described techniques, the connection vias that are routed in the center of the stacked cache system create balanced (or identical) latencies between the two halves of the stacked cache system on the stacked die (e.g., of the first cache die and the at least second cache die). For example, a conventional planar 1 MB L2M cache has a 14 cycle latency, while a stacked 1 MB L2M cache implemented using the described techniques has only a 12 cycle latency. This provides for implementation of a larger stacked cache than a typical planar cache, yet achieves the same or better cycle latency.
Accordingly, the described aspects of balanced latency stacked cache provides lower latency for an access request, and data is returned from the data cache faster. There is also a power savings due to an access request being accomplished in fewer cycles, so an L2 cache for example, is not turned on for as long, as well as a power savings when transitioning sooner from an active state to an idle state of the cache. Additionally, wire lengths in the cache die are shorter, which effectively results in less capacitance and also conserves power. There is also less signal loading because the signals are only traveling half the distance for an access request, and the data return. Further, less heat is being generated as a result of the power savings, less capacitance, and signals traveling less distance.
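The idea that center-routed vias balance the two halves and halve the signal distance can be illustrated with a toy model. The Python sketch below is not taken from the filing: it assumes a simple one-dimensional row of four cache regions, counts distance in abstract "hops" between neighboring regions, and uses the hypothetical function name `hop_counts`. Real pipe-stage counts depend on the physical layout and clock targets.

```python
def hop_counts(num_regions: int, io_position: float) -> list[float]:
    """Distance, in hops, from the data I/O point to each region in a 1-D row.

    Regions sit at positions 0, 1, ..., num_regions - 1. This is a toy
    model for illustration, not the layout described in the filing.
    """
    return [abs(i - io_position) for i in range(num_regions)]


NUM_REGIONS = 4  # assumed: mirrors the four-region example, simplified to one row

# Data I/O at one edge of the row: far regions are reached by routing
# over the near ones.
edge = hop_counts(NUM_REGIONS, io_position=0)

# Data I/O in the center of the row, between the two halves.
center = hop_counts(NUM_REGIONS, io_position=(NUM_REGIONS - 1) / 2)

left_half, right_half = center[: NUM_REGIONS // 2], center[NUM_REGIONS // 2 :]

print("Edge I/O hops:  ", edge, "worst case:", max(edge))
print("Center I/O hops:", center, "worst case:", max(center))
print("Halves balanced:", sorted(left_half) == sorted(right_half))
```

In this toy model the worst-case distance drops from 3 hops to 1.5 hops, and both halves see the same set of distances, which matches the "half the distance" and "balanced latencies" wording in the filing.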
Beyond better latency, AMD also discloses that the stacked L2 cache provides power savings. It will be a while before we see stacked L2 caches in action on actual chips, but as with stacked L3 3D V-Cache, there is good reason to believe it will be integrated into future chips from AMD. Whether those will be CPUs, GPUs, or both remains to be seen.
News Source: Kepler_L2
