I have got something pretty interesting for today's writeup. In fact, I don't think anyone has attempted to quantify this particular aspect of processors before, so we will be treading on largely uncharted territory. As with most of the unorthodox hardware content we publish, this one was sourced from DG Lee, someone pretty much everyone in the PC hardware community knows by now.
Credit @Parkoz Hardware
Level 3 cache on modern Intel and AMD CPUs boosts gaming performance by up to ~10%
Before we begin, I think a general recap on caches is in order. Those who want to get to the benchmarks directly can skip the first three paragraphs. Caches are probably one of the most underrated instances of memory in a computer system. A potential gamer looking to build a rig will inquire about cores, RAM, GPU, even architecture, but very rarely about cache. Just because it is usually not given the spotlight and is condemned to the life of fine print does not make it any less important than the actual cores themselves. A modern commercial processor basically has three cache levels.
Cache level 1, cache level 2 and cache level 3 (there is an L4 cache too, but let's not get into that just now). The short forms of these (as you will undoubtedly know) are L1, L2 and L3. However, while L1 and L2 caches are dedicated per core and are somewhat closed off in nature, the L3 cache is a general pool of memory that all cores share. Every core inside a modern multi-core processor has its own L1 and L2 cache, but there is only one L3 per (entire) die. In terms of latency, the levels form an ascending order: conventionally L1 is the fastest, L2 slower, and so on. However, in recent times the speed difference between the levels has narrowed as the industry shifts to a more unified-style architecture. In some cases, the L3 cache can even be utilized by an integrated GPU (case in point: Intel). An illustration of Haswell's die layout is attached below:
The question then arises: why not simply use a big enough L1 cache for all cores in the first place, or a fast enough L3 cache alone? The answer lies in the delicate balance that the cache levels implement; the more tech-savvy of our readers will realize that I am of course talking about cache-latency and hit-rate tradeoffs. If you create a very large L1 cache, then firstly you would be wasting precious die space, since very few applications need that kind of speed, and secondly the sheer size would drive up access latency, defeating the purpose of an L1 in the first place. The L3 cache illustrates this balancing act in another way: specialized algorithms make sure that cores use the portion of the L3 closest to them to optimize performance. This is why modern processors implement a very small but very fast L1 cache, a slightly bigger but slower L2 cache, and a big but slow L3 cache. Some processors now include eDRAM, which is basically an L4 cache of an even larger size. Anyways, enough of that, let's get down to the nitty-gritty of the benchmarks themselves.
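The latency/hit-rate balancing act described above can be put into rough numbers with the classic average-memory-access-time (AMAT) model. The latencies and hit rates below are assumed, back-of-the-envelope figures for illustration, not measurements from any particular CPU:

```python
# A hedged back-of-the-envelope: average memory access time (AMAT) for a
# small-fast L1 backed by bigger, slower levels versus one big flat cache.
# All latencies (in cycles) and hit rates are assumed, illustrative numbers.

def amat(levels, memory_latency):
    """levels: list of (latency_cycles, hit_rate) from fastest to slowest.
    Misses fall through to the next level, and finally to main memory."""
    total, reach = 0.0, 1.0   # reach = fraction of accesses that get this far
    for latency, hit_rate in levels:
        total += reach * latency    # every access reaching this level pays its latency
        reach *= (1.0 - hit_rate)   # only the misses continue downward
    return total + reach * memory_latency

# Small-fast L1 + L2 + L3 hierarchy (assumed numbers):
hierarchy = amat([(4, 0.90), (12, 0.80), (40, 0.75)], 200)
# One big, flat cache with L3-like latency but a very high hit rate:
flat = amat([(40, 0.97)], 200)

print(round(hierarchy, 1))  # the tiered design averages only a few cycles
print(round(flat, 1))       # the flat design pays ~40 cycles on every access
```

With these made-up figures the hierarchy averages ~7 cycles per access while the single big cache averages ~46, which is the whole argument for tiering in one line of arithmetic.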
[The slides are courtesy of DG Lee]. As you can see, going up from "2MB L3" to "8MB L3" results in an almost 10% boost, depending on how CPU-bound the scenario is. In the first slide, where the resolution is low and the primary bottleneck is the processor, going up the L3 sizes raises performance by ~10%, while at 1080p it raises performance by ~8%. This allows us to predict a trend: I would be willing to bet that this margin would be very low at 4K resolution and quite high on multi-GPU configurations. Up next we have the AMD slides:
Once again we see a similar trend going up from "No L3" to "8MB L3". The scaling is pretty similar, the only exception being that the scale here starts from no L3 at all instead of 2MB. It is worth pointing out at this stage that AMD's Steamroller architecture has a significant difference in cache layout: where each Intel core has its own private L1, two AMD cores in one module share the L1 instruction cache (each core retains its own L1 data cache) and the L2 between themselves. This accounts for why the scale is slightly different, relatively speaking, amongst different AMD CPUs. To those of you who are wondering: yes, DG Lee accounted for the differences in processor cores, clock speeds, etc., and mentions them in great detail in his original piece (which I would suggest reading if you can stomach the linguistic mess that is Google Translate).
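For anyone double-checking how these margins are read: the percentage boost is simply the relative frame-rate change between the smallest and largest L3 configurations. The frame rates below are hypothetical, chosen only to show the arithmetic; they are not taken from DG Lee's slides:

```python
def percent_gain(fps_before, fps_after):
    """Relative performance change between two configurations, in percent."""
    return (fps_after - fps_before) / fps_before * 100.0

# Hypothetical frame rates for illustration only (not slide data):
# a CPU-bound run gaining ~10% and a more GPU-bound run gaining ~8%.
print(percent_gain(60.0, 66.0))  # CPU-bound case
print(percent_gain(50.0, 54.0))  # GPU-bound case
```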