AMD Ryzen Architectural Deep-Dive – Ending The Intel Monopoly
We’re only two weeks away from the arrival of Ryzen, a launch that has been awaited for half a decade. This is the AMD CPU that PC hardware enthusiasts have eagerly looked forward to for so long, and what a CPU it is. We’re going to take you through a deep-dive on the company’s brand new high performance Zen CPU microarchitecture, its features, specs and performance.
Ryzen, AMD’s Most Important Product In More Than A Decade
Many Years In The Making
The journey of the Zen microarchitecture, which sits at the core of every Ryzen chip, has been a long and challenging one. It’s the company’s first attempt to compete in the high-end enthusiast CPU market since the introduction of the Bulldozer microarchitecture five years ago. Zen breaks new ground for AMD in many ways. It’s the company’s first ever CPU architecture to feature simultaneous multithreading. It’s also AMD’s first CPU since the days of the original Athlon, more than a decade ago, to be built on a process technology that’s very close to parity with Intel’s.
It means that for the very first time since the early 2000s AMD’s CPUs are no longer at an inherent disadvantage due to Intel’s process lead. From an architectural point of view Zen is a brand new clean-slate design that’s been led from the get-go by accomplished CPU architect Jim Keller, the very same engineer who played a pivotal role in designing the original Athlon XP and Athlon64 processors, the most successful and competitive CPU products in the history of the company.
Zen is AMD’s biggest long-term technology bet and one of the largest engineering efforts undertaken by the company. Design work on the microarchitecture began in 2012 and was completed four years later. The very first products based on the brand new CPU core design are Ryzen processors, which are set to launch at the end of the month. However, we know that AMD is working on far more than just high performance desktop CPUs. The company has a 32-core Zen server CPU, a 16-core Zen HPC APU and a quad-core Zen consumer APU called Raven Ridge. All of these products have been in the works since the very beginning.
The Zen Microarchitecture
Below we have a visual representation of an actual Zen core on silicon. The core comprises one floating point unit and one integer engine. This is a huge step away from the Bulldozer design, which featured two integer engines and one floating point unit per core. The integer cluster in each Zen core has six pipes: four ALUs (Arithmetic Logic Units) and two AGUs (Address Generation Units).
These AGUs can perform two 16-byte loads and one 16-byte store per cycle via a 32 KB 8-way set associative write-back L1 data cache. According to AMD the move from a write-through to a write-back cache has noticeably reduced stalls in several types of code paths. The load/store cache operations in Zen also reportedly exhibit lower latency compared to the 4th generation Bulldozer core, Excavator.
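To make those L1 figures a little more concrete, here is a rough sketch of the arithmetic behind them. The cache size, associativity and load/store widths come from the paragraph above; the 64-byte line size and the simple modulo indexing are assumptions made for illustration, since AMD’s exact indexing scheme isn’t described here.

```python
# Back-of-the-envelope geometry for the L1 data cache described above.
CACHE_SIZE_BYTES = 32 * 1024  # 32 KB (from the article)
WAYS = 8                      # 8-way set associative (from the article)
LINE_SIZE_BYTES = 64          # ASSUMPTION: line size is not given in the article


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets = capacity / (ways * line size)."""
    return size_bytes // (ways * line_bytes)


def set_index(address: int, line_bytes: int, sets: int) -> int:
    """Set a byte address maps to, ASSUMING plain modulo indexing."""
    return (address // line_bytes) % sets


def peak_bytes_per_cycle(loads: int, stores: int, access_bytes: int) -> int:
    """Peak L1 traffic per cycle if every load and store slot is used."""
    return (loads + stores) * access_bytes


sets = cache_sets(CACHE_SIZE_BYTES, WAYS, LINE_SIZE_BYTES)
peak = peak_bytes_per_cycle(loads=2, stores=1, access_bytes=16)

print(f"sets: {sets}")
print(f"peak L1 traffic: {peak} bytes/cycle")
```

Under those assumptions the cache works out to 64 sets of 8 lines each, and the two loads plus one store add up to 48 bytes of L1 traffic per cycle at peak.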
Bulldozer’s relatively power hungry and slow cache hierarchies were one of the key factors in its poor single threaded performance and power efficiency. A lot of work has gone into designing a new cache sub-system for Zen to minimize the power and area footprints as well as make it as fast as the silicon will allow.
The L2 and L3 caches were grouped in a very clever way to minimize the access times by any given core at any given time. The write-through cache architecture has also been forgone in favor of a more power and area efficient write-back cache.
Another key area where Zen differentiates itself from the Bulldozer family of cores is its access to a relative abundance of L3 cache. Each Zen core has access to twice the L3 cache capacity of AMD’s last 8-core chip, code named “Orochi”, the infamous chip that we’ve come to know as the FX 8150 and later as the FX 8350 and their derivatives.
The floating point unit is capable of performing two FMAC operations or a single 256-bit AVX operation per cycle, exactly as we detailed in our exclusive architectural deep-dive last year.
AMD’s First Microarchitecture To Feature Simultaneous Multithreading
The company has done away with the CMT (clustered multithreading) concept that was introduced with the Bulldozer family of cores in 2011 in favor of a more traditional SMT (simultaneous multithreading) design. This means that each Zen core will be able to execute two threads simultaneously: a primary, very high throughput thread and a secondary thread with less oomph that can be used opportunistically.
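For a rough sense of what that FPU throughput means in practice, the sketch below turns it into a theoretical peak FLOPS figure. It assumes each FMAC unit is 128 bits wide, which the "two FMACs or one 256-bit AVX op" description suggests but does not state outright, and it borrows the 8-core, 3.6GHz base clock configuration from the Ryzen 7 1800X entry in the desktop lineup table. Treat the result as an upper bound on paper, not a measured number.

```python
# Theoretical peak floating point throughput, under stated assumptions.
FMAC_UNITS = 2         # two FMAC operations per cycle (from the article)
FMAC_WIDTH_BITS = 128  # ASSUMPTION: implied by 2 FMACs == one 256-bit AVX op
FLOPS_PER_FMA = 2      # one fused multiply-add counts as a multiply plus an add


def flops_per_cycle(units: int, width_bits: int, element_bits: int) -> int:
    """FLOPs a single core can retire per cycle at a given element size."""
    lanes = width_bits // element_bits
    return units * lanes * FLOPS_PER_FMA


def peak_gflops(cores: int, clock_ghz: float, per_cycle: int) -> float:
    """Chip-wide peak in GFLOPS, ignoring boost clocks and memory limits."""
    return cores * clock_ghz * per_cycle


single_precision = flops_per_cycle(FMAC_UNITS, FMAC_WIDTH_BITS, 32)
double_precision = flops_per_cycle(FMAC_UNITS, FMAC_WIDTH_BITS, 64)

# 8 cores at a 3.6 GHz base clock, as listed for the Ryzen 7 1800X.
chip_sp = peak_gflops(cores=8, clock_ghz=3.6, per_cycle=single_precision)

print(f"SP FLOPs/cycle/core: {single_precision}")
print(f"DP FLOPs/cycle/core: {double_precision}")
print(f"8-core SP peak at 3.6 GHz: {chip_sp:.1f} GFLOPS")
```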
In contrast, each Bulldozer module can execute two identical threads. This is achieved through two separate integer clusters with a single front-end. This approach saves area versus building two separate cores and delivers two high throughput threads. However, there are advantages that Zen’s SMT implementation holds over the Bulldozer CMT implementation. For one it allows AMD to build a single larger integer cluster with significantly higher single threaded performance. Another advantage with this approach is that it leaves a lot of wiggle room for clever savings in area and power.
Incredible Drive For Power Efficiency
AMD’s 8-core Ryzen chip has an army of sensors buzzing away to monitor voltages, temperatures, frequency and overall power at any given moment. These sensors are part of what AMD dubs its SenseMI family of technologies. We’ll talk about these technologies in much more detail further down. It’s these little engines that bring amazingly cool technologies such as the auto-overclocking XFR feature from the realm of science fiction to reality.
|CPU Microarchitecture||AMD Phenom II / K10||AMD BD/PD||AMD SR/XV||AMD Zen||Intel Skylake|
|Instruction Decode Width||3-wide||4-wide||8-wide||4-wide||4-wide|
|Single Core Peak Decode Rate||3 instructions||4 instructions||8 instructions||4 instructions||4 instructions|
|Dual Core Peak Decode Rate||6 instructions||4 instructions||8 instructions||8 instructions||8 instructions|
A lot of the engineering effort around Zen has also gone into addressing one of Bulldozer’s major flaws. Bulldozer and Intel’s Sandy Bridge (and subsequent Intel architectures including Skylake) had equally deep pipelines to achieve high clock speeds. The deeper the pipeline, the more latency the design will exhibit, particularly when it comes to branch mispredictions, which are quite common in such pipelines.
On the front-end each Zen core is capable of decoding four instructions per cycle, which are fed to the operations queue. The micro-op cache along with the queue have a throughput of six operations per cycle going into the schedulers.
The latency that results from branch mispredicts is quite significant. To combat this issue Intel introduced a micro-op cache with Sandy Bridge. It worked to a great extent in reducing mispredict penalties and was believed to be the principal reason behind Intel’s significant single threaded performance advantage over Bulldozer. AMD has finally introduced its own micro-op cache with Zen.
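To illustrate why the mispredict penalty matters so much, here is a simple textbook-style model of how it feeds into cycles per instruction. Every number in it is hypothetical and chosen purely for illustration; none of them are measured Zen, Bulldozer or Sandy Bridge figures.

```python
# Toy model: effective CPI = base CPI + stall cycles added by mispredicts.
# All inputs below are HYPOTHETICAL values, not measurements of any real CPU.


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction once mispredict stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


BASE_CPI = 0.5          # hypothetical CPI with perfect branch prediction
BRANCH_FRACTION = 0.2   # hypothetical: one in five instructions is a branch
MISPREDICT_RATE = 0.05  # hypothetical: 5% of branches are mispredicted

# Two hypothetical penalties: a long refill through the decoders versus a
# shorter refill when decoded micro-ops can be served from a micro-op cache.
long_refill = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)
short_refill = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 15)

speedup = long_refill / short_refill
print(f"CPI with 20-cycle penalty: {long_refill:.3f}")
print(f"CPI with 15-cycle penalty: {short_refill:.3f}")
print(f"relative speedup: {speedup:.3f}x")
```

Even in this simplified model, shaving a handful of cycles off each mispredict shows up directly in per-clock performance, which is the effect a micro-op cache is meant to deliver.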
The Zen Microarchitecture In A Nutshell
The Zen core features a significantly wider execution engine than anything we’ve seen from AMD before, leveraging simultaneous multithreading and a micro-op queue to boost throughput and single-threaded performance. This, combined with a brand new, low latency cache sub-system and a new set of pre-fetch algorithms, results in a dramatic instructions per clock improvement and a doubling of throughput per core compared to AMD’s previous 8-core Piledriver FX 8300 series CPUs.
High Level Overview:
- Two threads per core
- 8 MB shared L3 cache
- Large, unified L2 cache
- Micro-op Cache
- Two AES units for security
- 14nm FinFET Transistors
Ryzen On The Desktop
|AMD Ryzen CPU||Cores/Threads||L3||TDP||Base||Turbo||XFR||Price|
|AMD Ryzen 7 1800X||8/16||16MB||95W||3.6GHz||4.0GHz||4.0GHz+||$489|
|AMD Ryzen 7 1800 Pro||8/16||16MB||65W||TBA||TBA||N/A||TBA|
|AMD Ryzen 7 1700X||8/16||16MB||95W||3.4GHz||3.8GHz||3.8GHz+||$389|
|AMD Ryzen 7 1700||8/16||16MB||65W||3.0GHz||3.7GHz||N/A||$319|
|AMD Ryzen 5 1600X||6/12||16MB||95W||3.3GHz||3.7GHz||3.7GHz+||$259|
|AMD Ryzen 5 1600||6/12||16MB||65W||TBA||TBA||N/A||TBA|
|AMD Ryzen 5 1500||6/12||16MB||65W||3.2GHz||3.5GHz||N/A||$229|
|AMD Ryzen 5 1400X||4/8||8MB||65W||3.5GHz||3.9GHz||3.9GHz+||$199|
|AMD Ryzen 5 1400||4/8||8MB||65W||TBA||TBA||N/A||TBA|
|AMD Ryzen 5 1300||4/8||8MB||65W||3.2GHz||3.5GHz||N/A||$175|
|AMD Ryzen 3 1200X||4/4||8MB||65W||TBA||3.4GHz||3.8GHz||$149|
|AMD Ryzen 3 1200||4/4||8MB||65W||TBA||TBA||N/A||TBA|
|AMD Ryzen 3 1100||4/4||8MB||65W||3.2GHz||3.5GHz||N/A||$129|
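One quick way to read the lineup above is price per thread. The sketch below uses only the rows that list a price in the table; entries marked TBA are left out, and the prices are the ones reported here rather than confirmed retail figures.

```python
# Price per thread for the SKUs that have a listed price in the table above.
# (name, threads, price in USD), copied from the table; TBA rows are omitted.
LINEUP = [
    ("Ryzen 7 1800X", 16, 489),
    ("Ryzen 7 1700X", 16, 389),
    ("Ryzen 7 1700", 16, 319),
    ("Ryzen 5 1600X", 12, 259),
    ("Ryzen 5 1500", 12, 229),
    ("Ryzen 5 1400X", 8, 199),
    ("Ryzen 5 1300", 8, 175),
    ("Ryzen 3 1200X", 4, 149),
    ("Ryzen 3 1100", 4, 129),
]


def price_per_thread(lineup):
    """Return (name, dollars per thread) pairs, cheapest per thread first."""
    ranked = [(name, price / threads) for name, threads, price in lineup]
    return sorted(ranked, key=lambda item: item[1])


ranking = price_per_thread(LINEUP)
for name, dollars in ranking:
    print(f"{name:<15} ${dollars:6.2f} per thread")
```

On the listed prices, the 6-core and 65W 8-core parts come out cheapest per thread, while the 4-core, 4-thread Ryzen 3 chips are the most expensive on this metric.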
New Ryzen CPU Coolers With Customizable RGB Lighting
AMD Ryzen AM4 Motherboards & Chipsets
|PCIe 3 Lanes||24||24||TBA||TBA|
|PCIe 2 Lanes||8||6||TBA||TBA|
|USB 3.1 Gen2||2||2||TBA||TBA|
|USB 3.1 Gen1||8||4||TBA||TBA|
|Form Factor||ATX||ATX, M-ATX||M-ATX, Mini-ITX||Mini-ITX|
Overclocking A Ryzen CPU
AMD made a big fuss at CES about how all Ryzen CPUs will have unlocked frequency multipliers to facilitate easy overclocking. This means that users will be able to overclock any one of the 17 SKUs that we’ve seen to date by simply pairing their CPU with a mid-range B350 or high-end X370 AM4 motherboard and raising the frequency multiplier inside the motherboard’s UEFI/BIOS interface. In comparison, Intel has just three CPUs in its entire Kaby Lake lineup that are unlocked for overclocking.
Another interesting bit that we reported on in one of our Ryzen exclusives a while back is Ryzen’s amazing XFR feature, short for Extended Frequency Range, which allows every Ryzen CPU to automatically overclock itself and exceed its default boost clock speed whenever the thermal environment allows. This means that if you invest in a cooling solution that’s better than AMD’s Wraith, your Ryzen chip will automatically operate at higher clock speeds than what’s written on the box, rewarding you with more performance.
XFR works in tandem with two other features, Pure Power and Precision Boost, to ensure that all active cores are running at the highest clock speeds they are capable of without exceeding the default power and thermal limits. Upgrading to a higher end cooling unit gives you more thermal headroom, and adjusting the maximum TDP limit in your motherboard’s UEFI/BIOS settings has the same effect on the power headroom.
When it’s all said and done you obviously have the option to overclock the good old fashioned way, by upping the frequency multiplier and voltage until you exhaust the thermal headroom of your cooling. What XFR, Pure Power and Precision Boost do is make sure that your chip gives you its best right out of the box, without the need for user intervention. Whether you choose to push for more is entirely up to you.
Leaked Ryzen Benchmarks
Bringing this to a close, it’s clear that AMD has done a lot of things right with Zen: pushing IPC and power efficiency to where they need to be, building a comprehensive modern platform, bringing much needed updates to the feature set, and creating an attractive value proposition for desktops, servers and notebooks. All the ingredients to make Zen a success are here.
The mere prospect that enthusiasts may actually have AMD CPUs as a worthwhile option again for the first time in a decade come March 2nd is refreshing. We’re mere weeks away from knowing whether we’ll finally be able to say AMD’s back!
Reviews for AMD’s upcoming Ryzen CPUs and accompanying motherboards are expected to go live on the 28th of February. The new products are expected to go on sale two days later on the 2nd of March.