AMD RYZEN ZEN 8 Core / 16 Thread CPU Benchmarked – On Par With Intel’s Core i7-6900K, Clocked at 3.4 GHz+ at 95W, Stellar IPC Gains

Hassan Mujtaba
Posted Dec 13, 2016

AMD has officially kicked off their New Horizon event and showcased the performance of their next-generation RYZEN CPU based on the Zen core architecture.

AMD RYZEN ZEN Based CPU Benchmarked – On Par With Broadwell-E Core i7-6900K, Impressive IPC Gains

There could be no better time for AMD to introduce a brand new processor series that covers the entire market. AMD has laid a solid framework with its RTG division and has a strong roadmap that will bring faster GPUs to consumers and servers. AMD believes that the performance desktop market is the key sector of innovation for the PC platform.

AMD expects a rise in performance PCs, with a 25% growth forecast from 2015 to 2018. The PC gaming hardware market alone would account for over $30 billion, and with the emergence of VR / AR we can expect a lot of PC gamers to take an interest in faster hardware.

AMD is going to use the RYZEN branding for its Zen based enthusiast processors. The underlying CPU architecture will be based on Zen and the platform itself will be known as Summit Ridge. The RYZEN chip that was showcased is an 8 core, 16 thread model that features 4 MB of L2 cache and 16 MB (8+8) of shared L3 cache.

The chip has a 3.4 GHz base clock. It features simultaneous multithreading and has an IPC gain that exceeds AMD’s goal of 40%. AMD stated that the chip has a 95W TDP, which is quite an impressive feat for the specs featured on RYZEN. The 3.4 GHz AMD RYZEN chip is the base model and there will be even faster models with higher clock speeds.

The “New Horizon” fan event, hosted by gaming journalist and TV personality Geoff Keighley, showcased 8-core, 16-thread AMD Ryzen™ desktop processors running at 3.4 GHz in a number of never-before-seen, hands-on demos of extreme performance and all-new features for digital creators, VR pioneers, game world explorers, and tech thrill-seekers, including:

  • For the first time, the upcoming Vega GPU architecture was demonstrated live to fans, powered by Ryzen playing Star Wars© Battlefront™ – Rogue One at 4K resolution with smooth, high framerates.

  • Blender- and Handbrake-based image rendering and video transcoding demos showed that the new CPU can match or outperform the Intel Core i7 6900K — also an 8-core, 16-thread processor — in many complex creative tasks. The 140-watt TDP Core i7 6900K ran at stock processor speed and boost against a 95-watt TDP Ryzen processor at 3.4 GHz without boost, showing the computing power and performance-per-watt efficiency of Ryzen.

  • Again at 3.4 GHz, Ryzen was shown beating the game framerates of a Core i7 6900K playing Battlefield™ 1 at 4K resolution, with each CPU paired to an Nvidia Titan X GPU. via AMD

The new caches feature a brand new, clever prefetcher that recognizes mission-critical data by learning the data access patterns of a given application. This vital data is then tagged and prefetched for immediate use. It can also learn the location of future data accesses by analyzing the code of whatever program is running.
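As a rough illustration of what learning an access pattern can mean, the sketch below models a simple stride prefetcher. AMD has not disclosed how its prefetcher works internally, so the class name, the confirmation threshold and the stride-detection scheme are all assumptions chosen for illustration.

```python
class StridePrefetcher:
    """Toy stride detector (hypothetical, not AMD's design).

    It suggests a prefetch address once the same non-zero stride has been
    observed `confirmations_needed` times in a row.
    """

    def __init__(self, confirmations_needed=2):
        self.last_addr = None
        self.last_stride = None
        self.confidence = 0
        self.confirmations_needed = confirmations_needed  # assumed threshold

    def access(self, addr):
        """Record one memory access; return an address to prefetch, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                self.confidence += 1      # same stride seen again
            else:
                self.confidence = 0       # pattern broken, start over
            self.last_stride = stride
            if self.confidence >= self.confirmations_needed:
                prediction = addr + stride
        self.last_addr = addr
        return prediction


prefetcher = StridePrefetcher()
trace = [0x1000, 0x1040, 0x1080, 0x10C0, 0x1100]  # walking an array in 64-byte steps
predictions = [prefetcher.access(addr) for addr in trace]
```

In this trace the first three accesses only establish the pattern; from the fourth access onward the sketch proposes the next 64-byte line before the program asks for it.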

AMD RYZEN Blender Performance Test:

AMD showcased the first detailed performance preview of the RYZEN chip. The chip was tested against the Core i7-6900K, a $999 US Broadwell-E chip for the X99 platform. RYZEN and the Core i7-6900K are similar in specs, with 8 cores, 16 threads and plenty of cache. The only difference was the clock speeds: the RYZEN chip was clocked at 3.4 GHz with boost disabled, while Intel’s chip ran at its stock speeds of 3.2 GHz base and 3.7 GHz boost. AMD RYZEN managed to deliver the same amount of performance as the more costly Intel product with a lower rated power (95W TDP) and lower clock speeds.

AMD RYZEN Blender Benchmark

| Benchmark | AMD RYZEN ES (3.4 GHz) | Intel Core i7-6900K (3.2 GHz) |
|---|---|---|
| Blender Custom Scene | 35.57s | 36.01s |
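To put the two render times in perspective, the short calculation below works out the relative speed and a crude TDP-normalized efficiency figure. Note that TDP is a rated value, not measured power draw, so the efficiency ratio is only a rough proxy; the variable names are ours.

```python
ryzen_seconds = 35.57    # AMD RYZEN ES render time from the benchmark
intel_seconds = 36.01    # Intel Core i7-6900K render time from the benchmark
ryzen_tdp_watts = 95     # rated TDP, not measured power
intel_tdp_watts = 140    # rated TDP, not measured power

# Lower render time is better, so relative speed is the inverse ratio.
speedup = intel_seconds / ryzen_seconds
time_saved_pct = (intel_seconds - ryzen_seconds) / intel_seconds * 100

# Crude efficiency proxy: renders per second per watt of rated TDP.
ryzen_eff = 1 / (ryzen_seconds * ryzen_tdp_watts)
intel_eff = 1 / (intel_seconds * intel_tdp_watts)
efficiency_ratio = ryzen_eff / intel_eff

print(f"Relative speed: {speedup:.3f}x ({time_saved_pct:.2f}% less render time)")
print(f"TDP-normalized efficiency ratio: {efficiency_ratio:.2f}x")
```

The raw performance gap is small, roughly one percent; most of the advantage in this comparison comes from the lower rated TDP.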

Higher Clock Speeds At Lower Voltages And Lower Power Consumption

Complementing “XFR” which we discussed earlier are two other interconnected features called Pure Power and Precision Boost.

Pure Power works by monitoring temperature, frequency and voltage readings in real time via embedded sensors distributed across the Zen cores. These sensors feed data back to what AMD calls the “Infinity Control Fabric” which then adjusts power to adapt perfectly to the situation. The goal is to use the least amount of voltage required to run any given structure. This results in lower average power consumption and cooler operation.

“The ‘Zen’ core at the heart of our Ryzen processors is the result of focused execution and thousands of engineering hours designing and delivering a next-level experience for high-end PC and workstation users,” said AMD President and CEO Dr. Lisa Su. “Ryzen processors with SenseMI technology represent the bold and determined spirit of innovation that drives everything we do at AMD.”

AMD SenseMI technology is a key enabler of AMD’s landmark increase of greater than 40 percent in instructions per clock, and is comprised of five components:

  • Pure Power – more than 100 embedded sensors with accuracy to the millivolt, milliwatt, and single degree level of temperature enable optimal voltage, clock frequency, and operating mode with minimal energy consumption;
  • Precision Boost – smart logic that monitors integrated sensors and optimizes clock speeds, in increments as small as 25MHz, at up to a thousand times a second;
  • Extended Frequency Range (XFR) – when the system senses added cooling capability, XFR raises the Precision Boost frequency to enhance performance;
  • Neural Net Prediction – an artificial intelligence neural network that learns to predict what future pathway an application will take based on past runs;
  • Smart Prefetch – sophisticated learning algorithms that track software behavior to anticipate the needs of an application and prepare the data in advance.

Precision Boost is the other side of the same coin. It works in tandem with Pure Power to maintain the highest possible frequencies at any given voltage, which improves performance without adding to the power dissipation of the chip. The feature is very precise and extremely responsive, making changes in milliseconds and adjusting clocks in 25 MHz increments.
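The behavior described above can be pictured as a small feedback loop. The sketch below is a toy model only: AMD has not published the actual algorithm, and the function name, the temperature limit and the maximum clock are placeholder assumptions. Only the 25 MHz step, the 95W TDP and the 3.4 GHz base clock come from AMD's figures.

```python
STEP_MHZ = 25  # adjustment granularity quoted by AMD


def next_clock(current_mhz, power_w, temp_c,
               power_limit_w=95,   # TDP of the demo chip
               temp_limit_c=75,    # placeholder, not an AMD figure
               base_mhz=3400,      # base clock of the demo chip
               max_mhz=3800):      # placeholder, not an AMD figure
    """Return the clock for the next control interval (hypothetical policy)."""
    if power_w > power_limit_w or temp_c > temp_limit_c:
        target = current_mhz - STEP_MHZ   # back off one step when over a limit
    else:
        target = current_mhz + STEP_MHZ   # otherwise climb one step
    return max(base_mhz, min(max_mhz, target))


clock = 3400
# Simulated (power in watts, temperature in Celsius) sensor readings.
for power_w, temp_c in [(70, 55), (80, 60), (97, 66), (90, 64)]:
    clock = next_clock(clock, power_w, temp_c)
```

With sensor readings arriving up to a thousand times a second, a loop of this shape would settle close to whichever limit is reached first.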

AMD X370 For Enthusiast AM4 Motherboards:

AMD X370 is the chipset for overclockers and tweakers who need robust platforms. This chip provides the ultimate low-level control to its users and delivers ultimate graphics card bandwidth. By bandwidth, AMD is referring to max PCI-Express lanes, as this is the only chip in the stack that supports multi-GPU functionality. The chipset supports both CFX (CrossFire) and SLI.

The board pictured is MSI’s upcoming X370 Tomahawk.

AMD has mentioned two full x16 (Gen3) lanes for GPUs. AIBs can add additional lanes through a PLX chip but that would add to the cost. X370 features full overclocking support with a very sophisticated GUI that will allow the best overclock tools and experiences.
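For reference, the theoretical bandwidth of a Gen3 link can be worked out from the PCI-Express 3.0 signaling rate. These are specification figures (8 GT/s per lane with 128b/130b encoding), not numbers from AMD's presentation, and the function name is ours.

```python
GT_PER_S = 8e9          # PCIe 3.0 raw transfer rate per lane
ENCODING = 128 / 130    # 128b/130b line encoding overhead


def pcie3_bandwidth_gb_s(lanes):
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    return GT_PER_S * ENCODING / 8 * lanes / 1e9


x16 = pcie3_bandwidth_gb_s(16)
x8 = pcie3_bandwidth_gb_s(8)
print(f"x16: {x16:.2f} GB/s per direction, x8: {x8:.2f} GB/s per direction")
```

Real-world throughput is lower once protocol overhead is included, so treat these as upper bounds.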

Since all AM4 CPUs have an unlocked multiplier, record breakers will definitely put X370 boards to the test on liquid and LN2 setups. Other features on the AM4 X370 motherboards would be support for USB 3.1 Gen 2, NVMe and SATA Express storage/connectivity options.

Gigabyte GA-AX370 Gaming K3 Pre-Production Sample Spotted:

| Specification | AMD Naples | AMD RYZEN |
|---|---|---|
| Market | Enterprise | Desktop |
| Microarchitecture | Zen | Zen |
| Cores | 32 | 8 |
| Threads | 64 | 16 |
| Base | TBA | 3.6 GHz (F3 Stepping) |
| Turbo | TBA | 3.9 GHz (F3 Stepping), 4.0 GHz (F4 Stepping) |
| L1 Instruction Cache | 64 KB x 32 | 64 KB x 8 |
| L1 Data Cache | 32 KB x 32 | 32 KB x 8 |
| L2 Cache | 512 KB x 32 | 512 KB x 8 |
| L3 Cache | 64 MB | 16 MB |

AMD Zen Architecture Fully Detailed – Wider, High-Performance and Efficient Core Design

To start off with the details, Zen is based on the latest 14nm FinFET node. The only two foundries that have this node are GlobalFoundries and Samsung, but we suspect AMD is using the former to manufacture Zen chips. The Zen core is said to deliver 40% more instructions per clock compared to the Excavator core.

AMD’s full Zen Hot Chips presentation reveals complete architecture details. (Image Credits: Golem.de)

The Excavator core is featured on AMD’s Carrizo and Godavari processors. The large jump in IPC would help AMD achieve performance parity with Intel chips. In fact, AMD already demoed an 8 core Summit Ridge CPU based on Zen against a Broadwell-E 8 core chip. The demo showed AMD’s solution having better rendering performance than Intel’s HEDT solution.

AMD Zen Core Design and Core Engine

The basic building block of Zen is the core complex. The core complex consists of four cores connected to an L3 cache. The L3 cache is 16-way associative and totals 8 MB (mostly exclusive of the L2 cache). It is split into four slices, each made up of two 1 MB sub-slices. All cores can access these cache blocks with the same average latency.

The cores themselves feature two threads each. The core complex hence has 8 threads, while the 8 core SKUs will have 16 threads. On each core, branch mispredict handling is improved and branch prediction has been improved with two branches per BTB entry. The large op cache helps improve throughput and latency at the same time. The integer cluster in each Zen core has six pipes: four ALUs (Arithmetic Logic Units) and two AGUs (Address Generation Units).
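The way core complexes add up to a full chip can be checked with a few lines of arithmetic. The sketch below uses the per-complex figures given in this article (four cores, two threads per core, 512 KB of L2 per core, 8 MB of L3 per complex); the function name is ours, and it assumes the core count is a multiple of four.

```python
CORES_PER_COMPLEX = 4
THREADS_PER_CORE = 2
L2_PER_CORE_KB = 512
L3_PER_COMPLEX_MB = 8


def chip_totals(cores):
    """Totals for a chip built from whole core complexes."""
    complexes = cores // CORES_PER_COMPLEX
    return {
        "core_complexes": complexes,
        "threads": cores * THREADS_PER_CORE,
        "l2_mb": cores * L2_PER_CORE_KB / 1024,
        "l3_mb": complexes * L3_PER_COMPLEX_MB,
    }


ryzen = chip_totals(8)
print(ryzen)
```

For the 8 core demo chip this gives two core complexes, 16 threads, 4 MB of L2 and 16 MB (8+8) of L3, which matches the cache figures AMD quoted.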

These AGUs can perform two 16-byte loads and one 16-byte store per cycle via a 32 KB 8-way set associative write-back L1 data cache. According to AMD, the move from a write-through to a write-back cache has noticeably reduced stalls in several types of code paths. Load/store cache operations in Zen also reportedly exhibit lower latency compared to Excavator.
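Those per-cycle figures translate into a peak L1 data bandwidth per core once a clock speed is chosen. The calculation below assumes the 3.4 GHz base clock of the demo chip and that both loads and the store issue on every cycle, which real code rarely sustains.

```python
LOAD_BYTES_PER_CYCLE = 2 * 16    # two 16-byte loads per cycle
STORE_BYTES_PER_CYCLE = 1 * 16   # one 16-byte store per cycle
CLOCK_HZ = 3.4e9                 # assumed: base clock of the demo chip

load_gb_s = LOAD_BYTES_PER_CYCLE * CLOCK_HZ / 1e9
store_gb_s = STORE_BYTES_PER_CYCLE * CLOCK_HZ / 1e9
print(f"Peak L1 load bandwidth per core:  {load_gb_s:.1f} GB/s")
print(f"Peak L1 store bandwidth per core: {store_gb_s:.1f} GB/s")
```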

AMD has widened dispatch on Zen to 6 versus 4 on Excavator. Instruction schedulers for integer and floating point have also increased to 84 and 96 entries, respectively. The FPU is now quad-issue, while queue sizes for retire, load and store have increased to 192, 72 and 44, compared to 128, 44 and 32 on Excavator.

The floating point unit on the new core consists of 4 pipes with 128-bit FMACs. There are two FADD and two FMUL units for calculations on the FPU. The FPU has a 2-level scheduling queue with a 160 entry register file, 8-wide retire and a single pipe for 128-bit stores. It has its own two AES units and is SSE, AVX1, AVX2, AES, SHA and legacy MMX compliant.
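A back-of-the-envelope peak throughput figure follows from the FPU width. The sketch below assumes the four pipes can sustain two 128-bit fused multiply-add operations per cycle; that figure is our assumption and is not stated by AMD in this material. It also assumes the 8 core demo chip at 3.4 GHz.

```python
FMA_OPS_PER_CYCLE = 2   # assumption: two 128-bit FMA operations per cycle
VECTOR_BITS = 128
FLOPS_PER_FMA = 2       # a fused multiply-add counts as two operations


def peak_gflops(cores, clock_ghz, element_bits):
    """Theoretical peak GFLOPS under the assumptions stated above."""
    lanes = VECTOR_BITS // element_bits
    flops_per_cycle = FMA_OPS_PER_CYCLE * lanes * FLOPS_PER_FMA
    return cores * clock_ghz * flops_per_cycle


single = peak_gflops(cores=8, clock_ghz=3.4, element_bits=32)
double = peak_gflops(cores=8, clock_ghz=3.4, element_bits=64)
print(f"Peak single precision: {single:.1f} GFLOPS")
print(f"Peak double precision: {double:.1f} GFLOPS")
```

These are ceilings for perfectly vectorized code, not expected benchmark results.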

AMD Zen With SMT (Simultaneous Multi-Threading) Support

One of the most anticipated arrivals on the new core is SMT support. This brings the design much closer to Intel’s implementation. The SMT design offers increased throughput by executing two threads simultaneously on each core. These hardware threads appear as independent cores to software and put more execution resources at the disposal of applications.
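From the software side, this is easy to see: the operating system reports logical processors rather than physical cores. The snippet below uses Python's standard `os.cpu_count()`; the two-threads-per-core divisor is an assumption that only holds on a machine with SMT enabled on every core.

```python
import os

THREADS_PER_CORE = 2  # assumption: SMT enabled with two threads per core

logical = os.cpu_count()  # logical processors visible to the OS; may be None
if logical is None:
    print("Logical processor count unavailable")
else:
    print(f"Logical processors: {logical}")
    print(f"Physical cores if every core runs {THREADS_PER_CORE} threads: "
          f"{logical // THREADS_PER_CORE}")
```

On an 8 core chip with SMT enabled, the operating system would report 16 logical processors.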

Along with SMT support, Zen also features support for several new instructions and features. These include ADX, RDSEED, SMAP, SHA1, XSAVEC, CLZERO and PTE Coalescing. AMD also supports all the standard ISA extensions mentioned above.
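On Linux, support for most of these features shows up as flags in `/proc/cpuinfo`. The sketch below checks a block of cpuinfo text for them. The flag spellings (`adx`, `rdseed`, `smap`, `sha_ni`, `xsavec`, `clzero`) are the names we believe the Linux kernel uses, the helper names are ours, and the sample text is made up for illustration rather than taken from real hardware.

```python
ZEN_FLAGS = {"adx", "rdseed", "smap", "sha_ni", "xsavec", "clzero"}


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def report(cpuinfo_text):
    """Map each flag of interest to True or False."""
    present = parse_flags(cpuinfo_text)
    return {flag: flag in present for flag in sorted(ZEN_FLAGS)}


# Made-up sample; on a real Linux system use open("/proc/cpuinfo").read().
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2 adx rdseed smap sha_ni xsavec clzero\n"
result = report(sample)
for flag, supported in result.items():
    print(f"{flag:8s} {'yes' if supported else 'no'}")
```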

AMD Zen High Bandwidth, Low Latency Cache System

AMD has been talking about a disruptive cache system on their new core for a while. With the details finally out, we can now better understand this system. The cache hierarchy is made up of a fast private L2 cache on each core (512 KB L2 I+D, 8-way) and a fast shared L3 cache (8 MB L3 I+D, 16-way).

This enables higher bandwidth, prefetch improvements and faster cache-to-cache transfers. The L3 cache is filled mostly with L2 victims while offering larger queues for L1 and L2 misses.

Each core also has a 64 KB L1 instruction cache (4-way) and a 32 KB L1 data cache (8-way). The entire system adds up to faster L1, L2 and L3 caches that offer faster loads to the FPU (7 cycles required). Bandwidth is improved to almost 2x on L1 and L2, while L3 cache bandwidth is improved by 5x.

AMD Zen – A 14nm FinFET, Low Power and Faster Design

Performance is one thing, but one area where AMD has really been lacking is efficiency. With Zen, that is going to change. Zen has much higher efficiency than Excavator, which is a highly tuned design in itself. This is achieved through the use of aggressive clock-gating techniques on multi-level regions inside the core block. Some of the features that help achieve lower power on Zen include:

AMD Zen Low Power Features:

  • Aggressive Clock Gating with multi-level regions
  • Write Back L1 Cache
  • Large OP Cache
  • Stack Engine
  • Move Elimination
  • Power Focus from Project Inception
  • Low Power design Methodologies

| CPU Microarchitecture | AMD Phenom II / K10 | AMD BD/PD | AMD SR/XV | AMD Zen | Intel Skylake |
|---|---|---|---|---|---|
| Instruction Decode Width | 3-wide | 4-wide | 8-wide | 4-wide | 4-wide |
| Single Core Peak Decode Rate | 3 instructions | 4 instructions | 8 instructions | 4 instructions | 4 instructions |
| Dual Core Peak Decode Rate | 6 instructions | 4 instructions | 8 instructions | 8 instructions | 8 instructions |

AMD Zen To Arrive as RYZEN on Desktop in Q1, Naples on Server in Q2 and Raven Ridge on Notebook Platforms in 2H 2017

The desktop lineup based on the Zen architecture will be known as Summit Ridge and is expected to arrive in Q1 2017. After launching in limited quantities, AMD will ramp up availability of the Summit Ridge lineup in Q2 2017, when it will be available to a wide audience planning to build gaming and enthusiast grade PCs.

The Summit Ridge platform is very impressive as it rids AMD of its aging AM3+ and FM2+ platforms. The AM4 platform, which will support the new chips, comes with a slew of new features and capabilities such as support for the latest DDR4 memory, PCI-e Gen 3.0 and next-gen I/O. AMD will have several SKUs in the works, ranging from quad core to octa core models, all multi-threaded and featuring overclocking support.

AMD will also be launching Zen inside server chips known as “Naples”. These chips would range from 16 to 32 core models and were showcased in several racks at the AMD Tech Summit 2016. Naples would be made available to the server market in Q2 2017. Finally, we have Raven Ridge, which is the codename for the notebook line of processors. These will be made available to mobility users in 2H 2017 and we can expect an update on the graphics side as well.

AMD Infinity Fabric for Summit Ridge, Raven Ridge and Vega. (Image Credits: Computerbase)

Next Generation AMD CPUs And APUs

| Specification | AMD Raven Ridge | AMD Gray Hawk | AMD Summit Ridge | AMD Bristol Ridge |
|---|---|---|---|---|
| Product Architecture | Zen | Zen+ | Zen | Excavator |
| Process Node | 14nm | 7nm | 14nm | 28nm |
| CPU Cores | Up to 4 | Up to 4 | Up to 8 | Up to 4 |
| GPU Architecture | Vega | Navi | N/A | Caribbean Islands |
| TDP | TBA | TBA | 65W-95W | 35-65W |
| Socket | AM4 | AM4+ | AM4 | AM4 |
| Memory Support | DDR4 & HBM | DDR4 & HBM | DDR4 | DDR4 |
| Launch | 2H 2017 | 2019 | Q1 2017 | October 2016 |

AMD has revealed the details of its next-generation RYZEN CPUs but there’s still a lot more to know. We saw the first performance demos and it’s great to finally hear that AMD has chips competing with Intel’s top enthusiast class Broadwell-E processors. RYZEN is planned for launch in Q1 2017, around February – March. Fans should expect more details on pricing and SKUs based on the Zen architecture at CES 2017, which is just a month away.