NVIDIA’s 64-Bit Denver CPU Architecture Details Unveiled – Dual Custom ARMv8 Cores Clocked at 2.50 GHz

Hassan Mujtaba • Aug 12, 2014 at 07:11pm EDT

NVIDIA has unveiled the first architectural details of its custom-designed 64-bit Denver CPU, which is also the company's first high-performance custom CPU design, at Hot Chips. It has been almost eight months since NVIDIA launched the Tegra K1 SOC, which pairs a Cortex-A15 processor with 192 Kepler cores and delivers leading performance and power efficiency against chips from competitors.

NVIDIA's 64-Bit Denver CPU Architecture Details Unveiled

The first Tegra K1 variant, based on the 32-bit ARM Cortex-A15 core, has made a name for itself in hot-selling devices such as the Xiaomi MiPad and the NVIDIA Shield Tablet, the company's reference design and latest Shield-branded "handheld" gaming device. However, we have known since launch that there would be two variants of the Tegra K1 SOC: one with the 32-bit ARM core and the other with the 64-bit Denver CPU. In theory, Project Denver's dual-core design should be much more powerful than the 4+1 Cortex-A15 variant. The 'Super Dual Core', as NVIDIA calls it, implements ARMv8-A, the first version of the ARM architecture to support 64-bit. One indicator of its power efficiency is that while the 4+1 variant includes a low-power core for non-intensive applications, the Denver variant has only the two cores.

Denver is a dual-core design at heart, featuring a 7-way superscalar microarchitecture paired with 192 Kepler GPU cores. It includes a 128 KB 4-way L1 instruction cache, a 64 KB 4-way L1 data cache and a 2 MB 16-way L2 cache. Denver also makes use of the new Dynamic Code Optimization, which converts frequently used software routines into dense, highly tuned microcode-equivalent routines. These are kept in a 128 MB optimization cache held in main memory, which reduces the need to re-optimize the same routines.

As part of the Dynamic Code Optimization process, Denver looks across a window of hundreds of instructions and unrolls loops, renames registers, removes unused instructions, and reorders the code in various ways for optimal speed. This effectively doubles the performance of the base-level hardware through the conversion of ARM code to highly optimized microcode routines and increases the execution energy efficiency. via NVIDIA
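The optimize-once, replay-many idea behind the optimization cache can be illustrated with a toy model. This is only a sketch under stated assumptions and not NVIDIA's implementation: the class name, the hot threshold of three runs and the stand-in optimizer that merely drops no-op instructions are all hypothetical.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: runs before a routine counts as "frequently used"


class OptimizationCache:
    """Toy model of a software layer that caches optimized routines."""

    def __init__(self, optimize, threshold=HOT_THRESHOLD):
        self.optimize = optimize      # callable: list of ops -> optimized list
        self.threshold = threshold
        self.run_counts = Counter()   # how often each routine address was seen
        self.cache = {}               # address -> optimized routine
        self.optimizations = 0        # how many times the optimizer actually ran

    def fetch(self, address, arm_ops):
        """Return the code that would run for the routine at `address`."""
        if address in self.cache:
            return self.cache[address]            # replay already-optimized code
        self.run_counts[address] += 1
        if self.run_counts[address] >= self.threshold:
            self.cache[address] = self.optimize(arm_ops)
            self.optimizations += 1
            return self.cache[address]
        return arm_ops                            # cold routine: run it as-is


def drop_nops(ops):
    """Stand-in optimizer (assumption): removes instructions that do nothing."""
    return [op for op in ops if op != "nop"]


layer = OptimizationCache(drop_nops)
routine = ["load", "nop", "add", "nop", "store"]
results = [layer.fetch(0x1000, routine) for _ in range(5)]
print(results[0], results[-1], layer.optimizations)
```

In this model the routine runs unoptimized until it crosses the threshold, is optimized exactly once, and every later call is served from the cache, which is the saving the article attributes to the 128 MB optimization cache.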

Turning to the technical details, the Hot Chips presentation shows that the Denver CPU has its own internal instruction set and translates ARMv8 instructions into that ISA. As reported by TechReport:

Binary translation is for real. Yes, the Denver CPU runs its own native instruction set internally and converts ARMv8 instructions into its own internal ISA on the fly. The rationale behind doing so is the opportunity for dynamic code optimization. Denver can analyze ARM code just before execution and look for places where it can bundle together multiple instructions (that don't depend on one another) for execution in parallel. Binary translation has been used by some interesting CPU architectures in the past, including, famously, Transmeta's x86-compatible effort. It's also used for emulation of non-native code in a number of applications. Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128MB cache stored in main memory. Optimized code sequences can then be recalled and replayed when they are used again.
Execution is wide but in-order. Denver attempts to save power and reap the benefits of dynamic code optimization by eschewing power-hungry out-of-order execution hardware in favor of a simpler in-order engine. That execution engine is very wide: seven-way superscalar and thus capable of processing as many as seven operations per clock cycle. Denver's peak instruction throughput should be very high. The tougher question is what its typical throughput will be in end-user workloads, which can be variable enough and contain enough dependencies to challenge dynamic optimization routines. In other words, Denver's high peak throughput could be accompanied by some fragility when it encounters difficult instruction sequences. via TechReport
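To make the bundling idea concrete, here is a minimal sketch of greedy in-order bundling, assuming a made-up instruction format of one destination register plus a tuple of source registers. It is not Denver's actual scheduler: it only checks read-after-write and write-after-write hazards inside a bundle and assumes that all reads in a bundle happen before any writes. The only figure taken from the article is the width of seven.

```python
from typing import NamedTuple

WIDTH = 7  # seven-way issue width quoted in the article


class Instr(NamedTuple):
    dest: str      # register written (hypothetical format)
    srcs: tuple    # registers read


def bundle_in_order(instrs, width=WIDTH):
    """Pack instructions into bundles without changing their order.

    A new bundle starts when the current one is full or when the next
    instruction reads or rewrites a register written earlier in the bundle.
    """
    bundles, current, written = [], [], set()
    for ins in instrs:
        depends = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (depends or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        bundles.append(current)
    return bundles


program = [
    Instr("r1", ("r0",)),
    Instr("r2", ("r0",)),
    Instr("r3", ("r1", "r2")),   # needs r1 and r2, so it starts a new bundle
    Instr("r4", ("r0",)),        # independent of r3, shares its bundle
]
chain = [Instr("r1", ("r0",)), Instr("r2", ("r1",)), Instr("r3", ("r2",))]
independent = [Instr(f"r{i}", ("r0",)) for i in range(1, 11)]

for name, code in [("mixed", program), ("chain", chain), ("independent", independent)]:
    print(name, [len(b) for b in bundle_in_order(code)])
```

The dependent chain is the kind of "difficult instruction sequence" the quote warns about: every instruction lands in its own bundle, so the wide engine issues only one operation per step, while fully independent code fills all seven slots.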

Performance numbers were also presented for the Denver CPU, pitting it against a Haswell Celeron 2955U, the iPhone 5s (A7 Cyclone), a Krait 400 (8974-AA) and a Bay Trail Celeron N2910. In almost every benchmark, the Denver-powered 64-bit Tegra K1 comes out ahead of the mobile chips, while the 15W Haswell CPU leads in some benchmarks and is otherwise roughly on par with the Tegra K1 SOC. The wattage of the Denver-based Tegra K1 is not known, though it is expected to be lower than what we have seen on the 32-bit variant, and seeing it perform on a level with PC-class chips is impressive. NVIDIA has stated that its dual-core Denver CPU can surpass quad- and octa-core mobile processors on most mobile workloads while delivering high power efficiency. The 64-bit Tegra K1 aims to deliver PC-class performance in the mobile world, and NVIDIA says that mobile devices based on the Denver CPU will arrive later this year and that it is already developing the next version of Android, "L", on Tegra K1.

NVIDIA Tegra K1 64-Bit Denver CPU Specifications:

Specification	NVIDIA Tegra K1 64-Bit	NVIDIA Tegra K1 32-Bit	NVIDIA Tegra 4	NVIDIA Tegra 3
Codename	Logan	Logan	Wayne	Kal-El
ARM Cores	2 Core (Multi-Thread)	4+1	4+1	4 Core
ARM Architecture	64-bit ARMv8 (Custom)	32-bit Cortex A15	32-bit Cortex A15	32-bit Cortex A9
GPU Architecture	Kepler	Kepler	GeForce GPU	GeForce GPU
GPU Cores	192 Core	192 Core	72 Core	12 Core
Process	28nm	28nm	28nm HPL	40nm LPG
Core Frequency	2.5 GHz	2.3 GHz	1.9 GHz	1.2 GHz
Memory Size	8 GB	8 GB	4 GB	2 GB
Memory Type	DDR3L / LPDDR3	DDR3L / LPDDR3	DDR3L / LPDDR3	DDR3 / LPDDR2
L1 Cache	128 KB + 64 KB	32 KB + 32 KB	32 KB + 32 KB	-
Launch	2014	2014	2013	2012

The performance numbers below were compiled by forum members over at Beyond3D. All scores are relative, with the Cortex-A15-based Tegra K1 as the 1.00x baseline:

DMIPS

Baytrail (Celeron N2910): 0.45x
S800 (Krait 400 8974AA): 0.95x
Tegra K1 (R3 Cortex A15): 1.00x
A7 (Cyclone): 1.30x
Haswell (Celeron 2955U): 1.00x
Tegra K1 (Denver): 1.80x

SPECInt 2K

Baytrail (Celeron N2910): 0.70x
S800 (Krait 400 8974AA): 0.60x
Tegra K1 (R3 Cortex A15): 1.00x
A7 (Cyclone): 0.90x
Haswell (Celeron 2955U): 1.30x
Tegra K1 (Denver): 1.45x

SPECFP 2K

Baytrail (Celeron N2910): 0.85x
S800 (Krait 400 8974AA): 0.80x
Tegra K1 (R3 Cortex A15): 1.00x
A7 (Cyclone): N/A
Haswell (Celeron 2955U): 1.95x
Tegra K1 (Denver): 1.75x

AnTuTu 4

Baytrail (Celeron N2910): N/A
S800 (Krait 400 8974AA): 0.80x
Tegra K1 (R3 Cortex A15): 1.00x
A7 (Cyclone): 0.70x
Haswell (Celeron 2955U): N/A
Tegra K1 (Denver): 1.00x

Geekbench 3 Single-Core

Baytrail (Celeron N2910): 0.65x
S800 (Krait 400 8974AA): 0.80x
Tegra K1 (R3 Cortex A15): 1.00x
A7 (Cyclone): 1.20x
Haswell (Celeron 2955U): 1.20x
Tegra K1 (Denver): 1.65x

Google Octane v2.0

Baytrail (Celeron N2910): 0.70x
S800 (Krait 400 8974AA): 0.65x
Tegra K1 (R3 Cortex A15): 1.00x
A7 (Cyclone): 0.70x
Haswell (Celeron 2955U): 1.45x
Tegra K1 (Denver): 1.30x

16MB Memcpy (GB/s)

Baytrail (Celeron N2910): 0.85x
S800 (Krait 400 8974AA): 0.80x
Tegra K1 (R3 Cortex A15): 1.00x
A7 (Cyclone): 1.15x
Haswell (Celeron 2955U): 1.55x
Tegra K1 (Denver): 1.40x

16MB Memset (GB/s)

Baytrail (Celeron N2910): 0.40x
S800 (Krait 400 8974AA): 0.75x
Tegra K1 (R3 Cortex A15): 1.00x
A7 (Cyclone): 0.80x
Haswell (Celeron 2955U): 0.65x
Tegra K1 (Denver): 1.05x
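As a rough way to summarize the relative scores listed above, the sketch below computes the geometric mean of Denver's results and counts how often it beats the Haswell Celeron where both have a score. The figures are copied from the lists above; choosing a geometric mean as the summary statistic is an assumption of this example, not something NVIDIA or Beyond3D published.

```python
import math

# (Denver, Haswell Celeron 2955U) relative to Tegra K1 Cortex A15 = 1.00x.
# None marks a score listed as N/A.
SCORES = {
    "DMIPS":                   (1.80, 1.00),
    "SPECInt 2K":              (1.45, 1.30),
    "SPECFP 2K":               (1.75, 1.95),
    "AnTuTu 4":                (1.00, None),
    "Geekbench 3 Single-Core": (1.65, 1.20),
    "Google Octane v2.0":      (1.30, 1.45),
    "16MB Memcpy":             (1.40, 1.55),
    "16MB Memset":             (1.05, 0.65),
}


def geometric_mean(values):
    """Geometric mean, the usual way to average normalized ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


denver_geomean = geometric_mean(d for d, _ in SCORES.values())

# Only compare benchmarks where a Haswell score was reported.
comparable = [(d, h) for d, h in SCORES.values() if h is not None]
denver_wins = sum(1 for d, h in comparable if d > h)

print(f"Denver geometric mean vs Cortex A15 baseline: {denver_geomean:.2f}x")
print(f"Denver ahead of Haswell in {denver_wins} of {len(comparable)} benchmarks")
```

A single averaged figure hides a lot of variation, so it is best read alongside the per-benchmark lists rather than in place of them.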

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
