NVIDIA’s 20nm Tegra X1 Super Chip Announced At CES 2015 – Features Maxwell Core Architecture With 256 CUDA Cores and 1 TFLOPS Compute

Hassan Mujtaba
Posted Jan 5, 2015

NVIDIA has officially announced and shown off its latest 20nm Tegra X1 Super Chip, which uses the Maxwell core architecture to power next-generation mobile devices. With a TDP of just 10W, the Tegra X1 delivers over 2 times the performance of Apple’s A8X SoC, which is featured in the iPad Air 2.


The NVIDIA Tegra X1 SoC pairs ARM CPU cores with a GPU based on the ultra-efficient Maxwell core, all built on a 20nm process. The Tegra X1 (formerly known as Tegra Erista) features eight 64-bit ARM CPU cores and a full-fledged Maxwell GPU with 2 SMM units enabled on the die, giving 256 CUDA cores. The chip is built on TSMC's 20nm planar process, which Apple also uses for its latest A8 SoC, and which suits mobile designs that demand high efficiency and low power.

The exact details of the CPU cores are not known, but the CPU is said to combine four Cortex-A57 and four Cortex-A53 64/32-bit cores, with both clusters integrated on the die. The chip delivers 1.0 TFLOPS of compute in 16-bit workloads (FP16) and around 500 GFLOPS in 32-bit workloads (FP32), figures that correspond to the GPU throughput in the specifications listed further down. NVIDIA has revealed that the chip can reach the 1 TFLOPS throughput at just 4W. The Tegra X1 has a 2 MB L2 cache, and the A57 cluster is coupled with a 48 KB L1 instruction cache and a 32 KB L1 data cache. The A53 cluster, being the more power-efficient of the two, features a 512 KB L2 cache. On the memory front, the SoC can use LPDDR3 or LPDDR4 with capacities of up to 4 GB.

Because of NVIDIA’s learning and experience with its 4-PLUS-1 CPU architecture first introduced on NVIDIA Tegra 3, and expertise in creating high power, efficient silicon layout designs, Tegra X1 delivers higher performance and power efficiency than other SoCs (System-on-a-Chip) that are based on the A57/A53 CPU implementation. Tegra X1 provides almost 2x the power efficiency for the same CPU performance. And for the same power consumed, Tegra X1 delivers almost 1.4x higher CPU performance. via NVIDIA

Compared to the 192 CUDA cores of the Kepler-based Tegra K1, Maxwell cores deliver 40% better performance and 2 times the efficiency, which translates into faster gaming and GPGPU applications on devices based on the Tegra X1. At a high level, the Maxwell architecture is similar to its predecessor, Kepler, in that it is built from fundamental compute cores called CUDA cores, Streaming Multiprocessors (SMs), Polymorph Engines, Warp Schedulers, Texture Caches, and other hardware elements. But each hardware block on Maxwell has been optimized and upgraded with an intensive focus on power efficiency.


Specifications-wise, the 2 SMMs of the Maxwell GPU provide a total of 256 CUDA cores with 16 ROPs and 16 texture units. The clock speed isn’t mentioned, but the chip delivers a 16 GTexels/s fill rate. The Maxwell GPU is manufactured on the 20nm process, which will deliver improved energy efficiency compared to the desktop variants. The memory runs at 1.6 GHz for 25.6 GB/s of bandwidth, and the GPU has a 256 KB L2 cache. The GPU comes with VXGI, memory compression and all the performance features introduced on the GeForce 900 series cards a few months ago.
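
As a rough sanity check on the memory figures, the 25.6 GB/s number falls out of the 1.6 GHz memory clock if the memory transfers data twice per clock over a 64-bit interface. The 64-bit bus width is an assumption here (the article does not state it), and the function name is purely illustrative. A minimal Python sketch:

```python
def bandwidth_gb_s(clock_mhz, bus_width_bits=64, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits=64 and transfers_per_clock=2 (double data rate)
    are assumptions, not figures taken from the article.
    """
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


x1_bandwidth = bandwidth_gb_s(1600)  # Tegra X1 memory clock: 1.6 GHz
k1_bandwidth = bandwidth_gb_s(930)   # Tegra K1 memory clock: 930 MHz

print(f"Tegra X1: {x1_bandwidth:.1f} GB/s")
print(f"Tegra K1: {k1_bandwidth:.1f} GB/s")
```

Under the same assumptions, the Tegra K1's 930 MHz memory clock works out to roughly 14.9 GB/s, which matches the figure NVIDIA lists for the older chip.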

NVIDIA Tegra X1 Maxwell GPU Specifications:

GPU                      Tegra K1 (Kepler GPU)    Tegra X1 (Maxwell GPU)
SMs                      1                        2
CUDA Cores               192                      256
GFLOPS (FP32)            365                      512
GFLOPS (FP16)            365                      1024
ROPs/TMUs                4/8                      16/16
Memory Clock             930 MHz                  1.60 GHz
Memory Bandwidth         14.9 GB/s                25.6 GB/s
Manufacturing Process    28nm                     20nm
L2 Cache Size            128 KB                   256 KB
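
The specifications do not list GPU clocks, but a rough figure can be backed out of the FP32 numbers. The sketch below assumes each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock and that each texture unit produces one texel per clock. Both are assumptions made for illustration, not figures from NVIDIA's slides, and the helper name is hypothetical.

```python
# Assumption: one fused multiply-add per CUDA core per clock = 2 FLOPs.
FLOPS_PER_CORE_PER_CLOCK = 2


def implied_clock_ghz(gflops_fp32, cuda_cores):
    """GPU clock (GHz) implied by a peak FP32 figure and a core count."""
    return gflops_fp32 / (cuda_cores * FLOPS_PER_CORE_PER_CLOCK)


k1_clock = implied_clock_ghz(365, 192)  # Tegra K1: 365 GFLOPS, 192 cores
x1_clock = implied_clock_ghz(512, 256)  # Tegra X1: 512 GFLOPS, 256 cores

# Assumption: each of the 16 texture units outputs one texel per clock.
x1_fill_rate_gtexels = 16 * x1_clock

# Ratio of the listed FP16 and FP32 throughput on Tegra X1.
x1_fp16_speedup = 1024 / 512

print(f"Implied Tegra K1 clock: {k1_clock:.2f} GHz")
print(f"Implied Tegra X1 clock: {x1_clock:.2f} GHz")
print(f"Implied Tegra X1 fill rate: {x1_fill_rate_gtexels:.1f} GTexels/s")
print(f"Tegra X1 FP16 vs FP32: {x1_fp16_speedup:.1f}x")
```

Under these assumptions the Tegra X1 works out to about 1.0 GHz, which also reproduces the 16 GTexels/s fill rate quoted for the chip, while the Tegra K1 comes out near 0.95 GHz.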

Tegra X1 has the exact same engine that runs in high-end PCs and a next-gen game console. TX1 is the first mobile chip to provide a teraflop of processing power. That’s equivalent to the fastest supercomputer in the world in 2000. But that system consumed a million watts. Now, we can put that in a tiny chip. via NVIDIA
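
To put the supercomputer comparison into numbers, the short sketch below works out performance per watt from the figures quoted in this article: one teraflop at a million watts for the year-2000 system, and one teraflop (FP16) for the Tegra X1 at either the 4W operating point or the 10W TDP. It treats the teraflop figures as directly comparable, which is a simplifying assumption, and the function name is illustrative.

```python
def gflops_per_watt(gflops, watts):
    """Throughput per watt, in GFLOPS/W."""
    return gflops / watts


ONE_TERAFLOP_IN_GFLOPS = 1000

supercomputer = gflops_per_watt(ONE_TERAFLOP_IN_GFLOPS, 1_000_000)
x1_at_4w = gflops_per_watt(ONE_TERAFLOP_IN_GFLOPS, 4)    # quoted 4W operating point
x1_at_tdp = gflops_per_watt(ONE_TERAFLOP_IN_GFLOPS, 10)  # quoted 10W TDP

print(f"2000 supercomputer: {supercomputer:.3f} GFLOPS/W")
print(f"Tegra X1 at 4W:     {x1_at_4w:.0f} GFLOPS/W")
print(f"Tegra X1 at 10W:    {x1_at_tdp:.0f} GFLOPS/W")
print(f"Efficiency gain at 4W:  {x1_at_4w / supercomputer:,.0f}x")
print(f"Efficiency gain at 10W: {x1_at_tdp / supercomputer:,.0f}x")
```

On these numbers the gain in performance per watt is on the order of 100,000 to 250,000 times, depending on which power figure is used.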

Tegra X1 supports 4K H.265 (HEVC) and VP9 video streams at 60 fps. Other processors support 4K at 30 fps and deliver suboptimal experiences when viewing fast-action sports, movies, and video games. Tegra X1 also supports decode of 10-bit color-depth 4K H.265 60 fps video streams. This enables Tegra X1 products to stream a wide selection of 4K content from services such as Netflix. Tegra X1 supports 4K 60 fps local and external displays with support for HDMI 2.0 interfaces and HDCP 2.2 copy protection. On the encode side, Tegra X1 supports encode of 4K video at 30 fps in H.264, H.265 and VP8 formats.
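
To give a sense of what the jump from 30 fps to 60 fps means for the video hardware, the sketch below computes the pixel rate and per-frame time budget at each frame rate. It assumes "4K" means 3840x2160 (UHD); the article does not specify the exact resolution, and the names used are illustrative.

```python
# Assumption: "4K" here means UHD, 3840 x 2160 pixels.
WIDTH, HEIGHT = 3840, 2160
PIXELS_PER_FRAME = WIDTH * HEIGHT


def pixels_per_second(fps):
    """Pixels the decoder or display path must handle each second."""
    return PIXELS_PER_FRAME * fps


def frame_budget_ms(fps):
    """Time available to process a single frame, in milliseconds."""
    return 1000 / fps


for fps in (30, 60):
    rate_mpix = pixels_per_second(fps) / 1e6
    print(f"4K at {fps} fps: {rate_mpix:.1f} Mpixels/s, "
          f"{frame_budget_ms(fps):.2f} ms per frame")
```

Doubling the frame rate doubles the pixel rate and halves the time available per frame, which is why 4K at 60 fps is a noticeably harder target than 4K at 30 fps.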

When it comes to performance, NVIDIA posted slides pitting the Tegra X1 against the Tegra K1 and Apple's A8X SoC in GFXBench, 3DMark Ice Storm and Basemark. The Tegra X1's performance is through the roof, delivering around 1.5 to 2 times the performance of its predecessor and competing chips. NVIDIA also demoed the chip running the Unreal Engine 4 Elemental demo, and it ran without any hiccups, with the same smoothness shown on Xbox One and PlayStation 4 a year ago, within the advertised 10W TDP. The chip supports OpenGL 4.5, DirectX 11.2 and the upcoming DirectX 12 API.

NVIDIA Drive CX and Drive PX With Dual Tegra X1 Launched

NVIDIA has also announced the NVIDIA Drive CX, a digital cockpit computer that can drive 16.6 million pixels of display resolution and run multiple virtual machines. It comes with its own NVIDIA Drive Studio and is regarded as the industry’s most advanced visual computing platform. The Drive PX, on the other hand, integrates dual Tegra X1 SoCs on its PCB, taking performance to 2.3 TFLOPS and letting it process 1.3 GPixels/s. Designed for auto-piloting cars, the Drive PX can detect its surroundings using data gathered from several cameras on the car. Here is what experts had to say about NVIDIA’s latest SoC:

  • “Tegra K1 set a new bar for GPU compute performance, and now just a year later Tegra X1 delivers twice that. This impressive technical achievement benefits both 3D graphics, particularly on devices with high-resolution screens, as well as GPGPU software that is becoming more prevalent, particularly in automotive applications.” Linley Gwennap, founder and principal analyst of the Linley Group.
  • “Tegra X1 has enough horsepower to beat the first teraflop supercomputer. Imagine what developers will soon be doing with it.” Jon Peddie, president, Jon Peddie Research.
  • “Tegra X1 raises the expectation for what a mobile chip is capable of. Its computing power is mind-blowing.” Tim Bajarin, president, Creative Strategies, Inc.
  • “Tegra X1 is impressive – a huge leap beyond Tegra K1, which was launched just a year ago. It will be a driving force in the automotive, tablet, embedded and mobile gaming markets in 2015.” Pat Moorhead, founder, president and principal analyst, Moor Insights and Strategy.
  • “NVIDIA’s ability to bring Maxwell to Tegra just months after launching GTX 980 is simply stunning.” Rob Enderle, president and principal analyst, Enderle Group.
  • “The Tegra X1 will give other processors a run for their money, even the Intel Core products, especially in embedded and graphics-intensive applications like computer vision for automotive applications.”

Figure: NVIDIA Tegra X1 Maxwell block diagram

Figure: NVIDIA Tegra X1 render
