Nvidia GTX 1070 Undressed, GP104 GPU Gets First Ever Die Shots – Dissecting The Heart Of GeForce

Khalid Moammer • Sep 22, 2016 at 11:31am EDT

A die shot of NVIDIA's Pascal GP104 GPU core.

Nvidia's GeForce GTX 1070 graphics card gets undressed and dissected, its core, the Pascal GP104 GPU, exposed and beautifully photographed. These are the first ever public die shots of the 314mm² chip. If you're a hardware enthusiast, grab a drink, sit back, relax and prepare to drool.

Nvidia GP104 GPU Dissected - Cutting Through The Heart Of The GTX 1080 & 1070

First things first: all credit goes to Fritzchens Fritz, who has gone to astounding lengths to get these amazing die shots. He has a wonderful collection of pristine quality die shots of a wide range of GPUs, the latest of which happens to be GP104, Nvidia's mid-sized Pascal GPU powering the GeForce GTX 1080 and its little brother, the GTX 1070.

The couple of photos you see above are of a GTX 1070 that has been torn down. The GP104 ASIC was first removed from the graphics card's circuit board, and the die was then separated from its package, the mini circuit board it sits on.

The building block of the Maxwell architecture has been stripped apart and redesigned to create Pascal. This building block, which Nvidia dubs the Streaming Multiprocessor, or SM for short, is the engine that drives the graphics and compute horsepower of every Pascal chip.

With Maxwell, Pascal's predecessor powering the GTX 900 series, Nvidia introduced the Maxwell Streaming Multiprocessor, or SMM. The SMM built on the strengths of Nvidia's Kepler SM, which Nvidia dubs the SMX and which was introduced with the GTX 600 and 700 series. It also did away with many unnecessary complexities, which enabled the engine to deliver more throughput and higher clock speeds. The Pascal SM is in turn an evolution of the Maxwell SM: a smarter, more streamlined engine.

Nvidia GTX 1080 - GP104 Block Diagram

Inside a fully unlocked GP104 there are four Graphics Processing Clusters, or GPCs. Each GPC consists of five Streaming Multiprocessors, and each SM contains 128 CUDA cores and eight Texture Mapping Units, or TMUs. Each GPC also includes two render back-ends made up of eight Render Output Units, or ROPs, each. In total this adds up to 2560 CUDA cores, 160 TMUs and 64 ROPs inside GP104. Finally, the engine is connected to 8GB of GDDR5X memory through eight 32-bit memory segments, which together form a 256-bit memory interface.
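
The totals quoted above can be checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are made up for this example, and the per-SM texture unit count is derived by dividing the quoted 160 TMU total by the number of SMs rather than taken from an official figure.

```python
# Unit counts for a fully enabled GP104, as quoted in the article.
GPCS = 4
SMS_PER_GPC = 5
CUDA_CORES_PER_SM = 128
ROP_BACKENDS_PER_GPC = 2
ROPS_PER_BACKEND = 8
MEMORY_SEGMENTS = 8
BITS_PER_SEGMENT = 32
TOTAL_TMUS = 160  # total quoted in the article


def gp104_totals():
    """Derive chip-wide totals from the per-GPC and per-SM counts."""
    sms = GPCS * SMS_PER_GPC
    return {
        "sms": sms,
        "cuda_cores": sms * CUDA_CORES_PER_SM,
        "rops": GPCS * ROP_BACKENDS_PER_GPC * ROPS_PER_BACKEND,
        "bus_width_bits": MEMORY_SEGMENTS * BITS_PER_SEGMENT,
        # Derived, not quoted: texture units per SM implied by the total.
        "tmus_per_sm": TOTAL_TMUS // sms,
    }


def enabled_sms(cuda_cores):
    """Number of active SMs implied by a card's CUDA core count."""
    return cuda_cores // CUDA_CORES_PER_SM


if __name__ == "__main__":
    for name, value in gp104_totals().items():
        print(f"{name}: {value}")
    print("GTX 1070 active SMs:", enabled_sms(1920))
```

By the same arithmetic, the GTX 1070's 1920 CUDA cores correspond to 15 of the 20 SMs being active.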

Each GP104 streaming multiprocessor includes 128 FP32 CUDA cores, the same as Maxwell. Within each SM there are four partitions of 32 CUDA cores, four warp schedulers with two dispatch units each, and a fairly large instruction buffer, twice as large as Maxwell's.

GP104, Bare

There it is, GP104 in all its glory. It looks nothing like the block diagram, and for good reason. The GP104 diagram published by Nvidia is no more than a simplified visual representation of the architecture. The physical implementation of that architecture, however, is far more complex. It's no surprise that designing a modern high performance graphics chip can cost well into the hundreds of millions of dollars.

The architecture itself, putting pen to paper, is not the hardest or most expensive part of developing a chip like GP104. The physical implementation itself, putting light to sand so to speak, is where things can get really challenging. The bigger the chip the bigger the challenge.

The physical layout of the GP104 GPU is grouped into four main GPC divisions, each taking up roughly one quarter of the chip and housing five SMs. Feeding the beast are the eight 32-bit GDDR5X memory segments that make up a ring around the periphery of the die.

Nvidia GP104 GPU die shot via Fritzchens Fritz with added annotations by Wccftech.com

The SM arrangement is almost identical to what we've seen with the much larger 3840 CUDA core GP100 GPU that powers the Tesla P100 accelerator. The difference is that inside GP100 each SM contains exactly half the number of CUDA cores, dispatch units and warp schedulers of a GP104 SM, but in turn there are twice as many SMs per GPC.

So the primary layout difference between GP104 and GP100 is that with GP104 Nvidia is grouping 64 CUDA core blocks in pairs of 128 CUDA cores each and naming the larger 128 core unit an SM instead. This is all while maintaining the exact same ratio of dispatch units, warp schedulers and instruction buffers per CUDA core that we've seen with GP100.
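
The comparison with GP100 can be made concrete with a rough sketch. The `Layout` class and its field names are hypothetical, invented for this illustration; the inputs are the article's figures (3840 and 2560 CUDA cores, 64 and 128 cores per SM, twice as many SMs per GPC on GP100), and the SM and GPC counts are derived from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical container for a GPU's SM organisation."""
    name: str
    total_cuda_cores: int
    cores_per_sm: int
    sms_per_gpc: int

    @property
    def sms(self):
        # Total SMs implied by the core count.
        return self.total_cuda_cores // self.cores_per_sm

    @property
    def gpcs(self):
        # Number of GPCs implied by the SM count.
        return self.sms // self.sms_per_gpc

    @property
    def cores_per_gpc(self):
        return self.cores_per_sm * self.sms_per_gpc


GP104 = Layout("GP104", total_cuda_cores=2560, cores_per_sm=128, sms_per_gpc=5)
# GP100 has half the cores per SM but twice the SMs per GPC.
GP100 = Layout("GP100", total_cuda_cores=3840, cores_per_sm=64, sms_per_gpc=10)

if __name__ == "__main__":
    for chip in (GP104, GP100):
        print(chip.name, chip.sms, "SMs,", chip.gpcs, "GPCs,",
              chip.cores_per_gpc, "CUDA cores per GPC")
```

Under these figures both chips end up with the same number of CUDA cores per GPC, which is why the pairing amounts to a regrouping of the same resources.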

The GP104 Pascal Streaming Multiprocessor

So think of it as Nvidia just pairing 64 CUDA core groups together in a single SM. This decision is likely influenced by the significant reduction of FP64 (double precision) CUDA cores per SM inside GP104 vs GP100. GP104 contains only one FP64 CUDA core for every 32 FP32 CUDA cores, while GP100 has one FP64 CUDA core for every two FP32 CUDA cores, 16 times the ratio of GP104.
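
As a quick check of the double precision figures, here is a minimal sketch that works only from the ratios and per-SM core counts given in the article. The function names are illustrative, not part of any Nvidia tool.

```python
from fractions import Fraction


def fp64_cores_per_sm(fp32_cores_per_sm, fp32_per_fp64):
    """FP64 cores in one SM, given how many FP32 cores share one FP64 core."""
    return fp32_cores_per_sm // fp32_per_fp64


def fp64_ratio_advantage(fp32_per_fp64_a, fp32_per_fp64_b):
    """How many times denser chip A's FP64 cores are than chip B's."""
    return Fraction(1, fp32_per_fp64_a) / Fraction(1, fp32_per_fp64_b)


if __name__ == "__main__":
    # GP104: 128 FP32 cores per SM, one FP64 core per 32 FP32 cores.
    print("GP104 FP64 cores per SM:", fp64_cores_per_sm(128, 32))
    # GP100: 64 FP32 cores per SM, one FP64 core per 2 FP32 cores.
    print("GP100 FP64 cores per SM:", fp64_cores_per_sm(64, 2))
    print("GP100 vs GP104 FP64 ratio:", fp64_ratio_advantage(2, 32))
```

That works out to four FP64 cores in each GP104 SM against 32 in each smaller GP100 SM, and the 16 times figure is the ratio of 1/2 to 1/32.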

Additionally, each GP104 SM has twice the number of registers as Maxwell. This in turn means that not only can Pascal accommodate more threads compared to Maxwell but each thread has access to more registers and thus a lot more throughput. Finally, each warp scheduler can dispatch two instructions per clock.

Nvidia's Senior Architect, Lars Nyland, admitted that the 16nm FinFET process played an important role in realizing the team's power efficiency goals for Pascal, but maintains that numerous architectural improvements aided in further reducing the energy footprint of the architecture, including the employment of new on-chip voltage signaling techniques.

The GTX 1080's bigger brother, the GTX 1080 Ti, is expected to launch some time next year with well over three thousand CUDA cores and GTX Titan X Pascal-esque performance.

Official Geforce GTX 1080 and Geforce GTX 1070 Specifications

Specification	Nvidia GeForce GTX 1080	Nvidia GeForce GTX 1070
Architecture	Pascal	Pascal
Transistors	7.2 Billion	7.2 Billion
CUDA Cores	2560	1920
Core Clock	1607 MHz	TBA
Boost Clock	1733 MHz	1683 MHz
Memory Type	G5X (GDDR5X)	GDDR5
Memory Speed	10 Gbps	8 Gbps
Memory Configuration	8GB	8GB
Bus Width	256-bit	256-bit
Memory Bandwidth	320 GB/s	256 GB/s
Multi Projection	Yes	Yes
HB SLI Bridge Support	Yes	Yes
Nvidia GPU Boost	3.0	3.0
DirectX 12 Feature Level	12_1	12_1
OpenGL	4.5	4.5
Vulkan API	Yes	Yes
Maximum Digital Resolution	7680x4320@60Hz	7680x4320@60Hz
Display Connectors	DP 1.4, HDMI 2.0b, DL-DVI	DP 1.4, HDMI 2.0b, DL-DVI
HDCP	2.2	2.2
Power Draw	180W	150W
Power Connector	Single 8-Pin	Single 8-Pin
Maximum Operating Temp	94 °C	94 °C
Partner Price (MSRP)	$599	$379
FE Price (MSRP)	$699	$449
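
The memory bandwidth rows in the table follow directly from the bus width and memory speed rows. A minimal sketch, assuming the usual relationship of bus width in bytes multiplied by the per-pin data rate (the function name is illustrative):

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits.
    data_rate_gbps: effective data rate per pin in Gbit/s.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_gbps


if __name__ == "__main__":
    # GTX 1080: 256-bit bus, 10 Gbps GDDR5X.
    print("GTX 1080:", memory_bandwidth_gb_s(256, 10), "GB/s")
    # GTX 1070: 256-bit bus, 8 Gbps GDDR5.
    print("GTX 1070:", memory_bandwidth_gb_s(256, 8), "GB/s")
```

With the same 256-bit bus on both cards, the GTX 1080's bandwidth advantage comes entirely from the faster GDDR5X memory.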

About the author: PC hardware & tech evangelist. Been building PCs for over a decade & following the industry for just as long. Also a doctor specializing in Preventive Medicine.
