Nvidia: The Geforce GTX 1080 Graphics Card Can Do Asynchronous Compute

Usman Pirzada
Posted May 8, 2016

The Async Compute problem is probably one of the most controversial issues surrounding the older generation of Geforce graphics cards from Nvidia. Something very interesting, however, is present in the press release that the company sent out to, well, the press. According to the official statement, the GTX 1080 is fully capable of performing Async Compute. If this turns out to be true, it will negate a major edge that Radeon graphics cards from AMD have enjoyed this past year.

Nvidia: GTX 1080 is capable of Async Compute

Keep in mind, however, that even Maxwell featured Asynchronous Compute on paper. Unfortunately, because Maxwell did not have a dedicated hardware scheduler like AMD’s GCN, expensive software-based context switching had to be employed before the feature could be used, which resulted in lowered performance on Maxwell-based graphics cards. Nvidia’s approach had been a technique called preemption, which it had perfected to an impressive degree. The reviews are going to be out on May 27th, and if independent testing confirms Nvidia’s claim, then it will be a huge win for the green camp.

Asynchronous compute has been a deal sweetener for Radeon buyers ever since the DirectX 12 API hit the stage. AMD currently leads in Hitman and Ashes of the Singularity (AOTS), both of which utilize its asynchronous shader technology, developed around the DirectX 12 API. Interestingly, Nvidia GPUs have historically performed much better with Async turned off. This is probably because Nvidia had apparently disabled Async in its driver suite. The rationale given for that move is that its GPUs cannot process Async work concurrently at the hardware level; instead, they need context switching, which is expensive in terms of frame rate.

The Async Compute Story Distilled Down To Its Core

Async Compute has been a hot subject of debate ever since gamers became aware of its existence. We dove deep into this peculiar DirectX 12 feature a couple of months ago in our two-thousand-word analysis piece dubbed “AMD’s Secret DirectX 12 Weapon That Nvidia Had To Trade Off – Demystifying Async Compute”. We explained the inherent architectural differences between Nvidia and AMD graphics cards and distilled the key reasons why the two handle asynchronous game code so differently and why their performance with it differs so much. We’d highly recommend giving it a read if you’re looking to wrap your head around this topic and get down to the core of the issue before proceeding.

The following is the relevant extract from the press release:

Five Marvels of Pascal: NVIDIA engineered the Pascal architecture to handle the massive computing demands of technologies like VR. It incorporates five transformational technologies:

  • Next-Gen GPU Architecture. Pascal is optimized for performance per watt. The GTX 1080 is 3x more power efficient than the Maxwell Architecture.
  • 16nm FinFET Process. The GTX 1080 is the first gaming GPUs designed for the 16nm FinFET process, which uses smaller, faster transistors that can be packed together more densely. Its 7.2 billion transistors deliver a dramatic increase in performance and efficiency.
  • Advanced Memory. Pascal-based GPUs are the first to harness the power of 8GB of Micron’s GDDR5X memory. The 256-bit memory interface runs at 10Gb/sec., helping to drive 1.7x higher effective memory bandwidth than that delivered by regular GDDR5.
  • Superb Craftsmanship. Increases in bandwidth and power efficiency allow the GTX 1080 to run at clock speeds never before possible — over 1700 MHz — while consuming only 180 watts of power. New asynchronous compute advances improve efficiency and gaming performance. And new GPU Boost™ 3 technology supports advanced overclocking functionality.
  • Groundbreaking Gaming Technology. NVIDIA is changing the face of gaming from development to play to sharing. New NVIDIA VRWorks™ software features let game developers bring unprecedented immersiveness to gaming environments. NVIDIA’s Ansel™ technology lets gamers share their gaming experiences and explore gaming worlds in new ways.

Async Compute on the GTX 1080 will allow developers to execute on the GPU some tasks that would otherwise be allocated to the CPU. This means that if a game is CPU-bound (that is to say, the CPU is the bottleneck), moving that work to the GPU should drastically increase frame rates. It may even improve performance in games that are GPU-bound, by allowing full use of the GPU’s resources. The thing we have to keep in mind, however, is that preemption and asynchronous compute are two different approaches to the same end result: maximizing the utilization of a GPU. And while AMD will have you believe Async is a drastically superior choice, badly implemented Async will fare much worse than properly implemented preemption.
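To put rough numbers on that trade-off, here is a deliberately simple toy model in Python. It is only a sketch under stated assumptions, not a description of how Pascal, Maxwell or GCN actually schedule work: the function names, the millisecond figures, the `idle_fraction` parameter and the `overhead_ms` penalty are all made up for illustration.

```python
def frame_time_serial(graphics_ms, compute_ms, switch_ms, switches):
    """Graphics and compute run back to back on the GPU.

    Each context switch between the two adds a fixed cost
    (hypothetical figure, not a measured one).
    """
    return graphics_ms + compute_ms + switch_ms * switches


def frame_time_concurrent(graphics_ms, compute_ms, idle_fraction, overhead_ms=0.0):
    """Compute work is slotted into the share of the GPU that graphics leaves idle.

    idle_fraction: assumed share of the graphics time during which
                   execution units would otherwise sit unused.
    overhead_ms:   assumed extra cost of a poorly tuned implementation
                   (synchronization stalls and the like).
    """
    # Only as much compute as fits into the idle slots is hidden;
    # the remainder still has to run after the graphics work.
    absorbed = min(compute_ms, graphics_ms * idle_fraction)
    return graphics_ms + (compute_ms - absorbed) + overhead_ms


# 12 ms of graphics, 4 ms of compute, two 0.5 ms context switches per frame.
print(frame_time_serial(12, 4, 0.5, 2))                       # 17.0
# Same workload, with compute hidden in 25% idle time.
print(frame_time_concurrent(12, 4, 0.25))                     # 13.0
# Same again, but with 5 ms of made-up overhead from a bad implementation.
print(frame_time_concurrent(12, 4, 0.25, overhead_ms=5.0))    # 18.0
```

With these invented numbers, well-behaved concurrent execution beats the context-switching path (13 ms versus 17 ms per frame), while the version saddled with heavy overhead ends up slower than both, which is the point about badly implemented Async made above.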

Due to the pressure exerted by the industry to make Async-compatible graphics cards, however, Nvidia has been working actively to implement Async in its GPUs, but was held back by the fact that this was something that had to be implemented at the hardware level. Chip design usually takes a lot of time (on the order of several years), and if Nvidia has actually managed to properly implement Async in the Pascal-based GTX 1080, that would be quite an accomplishment. Something its CEO stated a few weeks back (regarding the P100 being capable of advanced preemption) made us think that Pascal might stick with the preemption approach for now, but the press release from Nvidia states otherwise. So consider us pleasantly surprised!

Nvidia Geforce 'Pascal' GP100 Compute Specifications

GPU                                            | Kepler GK110 | Maxwell GM200 | Pascal GP100
Compute Capability                             | 3.5          | 5.3           | 6.0
Threads / Warp                                 | 32           | 32            | 32
Max Warps / Multiprocessor                     | 64           | 64            | 64
Max Threads / Multiprocessor                   | 2048         | 2048          | 2048
Max Thread Blocks / Multiprocessor             | 16           | 32            | 32
Max 32-bit Registers / SM                      | 65536        | 65536         | 65536
Max Registers / Block                          | 65536        | 32768         | 65536
Max Registers / Thread                         | 255          | 255           | 255
Max Thread Block Size                          | 1024         | 1024          | 1024
CUDA Cores / SM                                | 192          | 128           | 64
Shared Memory Size / SM Configurations (bytes) | 16K/32K/48K  | 96K           | 64K
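
A few of the rows in the table are tied together by simple arithmetic, and a short Python sketch makes that visible. The figures are copied from the table; the dictionary layout and the function names are our own and purely illustrative, and the "per core" ratio is just a division of two table rows, not an official Nvidia metric.

```python
# Values taken from the table above; key names are illustrative only.
SPECS = {
    "Kepler GK110":  {"threads_per_warp": 32, "max_warps_per_sm": 64,
                      "registers_per_sm": 65536, "cuda_cores_per_sm": 192},
    "Maxwell GM200": {"threads_per_warp": 32, "max_warps_per_sm": 64,
                      "registers_per_sm": 65536, "cuda_cores_per_sm": 128},
    "Pascal GP100":  {"threads_per_warp": 32, "max_warps_per_sm": 64,
                      "registers_per_sm": 65536, "cuda_cores_per_sm": 64},
}


def max_threads_per_sm(spec):
    """Max resident threads per SM = warps per SM x threads per warp."""
    return spec["threads_per_warp"] * spec["max_warps_per_sm"]


def registers_per_thread_at_full_occupancy(spec):
    """Registers each thread can use if the SM holds its maximum thread count."""
    return spec["registers_per_sm"] // max_threads_per_sm(spec)


def resident_threads_per_core(spec):
    """Max resident threads divided by CUDA cores in the SM."""
    return max_threads_per_sm(spec) / spec["cuda_cores_per_sm"]


for name, spec in SPECS.items():
    print(name,
          max_threads_per_sm(spec),
          registers_per_thread_at_full_occupancy(spec),
          round(resident_threads_per_core(spec), 2))
```

All three chips come out at 2048 threads per SM and 32 registers per thread at full occupancy, matching the "Max Threads / Multiprocessor" row. Because GP100 has only 64 CUDA cores per SM, the same thread and register budget is spread over half as many cores as on GM200 (32 resident threads per core versus 16).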
