[Exclusive] Asynchronous Compute Investigated On Nvidia And AMD in Fable Legends DX12 Benchmark, Not Working on Maxwell
Disclaimer: The GPUs we use in performance benchmarks are the ones we personally have. We can't afford everything, nor do manufacturers give us much. The Fury X likewise isn't readily available in the US, so until we receive a sample from the magical unicorn fairies, these will have to do.
A big hearty thanks to Shaun Walsh for helping with GPUView and explaining some deep programmatic concepts.
AMD has apparently been able to offload up to 30% of a workload to async compute, making the 18% seen here a rather good example of async compute usage.
There seems to have been quite the murmur regarding the performance of the Fable Legends benchmark. Certainly the graphics latency tests reveal some interesting information as reported by the benchmark, though they perhaps don't tell the entire story about performance.
Async Compute is enabled, just not fully being utilized to the greatest extent possible.
It's important to keep in mind that the Fable Legends benchmark is a test using software that's undeniably in the beta phase. It's not representative of final performance, and due to its closed nature, it's not even really indicative of how the actual game will play.
As we know, NVIDIA currently doesn't fully support Asynchronous Compute, or at least the current driver implementation isn't able to schedule these tasks correctly. Thus it's been argued that GPUs with async compute support could, or even should, have a larger advantage.
Microsoft and Lionhead Studios have assured us that asynchronous compute is indeed activated for the test, across the board, and that it doesn't turn off based on the presence of a card that doesn't support it. They've also given us a statement on just how much of the compute pipeline is used: dynamic lighting is handled via async compute, and it's even used for instanced foliage. In essence, they've told us that compute is being used in rather healthy doses. We'll see just how much of the compute queue is actually used as opposed to what Lionhead says is being used.
For cards that might not support it, however, those tasks are simply put into the normal render queue instead of a separate compute queue. How does this affect performance? Unfortunately it doesn't quite scale linearly, and as we'll see shortly, async compute is indeed working, though even on AMD, very little of the workload is being offloaded to the available ACEs, and only one ACE is actually being used at that.
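Why doesn't offloading work scale linearly? Because a separate compute queue can only *hide* work behind the graphics queue, not subtract it outright: the savings are capped by how much idle time the graphics workload leaves and by dependencies between the two. The sketch below is a simple illustrative model with made-up frame timings (not measurements from this benchmark) to show how the benefit saturates.

```python
# Illustrative model of async compute overlap.
# All timings are hypothetical, chosen only to show the shape of the effect.

def frame_time_serial(graphics_ms, compute_ms):
    """Everything runs through the single render queue, back to back."""
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms, compute_ms, overlap=1.0):
    """A fraction `overlap` of the compute work hides behind the graphics
    work on a separate queue; the rest still serializes. overlap=1.0 is
    the ideal case -- real overlap is limited by dependencies and by how
    much slack the graphics workload actually has."""
    hidden = compute_ms * overlap
    return max(graphics_ms, hidden) + compute_ms * (1.0 - overlap)

graphics_ms = 12.0   # hypothetical graphics work per frame
compute_ms = 3.0     # hypothetical compute work (~20% of the frame)

serial  = frame_time_serial(graphics_ms, compute_ms)       # 15.0 ms
ideal   = frame_time_async(graphics_ms, compute_ms)        # 12.0 ms
partial = frame_time_async(graphics_ms, compute_ms, 0.5)   # 13.5 ms

print(f"serial: {serial} ms, ideal async: {ideal} ms, partial async: {partial} ms")
```

Even in the ideal case the frame only gets faster by the amount of compute work that was actually sitting on the critical path; with partial overlap the gain shrinks further, which is why an 18% offload doesn't simply translate into 18% more frames per second.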
Initially we wanted to see if we could simply turn off async ourselves within the benchmark to check for any appreciable difference in performance. Unfortunately, the settings have been hardcoded into the benchmark, likely to keep things even across the board for a less controversial test. So we resorted to the trusty GPUView, a tool originally written by a Microsoft intern: first we capture log data from Event Tracing for Windows (ETW), then analyze it within GPUView.
For the test we'll be using the configuration below. We'll also explore CPU usage on this i5-6600K as well as an i7-5960X, with the Nano as the driving GPU for consistency's sake. The Nano will be compared against a Titan X for the GPU tests. All the tests will be run at 1080p to limit GPU bottlenecks.
| Component | Hardware |
|---|---|
| CPU | Intel Core i5-6600K |
| Motherboard | ASRock Z170 Extreme 4 |
| Power Supply | EVGA SuperNOVA 1300 G2 |
| HDD | SanDisk Extreme II 120GB |
| Storage Disk | Seagate 2TB |
| Memory | 16GB Crucial Ballistix DDR4 2400 |
| Video Cards | AMD R9 Fury, AMD R9 Fury Nano, GeForce GTX Titan X |
| Operating System | Windows 10 64-Bit |
First up we'll take a look into how AMD handles the Fable Legends benchmark. Then we'll delve into NVIDIA's take, and finally we'll look at CPU utilization.