
[Exclusive] Asynchronous Compute Investigated On Nvidia And AMD in Fable Legends DX12 Benchmark, Not Working on Maxwell

Oct 2, 2015

Delving into Async Compute

Async Shader Usage

Looking at the AMD GPU in GPUView, we can see four separate queues in use while Fable Legends is running. There is the normal 3D render queue, in which tasks are stacked but run only one at a time, not concurrently. Beneath it are two copy queues, which are two instances of the SDMA (System DMA) engine, and one compute queue. Interestingly, only one compute queue is being utilized here, and within it only one thread. The _0 suffix suggests that more than one compute queue can be recognized, although no others are shown, or even used, here.

Async Compute

Activity in the compute queue is the key to identifying async compute usage. Async compute is turned on within the game and the engine either way; what the compute queue reveals is whether async is actually being used by the driver, and thus by the GPU. Strictly speaking, a packet in that queue only shows that some process on your system is making use of compute resources. If an operation there has the same color as the work in the 3D render queue (indicating that both come from the same program), then the benchmark itself is using compute.

There are two streams here, but of course only one can be executed at a time. Also, once the compute work has completed, it is put into the render stream in the next frame. What's interesting is that there is actually very little happening in the compute queue: only 18.91% in total. This is in stark contrast to what Lionhead has told us. Keep in mind, however, that seeing little compute activity in this predefined demo does not mean the game itself will behave the same way.
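The article does not say exactly how the 18.91% figure was measured. Assuming it is the share of the captured time during which the compute queue is busy, the minimal Python sketch below shows one way such a share could be computed. It assumes packet start and end times (in milliseconds) have already been read off a trace by hand; the function name busy_fraction and the (start, end) tuple format are hypothetical and are not GPUView's own export format.

```python
from typing import List, Optional, Tuple

# Hypothetical format: (start_ms, end_ms) for one packet on a queue.
Interval = Tuple[float, float]


def busy_fraction(packets: List[Interval], window: Interval) -> float:
    """Return the fraction of `window` during which at least one packet runs.

    Overlapping packets are merged first so shared time is counted once.
    """
    win_start, win_end = window
    if win_end <= win_start:
        raise ValueError("window must have positive length")

    # Clip packets to the window, drop those entirely outside it, sort by start.
    clipped = sorted(
        (max(start, win_start), min(end, win_end))
        for start, end in packets
        if end > win_start and start < win_end
    )

    busy = 0.0
    cur: Optional[Interval] = None  # run of merged packets being built
    for start, end in clipped:
        if cur is None or start > cur[1]:
            # Gap before this packet: bank the finished run, start a new one.
            if cur is not None:
                busy += cur[1] - cur[0]
            cur = (start, end)
        else:
            # Overlaps or touches the current run: extend it.
            cur = (cur[0], max(cur[1], end))
    if cur is not None:
        busy += cur[1] - cur[0]

    return busy / (win_end - win_start)


compute_packets = [(2.0, 4.0), (3.0, 5.0), (10.0, 11.0)]
print(busy_fraction(compute_packets, (0.0, 20.0)))
```

Here the first two packets merge into one 3 ms run, the third adds 1 ms, and 4 ms out of a 20 ms window gives 0.2. Merging overlaps before summing keeps two simultaneous packets from being counted twice.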

Moving on to NVIDIA's Maxwell-based hardware, we're using the same test setup, but with a Titan X, to analyze the effects of async compute, or at least to see whether the benchmark is able to take advantage of it here.

Analyzing the GPUView graph, we see one 3D render queue, two copy queues and one compute queue. Oddly, the Fable Legends benchmark never makes use of that compute queue; it is there only for a background task not associated with the benchmark at all.

Unfortunately, GPUView assigns colors per application, and it chose dark blue and black for packets that come from the same program but serve different purposes. The two are difficult to discern, but we can see two different shades here, both associated with the same program. The dark blue likely represents the majority of the render queue work, while the black is something else entirely, or at least started out as a request for another type of queue but was placed in the render queue instead.

What we do find, however, is that the Titan X is likely allowing the benchmark's request for async compute to go through, but those workloads are then placed directly into the 3D render queue. So async compute is still on, and NVIDIA's driver is aware of it; it's just not scheduling the work on a separate compute queue as one would expect. It may be that some other, still efficient method of dealing with those specific types of requests is being used instead.
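To picture what is at stake when compute packets land in the 3D render queue instead of a separate compute queue, consider a deliberately idealized model. The Python sketch below assumes per-frame packet costs in milliseconds and, for the overlapped case, that the two queues never compete for execution resources. Real GPUs do share shader units between queues, so actual gains are smaller; the function names are hypothetical and do not describe how either vendor's driver works.

```python
from typing import List


def frame_time_serialized(render_ms: List[float], compute_ms: List[float]) -> float:
    """Compute work is appended to the single 3D queue, so costs simply add."""
    return sum(render_ms) + sum(compute_ms)


def frame_time_overlapped(render_ms: List[float], compute_ms: List[float]) -> float:
    """Idealized async compute: both queues run fully in parallel with no
    contention, so the frame ends when the longer of the two queues drains."""
    return max(sum(render_ms), sum(compute_ms))


render = [10.0, 4.0]  # hypothetical 3D packets in one frame
compute = [3.0]       # hypothetical compute packet in the same frame
print(frame_time_serialized(render, compute))
print(frame_time_overlapped(render, compute))
```

With these made-up numbers, the serialized frame takes 17 ms and the idealized overlapped frame 14 ms. The overlapped figure is only an upper bound on what running the queues concurrently can save, not a prediction for any particular GPU.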