[Exclusive] Asynchronous Compute Investigated On Nvidia And AMD in Fable Legends DX12 Benchmark, Not Working on Maxwell
What Does This All Mean?
It's quite possible that the underlying game itself was optimized with the Xbox One in mind, which has significantly fewer async compute units than current high-end PC hardware. Thus only a few in-game graphical assets are actually being pushed to the async compute queue, owing to the much smaller pool of resources available. We're also not entirely sure what kind of requests the engine is sending to the async compute queue, only that there's clearly some measure of rendering happening there. These tasks may already run very efficiently in the normal pipeline on NVIDIA hardware, which would account for its performance being similar to AMD's. Any number of things could be going on under the hood.
The official statement is that compute is a large part of the workload, and that they're able to offload more than just lighting to that particular queue. This could potentially help alleviate bottlenecks and shortcomings. What we actually see, however, is that async compute is barely used at this point. What's actually being rendered there is unknown, but whatever it is, it accounts for only a small share of the total workload.
Similarly, CPU usage seems to be very mixed. Skylake puts on a great showing and is certainly not a limiting factor for any high-end GPU in this benchmark. Utilization, though, is far higher across the board than with Haswell-E, which has significantly more threads at its disposal. Curiously, performance is largely the same on both. What's going on underneath is a mystery, though both are handling their work without issue and with similar performance results. So at least there's that.
Now, this is just an analysis of a closed benchmark that isn't necessarily indicative of how the final build of the game will behave. With more action on screen and more assets to light, render and make pretty, we could see an infusion of work in that compute queue, making good use of the async compute capabilities in DX12. At the moment, however, it's not actually there.
I'm excited for the final game, though, because the final product will likely take advantage of the different DX12 components that we're all looking forward to. Async compute is one such technology that has the ability, if implemented fully and properly, to enable more complex scenes. Just imagine a resurgence of a texture compression algorithm similar to S3TC that can be done faster and more efficiently in the compute queue instead of in memory. We could have larger and better-looking textures without much of a performance hit. Not to mention all the other pretties we could have on screen.