AMD Improves DirectX 12 Performance By Up To 46% With Asynchronous Compute Engines


AMD's Asynchronous Compute Engines in GCN-based GPUs can be used to leverage DX12's Asynchronous Shaders feature, improving performance by up to 46%. These engines inside AMD's Graphics Core Next based GPUs are dubbed ACEs for short, and they're responsible for handing out several tasks simultaneously to the compute units inside a GPU.

Asynchronous Shaders, otherwise known as Asynchronous Shading, is a new feature in DirectX 12, Mantle and Vulkan that was previously unavailable in DirectX 11 and OpenGL. This feature allows tasks to be submitted to and processed by the shader units inside GPUs (what Nvidia calls CUDA cores and AMD dubs Stream Processors) simultaneously and asynchronously in a multi-threaded fashion.


One would have thought that, with thousands of shader units inside modern GPUs, proper multi-threading support would already have existed in DX11. In fact, one could argue that comprehensive multi-threading is crucial to maximizing performance and minimizing latency. But the truth is that DX11 only supports basic multi-threading methods that can't fully take advantage of the thousands of shader units inside modern GPUs. This meant that GPUs could never reach their full potential, until now.

Multithreaded graphics in DX11 does not allow multiple tasks to be scheduled simultaneously without adding considerable complexity to the design. As a result, a great number of GPU resources spend their time idling with no task to process because the command stream simply can't keep up. GPUs could therefore never be fully utilized, leaving a deep well of untapped performance that programmers could not reach.

Other complementary technologies attempted to improve the situation by enabling prioritization of important tasks over others. Graphics pre-emption allowed for prioritizing tasks, but just like multi-threaded graphics in DX11 it did not solve the fundamental problem, as it could not enable multiple tasks to be submitted and handled simultaneously and independently of one another. A crude analogy would be that graphics pre-emption merely adds a traffic light to the road rather than an additional lane.

Out of this problem a solution was born, one that's very effective and readily available to programmers with DX12, Vulkan and Mantle. It's called Asynchronous Shaders, and as explained above it enables a genuinely multi-threaded approach to graphics. It allows tasks to be processed simultaneously and independently of one another, so that each of the thousands of shader units inside a modern GPU can be put to as much use as possible to improve performance.
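To make the idea concrete, here is a toy model in Python of the difference between one command stream and independent queues. It is only a sketch: the task names, unit counts and durations are made-up numbers, the `Task`, `serial_time` and `async_time` names are hypothetical, and the greedy scheduling rule is an assumption for illustration, not how AMD's hardware or any graphics API actually schedules work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    units: int  # shader units the task can keep busy (hypothetical figure)
    ticks: int  # how long the task runs, in arbitrary time steps


def serial_time(tasks):
    """One command stream: each task waits for the previous one to finish."""
    return sum(task.ticks for task in tasks)


def async_time(queues, total_units):
    """Independent queues: a queue's next task starts once enough units are free."""
    pending = [list(queue) for queue in queues]
    queue_busy_until = [0] * len(pending)
    running = []  # (finish_tick, units) for tasks currently executing
    free = total_units
    now = 0
    while any(pending) or running:
        # Give back the units of every task that has finished by now.
        free += sum(units for finish, units in running if finish <= now)
        running = [(finish, units) for finish, units in running if finish > now]
        # Each idle queue may launch its next task if it fits in the free units.
        for i, queue in enumerate(pending):
            if queue and queue_busy_until[i] <= now and queue[0].units <= free:
                task = queue.pop(0)
                free -= task.units
                queue_busy_until[i] = now + task.ticks
                running.append((now + task.ticks, task.units))
        if running:
            now = min(finish for finish, _ in running)  # jump to next completion
        elif any(pending):
            raise ValueError("a task needs more shader units than the GPU has")
    return now


def utilization(tasks, total_units, elapsed):
    """Fraction of available unit-ticks that did useful work."""
    work = sum(task.units * task.ticks for task in tasks)
    return work / (total_units * elapsed)


# Hypothetical workload on a GPU with 100 shader units.
TOTAL_UNITS = 100
graphics_queue = [Task("shadows", 30, 4), Task("geometry", 60, 6), Task("lighting", 100, 5)]
compute_queue = [Task("post-processing", 40, 4), Task("physics", 40, 3)]
all_tasks = graphics_queue + compute_queue

t_serial = serial_time(all_tasks)
t_async = async_time([graphics_queue, compute_queue], TOTAL_UNITS)
print(t_serial, f"{utilization(all_tasks, TOTAL_UNITS, t_serial):.0%}")  # 22 57%
print(t_async, f"{utilization(all_tasks, TOTAL_UNITS, t_async):.0%}")    # 15 84%
```

In this made-up workload the compute tasks slot into shader units that the graphics tasks leave idle, so the same work finishes in fewer time steps. Real gains depend entirely on how much idle capacity a given frame actually has.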

However, to enable this feature the GPU must be built from the ground up to support it. In AMD's Graphics Core Next based GPUs this feature is enabled through the Asynchronous Compute Engines integrated into each GPU. These are structures built directly into the GPU itself, and they serve as the multi-lane highway by which tasks are delivered to the stream processors.

Each ACE is capable of handling eight queues, and every GCN-based GPU has a minimum of two ACEs. More modern chips such as the R9 285 and R9 290/290X have eight ACEs. ACEs debuted with AMD's first GCN-based GPU, codenamed Tahiti, in late 2011. They were originally added to GPUs mainly to handle compute tasks, because they could not be leveraged with the graphics APIs of the time. Today, however, ACEs take on a more important role in graphics processing in addition to compute.
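Those figures translate directly into the number of compute queues a chip can keep open at once. A quick Python sketch of the arithmetic follows; the `total_queues` helper is a hypothetical name used only for illustration, and the eight-queues-per-ACE figure is the one quoted above.

```python
QUEUES_PER_ACE = 8  # per the figure quoted in the article


def total_queues(ace_count):
    """Compute queues available across all ACEs on a chip."""
    if ace_count < 0:
        raise ValueError("ace_count must be non-negative")
    return ace_count * QUEUES_PER_ACE


print(total_queues(2))  # 16, the minimum for any GCN-based GPU
print(total_queues(8))  # 64, for chips such as the R9 285 and R9 290/290X
```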

To quantify the performance advantage that this brings, AMD used a LiquidVR demo. The demo ran at 245 FPS with Asynchronous Shaders off and post-processing disabled. However, after post-processing was enabled the performance dropped to 158 FPS. Finally, when Asynchronous Shaders and post-processing were both enabled, the average frame rate went up to 230 FPS, approximately a 46% performance uplift over the 158 FPS figure.
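The 46% figure can be checked from the three frame rates AMD quoted. The Python sketch below also converts them to frame times to estimate how much of the post-processing cost was hidden. That second step is our own assumption: it treats the averages as directly convertible to per-frame times, which AMD did not state. The helper and constant names are hypothetical.

```python
def uplift_percent(before_fps, after_fps):
    """Relative frame-rate gain, in percent."""
    return (after_fps / before_fps - 1.0) * 100.0


def frame_time_ms(fps):
    """Average time per frame in milliseconds (assumes FPS is a steady average)."""
    return 1000.0 / fps


FPS_NO_POST = 245.0     # async off, post-processing off
FPS_POST_SYNC = 158.0   # async off, post-processing on
FPS_POST_ASYNC = 230.0  # async on, post-processing on

uplift = uplift_percent(FPS_POST_SYNC, FPS_POST_ASYNC)
post_cost_sync = frame_time_ms(FPS_POST_SYNC) - frame_time_ms(FPS_NO_POST)
post_cost_async = frame_time_ms(FPS_POST_ASYNC) - frame_time_ms(FPS_NO_POST)

print(f"{uplift:.1f}%")              # 45.6%
print(f"{post_cost_sync:.2f} ms")    # 2.25 ms
print(f"{post_cost_async:.2f} ms")   # 0.27 ms
```

Under that assumption, post-processing adds roughly 2.25 ms per frame without Asynchronous Shaders and only about 0.27 ms with them, which is where the uplift comes from.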

This isn't just a theoretical exercise either: a number of games have already been released with Asynchronous Shaders implemented. These games include Battlefield 4, Infamous Second Son and The Tomorrow Children on the PS4, and Thief on the PC. Obviously AMD made it a point to mention both PS4 and PC games because both platforms sport its GCN graphics architecture, so whatever is achieved on one platform can easily be taken to the other.

Naturally, this demo only showcases the potential performance improvement that can be attained with Asynchronous Shaders and low-level APIs such as Mantle, Vulkan and DX12. With a well-designed implementation and proper optimization we may see DX12 games approach that performance uplift figure just from Async Shaders, which is a very exciting prospect.