Exclusive: Xbox One – Potential Impacts of DirectX 12, Asynchronous Compute and Hardware Specifications Explored; Compared with Sony’s PS4
Xbox One and PS4: Investigating Asynchronous Compute Capabilities
I have a confession to make. Calling these features "DirectX 12" hardware features is not technically accurate. These are hardware features that can be unlocked by any capable API. However, since the vast majority of gamers know them by this name, I decided to forgo technicality for convenience on this one. As I am sure everyone reading this is aware, Async Compute has caused quite a bit of noise recently, and that is because the impact of this particular little feature is considerable.
Asynchronous Compute is all about using the compute pipeline of the graphics processing unit in conjunction with the graphics pipeline. This lets compute resources that would otherwise sit idle do useful work, increasing the true utilization of the graphics processor. It also takes some of the load off the CPU, allowing more work to be issued to the GPU.
In the case of the consoles, the capability to offload compute tasks to a significantly superior GPU (in terms of raw compute horsepower) is invaluable. The CPU cores present in both consoles are neither particularly powerful nor clocked all that high, so Asynchronous Compute can give developers more compute headroom without any noticeable trade-offs.
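To make the utilization argument concrete, here is a minimal toy model, not a description of how any real GPU scheduler works. It assumes a frame is split into equal time slices, that graphics work occupies some number of compute units in each slice, and that asynchronous compute may use whatever units graphics leaves idle. The function name, the unit count and the workload numbers are all hypothetical.

```python
import math

def frame_slices(graphics_busy, compute_work, total_units, async_compute):
    """Return how many time slices one frame needs in this toy model.

    graphics_busy: units occupied by graphics work in each slice
    compute_work:  total compute work, measured in unit-slices
    async_compute: if True, compute may fill units that graphics leaves idle
    """
    remaining = compute_work
    if async_compute:
        for busy in graphics_busy:
            idle = total_units - busy
            remaining -= min(idle, remaining)
    # Whatever compute is left runs after graphics, using every unit.
    return len(graphics_busy) + math.ceil(remaining / total_units)

TOTAL_UNITS = 12                      # hypothetical unit count
graphics_busy = [12, 6, 4, 12, 8, 6]  # hypothetical per-slice graphics load
compute_work = 20                     # hypothetical compute backlog

serial = frame_slices(graphics_busy, compute_work, TOTAL_UNITS, False)
overlapped = frame_slices(graphics_busy, compute_work, TOTAL_UNITS, True)

total_work = sum(graphics_busy) + compute_work
print("serial slices:", serial, "utilization:", total_work / (serial * TOTAL_UNITS))
print("async slices:", overlapped, "utilization:", total_work / (overlapped * TOTAL_UNITS))
```

In this sketch the same amount of work finishes in fewer slices once compute is allowed to overlap with graphics, while the number of units never changes, which is the sense in which utilization improves.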
Xbox One and Asynchronous Compute
Since this editorial is primarily about DirectX 12, let's start with the Xbox One. Given above is the architectural diagram of the Xbox One GPU. The first thing any person well versed in GPU architecture will notice is that the Xbox One actually has two graphics queues. This is something that is not present in any PC GPU (to my knowledge). Even the latest R9 Fury X has only one graphics queue. We will talk a bit more about this later. Secondly, readers will notice that there are two "Compute Command Processors" present in the diagram. Since we are dealing with the GCN (Graphics Core Next) architecture, these are of course the Asynchronous Compute Engines, or ACEs for short.
You will remember that the Xbox One GPU is a Bonaire derivative, so this is pretty much as expected, since Bonaire is based on the GCN 1.1 microarchitecture, which boasts 2 ACEs as well. Since each ACE can run 8 Compute Queues, the Xbox One can run a grand total of 16 Compute Queues [Update: Information in a publication presented at Hot Chips suggests that the Xbox One's ACEs can manage up to 16 queues each, for a total of 32 queues. We do not know at this point whether this is a typo or a feature made possible by AMD's custom solutions]. So we know that the Xbox One has compute queues that developers can use to offload compute tasks to the GPU. The next question then becomes: can we access them and, more importantly, do we require DX12 to do so? The answers to these questions are Yes and No respectively.
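The queue arithmetic above is simple enough to write out. The sketch below only multiplies the figures quoted in the text (2 ACEs, and either 8 or 16 queues per ACE); the function and variable names are made up for illustration.

```python
def total_compute_queues(aces, queues_per_ace):
    """Total compute queues = number of ACEs x queues each ACE can manage."""
    return aces * queues_per_ace

XBOX_ONE_ACES = 2  # figure quoted in the text

# 8 queues per ACE is the usual GCN figure cited in the text;
# 16 per ACE is the unconfirmed figure from the Hot Chips publication.
baseline_total = total_compute_queues(XBOX_ONE_ACES, 8)
hot_chips_total = total_compute_queues(XBOX_ONE_ACES, 16)

print("Xbox One, 8 queues per ACE:", baseline_total)
print("Xbox One, 16 queues per ACE:", hot_chips_total)
```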
The Xbox One can implement Asynchronous Compute using 2 ACEs without the DX12 API
Take a look at the slide above from the Hot Chips presentation. As far as we know, the DirectX 12 patch hasn't been rolled out on the Xbox One yet. And yet, developers already have the capability to utilize the Xbox One's "Compute Command Processors". A recent example was the Tomb Raider demo, which utilized the ACEs to calculate the volumetric lighting used inside the game. The full-length working paper can be found here.
This means that the Xbox One can successfully access Asynchronous Compute without requiring the DX12 API. So that is one less significant update that the new API could have brought. Now let us take a look at the PS4 GPU (Source: VGLeaks).
PS4: Assessing Asynchronous Compute capabilities
We know that Pitcairn (based on the GCN 1.0 design) possesses exactly 2 ACEs, but here is where things get interesting. The diagram of the PS4 is fairly detailed and very enlightening. You will notice there are a total of 8 Compute Pipelines (numbered 0 to 7). This means that the PS4 actually has 8 ACEs. This is 6 more than one would expect from an architecture based on Pitcairn and, at 8 queues per ACE, results in a grand total of 64 Compute Queues, a significant upgrade over the Asynchronous Compute capabilities of the Xbox One. But what about the graphics queue? Well, the answer is 1+1 actually. On the right side of the diagram you will notice an "HP3D" pipeline and a "GFX" pipeline. Both can take graphics workloads, but one (the HP3D) is dedicated completely to graphics processing, while the other (GFX) can take both compute and graphics workloads in a serial fashion. So while the PS4 has 1 graphics queue by the traditional definition, it actually has a secondary pipeline (for a net total of 2 3D-capable pipelines) which supports graphics tasks as well.
The PS4 can implement Asynchronous Compute using 8 ACEs
So now that we know exactly how many compute queues and graphics queues each console has, just how much difference does it make? Well, the answer to that is pretty straightforward. If you recall the original DX12 editorial, you will remember I talked about the maximum theoretical potential of hardware. When talking about graphics, a single graphics queue is really all you need, since it will be able to keep the available Compute Units (CUs, the blocks that house the Stream Processors) busy pretty well. Two will help increase parallelism and allow asynchronous tasks to be scheduled. However, adding graphics queues will only help the programmer get nearer to the maximum theoretical potential of the CUs; it will not increase the actual maximum theoretical performance.
On the other hand, ACEs are independent engines that allow compute tasks to be scheduled asynchronously and executed in parallel with the graphics queue. This means that increasing the number of ACEs actually results in more compute tasks that can be processed simultaneously with graphics. This also means that, in theory, the PS4 can simultaneously handle 4 times more Asynchronous Compute tasks than the Xbox One [or 2 times more, if the updated information is correct]. Although once again, the theoretical hardware limit remains the same.
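The 4x and 2x figures follow directly from the queue totals discussed earlier. The short sketch below reproduces them from the numbers in the text, assuming the PS4's 8 ACEs manage 8 queues each; the variable names are illustrative only, and the ratio compares queue counts, not measured performance.

```python
from fractions import Fraction

PS4_QUEUES = 8 * 8  # 8 ACEs x 8 queues each, as described in the text

# Two scenarios for the Xbox One's 2 ACEs, both taken from the text.
XBOX_ONE_QUEUES = {
    "8 queues per ACE": 2 * 8,
    "16 queues per ACE (unconfirmed Hot Chips figure)": 2 * 16,
}

ratios = {
    label: Fraction(PS4_QUEUES, queues)
    for label, queues in XBOX_ONE_QUEUES.items()
}

for label, ratio in ratios.items():
    print(f"Xbox One with {label}: PS4 has {ratio}x the compute queues")
```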