Note: This post is a work in progress and will be updated over the next few hours.
Ladies and gents, Microsoft has finally revealed the much-awaited DirectX 12 API.
Direct3D 12 Tech Details: Full Mantle Features, Multicore Scalability, Memory Management and More.
During the reveal, Microsoft talked a lot about how CPU overhead is a major problem: GPU performance keeps climbing while CPU performance is not increasing proportionally. They also correctly noted that single-core CPU performance is improving at a snail’s pace. The answer, of course, is to keep the GPU loaded so that the CPU does not become the bottleneck, and the way to do it, you guessed it right folks, is the Mantle way.
DX12’s main aim is to solve the CPU overhead problem and, believe it or not, it actually goes one step further than even Mantle. In an example that Microsoft showed, DX11 made poor use of the available CPU cores. In DX12 the workload is spread more evenly across cores and the total amount of work decreases as well. I have attached the benchmarks below, courtesy of PCPER.
There are also a number of subtler features such as memory management, multi-core scalability, swizzled resources and much deeper access controls. The net result is a potential gain of almost 20 Gflops per frame (an extra 4 ms for the GPU). Microsoft also claims that the Xbox One will get a 20% boost per frame, since 20% more Gflops become available for use.
Beyond that, DirectX 12 will support Fermi, Kepler and Maxwell, which covers basically all the current-gen Nvidia cards. This is only to be expected, as Nvidia dropped support for all DX10-based cards just a few days ago. So everything from the GTX 4xx series upward is supported, with the exception of the GeForce 405 GPU. All these GPUs will be tagged as “DX-12 Ready”.
They have also mentioned that DirectX 12 will have many more optimization features, such as advanced culling techniques and rasterization optimization. However, there are concerns that such low-level tools might not be a walk in the park for low-budget devs. For full details on Microsoft’s DirectX 12, you can read the following info:
Some of the questions that will pop up in your mind about hardware and software support have already been answered by Microsoft on their blog, so we will post them here for your convenience:
Q: Should I wait to buy a new PC or GPU?
A: No – if you buy a PC with supported graphics hardware (over 80% of gamer PCs currently being sold), you’ll be able to enjoy all the power of DirectX 12 games as soon as they are available.
Q: Does DirectX 12 include anything besides Direct3D 12?
A: Also new is a set of cutting-edge graphics tools for developers. Since this is a preview of DirectX 12 focused on Direct3D 12, other technologies may be previewed at a later date.
Q: When will I be able to get my hands on DirectX 12?
A: We are targeting Holiday 2015 games.
Q: What hardware will support Direct3D 12 / will my existing hardware support Direct3D 12?
A: We will link to our hardware partners’ websites as they announce their hardware support for Direct3D 12.
Where does this performance come from?
Direct3D 12 represents a significant departure from the Direct3D 11 programming model, allowing apps to go closer to the metal than ever before. We accomplished this by overhauling numerous areas of the API. We will provide an overview of three key areas: pipeline state representation, work submission, and resource access.
Pipeline state objects
Direct3D 11 allows pipeline state manipulation through a large set of orthogonal objects. For example, input assembler state, pixel shader state, rasterizer state, and output merger state are all independently modifiable. This provides a convenient, relatively high-level representation of the graphics pipeline; however, it doesn’t map very well to modern hardware. This is primarily because there are often interdependencies between the various states. For example, many GPUs combine pixel shader and output merger state into a single hardware representation, but because the Direct3D 11 API allows these to be set separately, the driver cannot resolve things until it knows the state is finalized, which isn’t until draw time. This delays hardware state setup, which means extra overhead and fewer maximum draw calls per frame.
Direct3D 12 addresses this issue by unifying much of the pipeline state into immutable pipeline state objects (PSOs), which are finalized on creation. This allows hardware and drivers to immediately convert the PSO into whatever hardware native instructions and state are required to execute GPU work. Which PSO is in use can still be changed dynamically, but to do so the hardware only needs to copy the minimal amount of pre-computed state directly to the hardware registers, rather than computing the hardware state on the fly. This means significantly reduced draw call overhead, and many more draw calls per frame.
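To make the difference concrete, here is a minimal toy model in Python. It is not the Direct3D API: every class and function name in it (`compile_hardware_state`, `D3D11StyleContext`, `PipelineStateObject`, `D3D12StyleCommandList`, `count_resolutions`) is hypothetical, and it deliberately exaggerates the old model by resolving state on every single draw, whereas real drivers cache aggressively. It only illustrates where the state resolution work happens in each model.

```python
from dataclasses import dataclass

# Toy model only: all names here are hypothetical, not the Direct3D API.


def compile_hardware_state(pixel_shader, blend_mode):
    """Stand-in for the driver's expensive state resolution step."""
    compile_hardware_state.calls += 1
    return f"hw[{pixel_shader}+{blend_mode}]"


compile_hardware_state.calls = 0  # counts how often resolution runs


class D3D11StyleContext:
    """States are set independently, so they are resolved at draw time."""

    def __init__(self):
        self.pixel_shader = "none"
        self.blend_mode = "opaque"

    def draw(self):
        # The driver only knows the state is final once the draw arrives.
        return compile_hardware_state(self.pixel_shader, self.blend_mode)


@dataclass(frozen=True)
class PipelineStateObject:
    """Immutable: the hardware state is resolved once, at creation."""

    hardware_state: str

    @classmethod
    def create(cls, pixel_shader, blend_mode):
        return cls(compile_hardware_state(pixel_shader, blend_mode))


class D3D12StyleCommandList:
    def __init__(self):
        self.current_pso = None

    def set_pipeline_state(self, pso):
        self.current_pso = pso  # cheap: swaps in pre-computed state

    def draw(self):
        return self.current_pso.hardware_state  # no resolution work here


def count_resolutions(draws):
    """Return (D3D11-style, D3D12-style) resolution counts for `draws` draws."""
    compile_hardware_state.calls = 0
    ctx = D3D11StyleContext()
    ctx.pixel_shader, ctx.blend_mode = "lit", "alpha"
    for _ in range(draws):
        ctx.draw()
    old_style = compile_hardware_state.calls

    compile_hardware_state.calls = 0
    pso = PipelineStateObject.create("lit", "alpha")
    command_list = D3D12StyleCommandList()
    command_list.set_pipeline_state(pso)
    for _ in range(draws):
        command_list.draw()
    return old_style, compile_hardware_state.calls
```

In this sketch, calling `count_resolutions(1000)` resolves state a thousand times in the draw-time model and exactly once in the PSO model, which is the whole point of finalizing state on creation.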
Command lists and bundles
In Direct3D 11, all work submission is done via the immediate context, which represents a single stream of commands that go to the GPU. To achieve multithreaded scaling, games also have deferred contexts available to them, but like Direct3D 11’s pipeline state, deferred contexts do not map perfectly to hardware, and so relatively little work can be done in them.
Direct3D 12 introduces a new model for work submission based on command lists that contain the entirety of information needed to execute a particular workload on the GPU. Each new command list contains information such as which PSO to use, what texture and buffer resources are needed, and the arguments to all draw calls. Because each command list is self-contained and inherits no state, the driver can pre-compute all necessary GPU commands up-front and in a free-threaded manner. The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process.
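The record-in-parallel, submit-in-series pattern can be sketched roughly as follows. This is a toy Python model, not the Direct3D API: `CommandList`, `CommandQueue`, `record_chunk` and `render_frame` are hypothetical names, the scene "chunks" are made-up data, and Python threads stand in for a game's worker threads purely to show the structure (they do not demonstrate a real speed-up).

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: all names here are hypothetical, not the Direct3D API.


class CommandList:
    """Self-contained: carries its own PSO, resources and draw arguments."""

    def __init__(self, pso, resources):
        self.pso = pso
        self.resources = tuple(resources)
        self.draws = []

    def draw(self, vertex_count):
        self.draws.append(vertex_count)


class CommandQueue:
    """The only serial step: lists are executed in submission order."""

    def __init__(self):
        self.executed = []

    def execute(self, command_lists):
        for command_list in command_lists:
            for vertex_count in command_list.draws:
                self.executed.append(
                    (command_list.pso, command_list.resources, vertex_count)
                )


def record_chunk(chunk):
    """Record one command list. It inherits no state and touches nothing
    shared, so any number of these can be recorded at the same time."""
    pso, resources, vertex_counts = chunk
    command_list = CommandList(pso, resources)
    for vertex_count in vertex_counts:
        command_list.draw(vertex_count)
    return command_list


def render_frame(chunks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Parallel recording; map() returns results in input order.
        command_lists = list(pool.map(record_chunk, chunks))
    queue = CommandQueue()
    queue.execute(command_lists)  # serial submission
    return queue.executed
```

Because each list is built from its own inputs only, the order of execution is decided entirely at submission time, no matter which worker finished recording first.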
In addition to command lists, Direct3D 12 also introduces a second level of work pre-computation, bundles. Unlike command lists which are completely self-contained and typically constructed, submitted once, and discarded, bundles provide a form of state inheritance which permits reuse. For example, if a game wants to draw two character models with different textures, one approach is to record a command list with two sets of identical draw calls. But another approach is to “record” one bundle that draws a single character model, then “play back” the bundle twice on the command list using different resources. In the latter case, the driver only has to compute the appropriate instructions once, and creating the command list essentially amounts to two low-cost function calls.
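The two-character example can be sketched like this. Again, this is a toy Python model with hypothetical names (`Bundle`, `CommandList`, `execute_bundle`, `draw_two_characters`) and made-up mesh and texture names, not the Direct3D API. The bundle records its draws once and picks up whatever texture the calling command list has set at playback time.

```python
# Toy model only: all names here are hypothetical, not the Direct3D API.


class Bundle:
    """Recorded once; resources are inherited from the calling command list."""

    def __init__(self):
        self.commands = []

    def draw(self, mesh):
        self.commands.append(mesh)


class CommandList:
    def __init__(self):
        self.texture = None
        self.stream = []

    def set_texture(self, texture):
        self.texture = texture

    def execute_bundle(self, bundle):
        # Playback reuses the pre-recorded draws with the current texture.
        for mesh in bundle.commands:
            self.stream.append((mesh, self.texture))


def draw_two_characters():
    # "Record" the character once.
    bundle = Bundle()
    for mesh in ("body", "head", "sword"):
        bundle.draw(mesh)

    # "Play back" twice with different resources.
    command_list = CommandList()
    command_list.set_texture("knight_red")
    command_list.execute_bundle(bundle)
    command_list.set_texture("knight_blue")
    command_list.execute_bundle(bundle)
    return command_list.stream
```

The command list ends up with six draws, but the per-draw recording work was done only three times, when the bundle was built.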
Descriptor heaps and tables
Resource binding in Direct3D 11 is highly abstracted and convenient, but leaves many modern hardware capabilities underutilized. In Direct3D 11, games create “view” objects of resources, then bind those views to several “slots” at various shader stages in the pipeline. Shaders in turn read data from those explicit bind slots which are fixed at draw time. This model means that whenever a game wants to draw using different resources, it must re-bind different views to different slots, and call draw again. This is yet another case of overhead that can be eliminated by fully utilizing modern hardware capabilities.
Direct3D 12 changes the binding model to match modern hardware and significantly improve performance. Instead of requiring standalone resource views and explicit mapping to slots, Direct3D 12 provides a descriptor heap into which games create their various resource views. This provides a mechanism for the GPU to directly write the hardware-native resource description (descriptor) to memory up-front. To declare which resources are to be used by the pipeline for a particular draw call, games specify one or more descriptor tables which represent sub-ranges of the full descriptor heap. As the descriptor heap has already been populated with the appropriate hardware-specific descriptor data, changing descriptor tables is an extremely low-cost operation.
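A rough way to picture a descriptor heap and its tables: the heap is a flat array filled up-front, and a table is nothing more than an offset and a count into it. The Python sketch below is a toy model under that assumption. `DescriptorHeap`, `DescriptorTable`, `create_view`, `resolve` and `build_tables` are hypothetical names, and plain strings stand in for hardware-native descriptors.

```python
# Toy model only: all names here are hypothetical, not the Direct3D API.


class DescriptorHeap:
    """A flat array of pre-built descriptors (plain strings in this sketch)."""

    def __init__(self):
        self.descriptors = []

    def create_view(self, resource_name):
        """Write a descriptor up-front and return its index in the heap."""
        self.descriptors.append(f"desc:{resource_name}")
        return len(self.descriptors) - 1


class DescriptorTable:
    """A sub-range of the heap: an offset and a count, with no copying."""

    def __init__(self, heap, offset, count):
        if offset < 0 or count < 0 or offset + count > len(heap.descriptors):
            raise IndexError("table range falls outside the heap")
        self.heap = heap
        self.offset = offset
        self.count = count

    def resolve(self):
        """Return the descriptors this table currently points at."""
        return self.heap.descriptors[self.offset:self.offset + self.count]


def build_tables():
    heap = DescriptorHeap()
    for name in ("albedo_a", "normal_a", "albedo_b", "normal_b"):
        heap.create_view(name)
    # Switching from object A to object B means switching tables,
    # not re-binding each view to a slot.
    table_a = DescriptorTable(heap, 0, 2)
    table_b = DescriptorTable(heap, 2, 2)
    return table_a, table_b
```

Since a table is only two integers pointing into data that already exists, swapping tables between draws costs next to nothing in this model, which mirrors the claim in the text.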
In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing unprecedented flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten.
Source: Microsoft
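The g-buffer example can be illustrated with a small Python sketch. This is our own toy model, not Microsoft's code and not shader code: `shade_per_material` and `shade_dynamic_index` are hypothetical names, the g-buffer is just a list of material IDs, and the "one pass per material" loop is only one assumed way an engine might cope without dynamic indexing. The text above says only that many materials slow the final pass, not how any particular engine handles it.

```python
# Toy model only: all names here are hypothetical, not the Direct3D API.


def shade_per_material(gbuffer, materials):
    """Assumed fallback: one full-screen pass per material, each with
    that single material bound. Work grows with the material count."""
    output = [None] * len(gbuffer)
    passes = 0
    for material_id, color in enumerate(materials):
        passes += 1
        for pixel, pixel_material in enumerate(gbuffer):
            if pixel_material == material_id:
                output[pixel] = color
    return output, passes


def shade_dynamic_index(gbuffer, materials):
    """Dynamic indexing: one pass, where each pixel looks up its own
    material directly by the ID stored in the g-buffer."""
    output = [materials[material_id] for material_id in gbuffer]
    return output, 1
```

Both functions produce the same image, but the first needs as many passes as there are materials while the second always needs one, whether the scene has ten materials or a thousand.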