A few hours ago, we covered a new Square Enix demo that was built on Microsoft's latest DirectX 12 API and ran on the NVIDIA GeForce GTX Titan X. The demo is based on the Luminous game engine, which was first shown at E3 2012 and has come a long way to fully adopt the DX12 API, which will be available in games starting this year. The more important thing to cover about the demo is a new technology called DirectX 12 Multiadapter, which was also demoed, and the results are quite amazing.
DirectX 12 Multiadapter Technology Pushes Performance From Both Discrete and Integrated GPUs
At Microsoft's Build 2015 conference, the company showcased the new DirectX 12 "Multiadapter" feature, which can be used to harness every dormant piece of silicon inside a PC. Nowadays, enthusiast PCs use beefy multi-GPU configurations to add more performance in games, but the entire portfolio of mainstream processors also features an integrated graphics processing unit that is barely used by the masses. We can find integrated GPUs on mainstream CPUs, APUs and even in laptops that have discrete graphics options. Earlier technologies have tried to deliver a coherent link that puts both the iGPU and the discrete GPU to work, such as Lucidlogix Virtu and AMD's own Dual Graphics, which works with most of AMD's low-end cards. However, there was never a case where integrated GPUs were linked with high-performance graphics cards.
Microsoft's DirectX 12 Multiadapter technology tries to create a specialized coherent link between all the GPUs available in a PC so that they can work together to boost performance in games. The demo showcased by Microsoft was created by Square Enix, titled “WITCH CHAPTER 0 [cry]” (weird name for a demo), and ran on four NVIDIA GeForce GTX Titan X graphics cards. The demo featured 63 million polygons per scene and highly detailed 8K x 8K textures, showcasing the next level of detail. Besides using both the iGPU and the dGPU, DirectX 12 Multiadapter also makes sure that multi-GPU configurations in CrossFire and SLI work up to the mark, in addition to the driver optimizations that manufacturers add in monthly updates.
Explicit DirectX 12 Multiadapter Control in Unreal Engine Elemental Demo
The next demonstration used the Epic Games "Unreal Engine 4 - Elemental Demo", which was shown running under two scenarios: one with a single adapter, a GeForce GTX Titan X graphics card, and the other with a GeForce GTX Titan X running alongside the Intel iGPU of a mainstream processor over a multiadapter link. Both configurations raced to the 635th frame, and whichever got there first won the scene. The multiadapter configuration was the obvious winner with an average of 39.7 FPS compared to just 35.9 FPS on the single adapter. You can find more details in the information provided by Microsoft below:
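To put the race in perspective, here is a quick back-of-the-envelope calculation using only the three figures quoted above (635 frames, 39.7 FPS and 35.9 FPS). It assumes the average frame rates hold for the whole run, which the demo does not guarantee, and the function name is ours, purely for illustration.

```python
# Figures quoted in the article for the Unreal Engine 4 Elemental demo.
FRAMES = 635
FPS_SINGLE = 35.9  # GeForce GTX Titan X alone
FPS_MULTI = 39.7   # Titan X plus Intel iGPU over the multiadapter link


def seconds_to_finish(frames: int, fps: float) -> float:
    """Time needed to render `frames` frames at a constant average frame rate."""
    return frames / fps


t_single = seconds_to_finish(FRAMES, FPS_SINGLE)
t_multi = seconds_to_finish(FRAMES, FPS_MULTI)
fps_gain_pct = (FPS_MULTI / FPS_SINGLE - 1.0) * 100.0

print(f"single adapter:  {t_single:.2f} s")
print(f"multiadapter:    {t_multi:.2f} s")
print(f"lead at finish:  {t_single - t_multi:.2f} s")
print(f"frame rate gain: {fps_gain_pct:.1f} %")
```

Under that assumption the multiadapter setup reaches frame 635 roughly 1.7 seconds earlier, which works out to a frame rate gain of about 10.6%.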
How does it work?
We recognized that most mixed GPU systems in the world were not making the most out of the hardware they had. So in our quest to maximize performance, we set out to enable separable and contiguous workloads to be executed in parallel on separate GPUs. One such example of separable workloads is postprocessing.
Virtually every game out there makes use of postprocessing to make your favorite games visually impressive; but that postprocessing work doesn’t come free. By offloading some of the postprocessing work to a second GPU, the first GPU is freed up to start on the next frame before it would have otherwise been able to, improving your overall framerate.
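The idea can be illustrated with a tiny two-stage pipeline model. This is only a sketch, not Microsoft's implementation: the function names and every millisecond value are invented for illustration, and the model assumes the discrete GPU pays a fixed cost to copy each frame over to the integrated GPU.

```python
def total_time_single(frames: int, render_ms: float, post_ms: float) -> float:
    """One GPU does everything: render, then postprocess, frame after frame."""
    return frames * (render_ms + post_ms)


def total_time_offloaded(frames: int, render_ms: float, copy_ms: float,
                         igpu_post_ms: float) -> float:
    """dGPU renders and copies the frame out; iGPU postprocesses in parallel."""
    dgpu_free = 0.0  # time at which the discrete GPU finishes its current frame
    igpu_free = 0.0  # time at which the integrated GPU finishes its current frame
    for _ in range(frames):
        # The dGPU can start the next frame as soon as it has handed this one over.
        dgpu_free += render_ms + copy_ms
        # The iGPU starts once the frame has arrived and it is idle.
        start = max(dgpu_free, igpu_free)
        igpu_free = start + igpu_post_ms
    return igpu_free


# Hypothetical costs (NOT measured values from the demo).
single = total_time_single(100, render_ms=24.0, post_ms=4.0)
offloaded = total_time_offloaded(100, render_ms=24.0, copy_ms=1.0, igpu_post_ms=18.0)
print(f"single GPU: {single:.0f} ms, offloaded: {offloaded:.0f} ms")
```

Note that in this model the integrated GPU is much slower at postprocessing than the discrete one (18 ms versus 4 ms with the made-up numbers), yet the total time still drops because the two stages overlap and the slowest stage sets the pace.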
Analyzing the results
The below image shows the real workload timeline of the Intel and NVIDIA GPUs (to scale). You can see how the workload that would normally be on the NVIDIA GPU is being executed on the Intel GPU instead.
In the demo, we even make use of DirectX 12’s Multiengine feature to complement the Multiadapter feature. You can see that the copy operations on NVIDIA’s copy engine execute in parallel with the NVIDIA 3D engine operations improving performance even more.
Taking a look at the Intel workload, we can see that we still have some unused GPU time that we can take advantage of. The final GPU utilization results were:
- NVIDIA: ~100% utilization on average
- Intel: ~70% utilization on average
via MSDN Blogs
DirectX 12 Multiadapter technology sounds great since it will allow PC users to actually utilize the iGPU, which is currently the least important piece of hardware in any enthusiast or mainstream PC. CPUs and APUs with these iGPUs can be used in a way that actually adds performance in gaming titles. DX12 can further allow several multi-GPU configurations, such as letting CrossFire and SLI cards work in harmony and even letting the VRAM of each GPU available in a PC be combined and fully utilized. Only time will tell whether game developers and software makers will actually take advantage of the additional technologies that the DirectX 12 API will feature.
Update: New slides have been posted by worldsfactory which further detail the features we have listed. The slides show that the DirectX 12 API is complete and has working drivers. Over 50% of gamers already own the hardware required to run DirectX 12, and Microsoft expects this to increase to 67% by holiday 2015 since it is offering a free upgrade to Windows 10, available for one year, from Windows 7 and Windows 8/8.1 based PCs. DirectX 12 is expected to have the fastest adoption rate since DirectX 9, with games in development on engines that include Unreal 4, Unity 5 and CryEngine 3, as well as at Microsoft's own game studios. The first game developed with DirectX 12 will come from the Chinese studio Snail Games, which is bringing King of Wushu in Q4 2015. The engine took six weeks to port from DirectX 11 to DirectX 12, and the final result is a 10% FPS improvement with even more performance to grab. The key points that Snail Games has mentioned include:
- DirectX 12 parallelizes command list building and execution across CPUs
- Coordinates workload across multiple GPU engines
- Improves memory efficiency and reduces memory fragmentation
- Improves frame rate stability with object caching
- Enables direct hardware control and more game-specific optimizations
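The first point in the list, parallel command list building, follows a simple pattern: several CPU threads each record their own list of commands, and the lists are then submitted to the GPU in a fixed order. The sketch below only mimics that pattern in plain Python. It does not call DirectX, the function names and the "draw" strings are hypothetical stand-ins, and Python threads will not show the real speedup a native engine gets.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(draw_calls):
    """Stand-in for one thread recording commands into its own command list."""
    return [f"draw {name}" for name in draw_calls]


def build_frame(scene_objects, workers=4):
    """Split the scene into contiguous chunks and record them in parallel."""
    # Ceiling division so that every object lands in exactly one chunk.
    size = max(1, -(-len(scene_objects) // workers))
    chunks = [scene_objects[i:i + size]
              for i in range(0, len(scene_objects), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so submission order is stable
        # no matter which thread finishes first.
        command_lists = list(pool.map(record_command_list, chunks))
    # Stand-in for submitting all recorded lists to the GPU queue in order.
    return [cmd for cmd_list in command_lists for cmd in cmd_list]


frame = build_frame(["terrain", "castle", "knight", "dragon", "sky"], workers=2)
print(frame)
```

The point of the design is that recording is independent per thread, while the final submission stays deterministic.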
Talking more on the Explicit Multiadapter technology, it is mentioned that it can leverage all the hardware in a system through parallel command generation and execution and independent memory management, and that it allows multiple GPU topologies such as multiple discrete GPUs (CrossFire/SLI) and integrated plus discrete GPUs. This allows more control, more capability and more performance, and supports custom load balancing of work. It offers two distinct API patterns: a Linked GPU pattern and an Unlinked GPU pattern.
The linked GPU pattern treats all the cards available in a system as a single GPU with multiple command processors per engine (3D/Compute/Copy) and multiple memory regions. Resources from one GPU can be used in another linked GPU's rendering pipeline, and each command processor and memory region is selected by a node mask in the API. For unlinked GPUs, such as an iGPU paired with a discrete-class adapter, the pattern provides a coherent path with independent resource management, cross-adapter memory transactions, and parallel queues with cross-adapter synchronization. The last is heterogeneous multiadapter, which can offload some post-processing to the slower integrated GPU so that the frame is completed faster while the discrete graphics card serves as the primary adapter. This confirms the DirectX 12 memory usage options we were talking about, which can combine VRAM across multiple GPUs, and it can further allow discrete GPUs to be linked together as a single large GPU with several command processors per engine (SMM and Streaming Engines).
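A node mask is simply a bit field in which bit i stands for GPU node i of the linked adapter. The helpers below are a minimal sketch of that idea in plain Python. They are hypothetical and not part of the DirectX API, which takes such masks as plain unsigned integers.

```python
def node_mask(*node_indices: int) -> int:
    """Build a mask with one bit set per selected GPU node."""
    mask = 0
    for index in node_indices:
        mask |= 1 << index
    return mask


def nodes_in_mask(mask: int) -> list:
    """List the GPU node indices that a mask selects."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


first_gpu = node_mask(0)      # only node 0
both_gpus = node_mask(0, 1)   # nodes 0 and 1, e.g. a resource visible to both
print(bin(first_gpu), bin(both_gpus), nodes_in_mask(both_gpus))
```

In a two-card linked setup, for example, a command queue would target a single node, while a shared resource could be made visible to both by setting both bits.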