One of the main new features introduced with the GeForce RTX 4000 Series graphics cards is SER, shorthand for Shader Execution Reordering. With ray tracing slowly but surely becoming more ubiquitous, SER aims to improve shader efficiency by mitigating execution and data divergence. As suggested by its name, it achieves this result by reordering threads in real time to enhance coherence. It also decouples ray intersection and shading operations.
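To make "coherence" more concrete, here is a toy Python model. It illustrates the idea under simplifying assumptions and is not NVIDIA's implementation. The model assumes threads run in 32-wide warps and that a warp containing several different shaders has to execute each of them in turn. Sorting threads by a hypothetical material/shader ID stands in for what SER does on the GPU.

```python
import random

WARP_SIZE = 32  # NVIDIA GPUs execute threads in 32-wide warps


def shader_passes(shader_ids, warp_size=WARP_SIZE):
    """Toy cost model: each warp runs once per distinct shader among its threads."""
    total = 0
    for start in range(0, len(shader_ids), warp_size):
        warp = shader_ids[start:start + warp_size]
        total += len(set(warp))
    return total


def reorder(shader_ids):
    """Toy stand-in for SER: group threads that will run the same shader."""
    return sorted(shader_ids)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical workload: 1024 rays, each hitting one of 8 materials at random.
    hits = [rng.randrange(8) for _ in range(1024)]
    print("passes before reordering:", shader_passes(hits))
    print("passes after reordering: ", shader_passes(reorder(hits)))
```

In this model, each warp in the unsorted case almost always contains all eight materials. After sorting, most warps contain a single material, which is the kind of divergence reduction SER targets.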
NVIDIA confirmed to Wccftech in a brief Q&A that SER (just like Opacity Micro-Maps and Displaced Micro-Mesh, the other two performance enhancements) requires explicit developer integration in a game and comes with its own API extension of NVAPI. Luckily, the NvRTX Unreal Engine 5 branch will soon be updated to version 5.0.3, which adds SER support to the most popular next-generation engine. According to NVIDIA, Shader Execution Reordering can improve frame rates in ray tracing workloads by up to 40% without impacting quality.
Interestingly, SER also improves UE5 Lumen's performance when hardware ray tracing is enabled. NVIDIA provided three example use cases. Path tracing is the first and simplest, with performance gains ranging from 20 to 50 percent.
Path tracing is a highly divergent workload, which makes it a great candidate for SER. Applying SER allows the path tracer to reduce divergence in its material evaluation, not just in the number of bounces.
SER can be useful beyond reordering threads to reduce shading divergence. For example, NVIDIA said work compaction can yield meaningful benefits when using Lumen's hardware ray traced Global Illumination.
For large scenes, like the UE5 City Sample, traces are split into a near field and a far field, which are run as separate tracing passes with compaction in between. The multiple passes and compaction can be replaced by a single NVReorderThread call. This avoids the idle GPU bubbles incurred when compacting the near-field results and then launching far-field rays.
Removing the extra overhead of storing, compacting, and relaunching work often saves around 20%. The shader changes can be more involved because of assumptions in the original code (functions using macros, rather than arguments, to permute behaviors). However, the logical changes amounted to adding two reorder calls with a single Boolean expression indicating whether a trace had hit or missed.
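The difference between the two approaches can be sketched on the CPU in Python. This is a hypothetical model: `trace_near`, `trace_far` and the toy scene are invented for illustration. A stable sort on a one-bit hit/miss key stands in for a reorder call. The real implementation lives in HLSL shaders through the NVAPI extension and is not shown here.

```python
from typing import Dict, List, Optional


# Hypothetical stand-ins for the near-field and far-field traces. A ray is an
# integer id; a trace returns a hit description, or None on a miss.
def trace_near(ray: int) -> Optional[str]:
    return f"near:{ray}" if ray % 3 == 0 else None


def trace_far(ray: int) -> Optional[str]:
    return f"far:{ray}" if ray % 2 == 0 else None


def two_pass_with_compaction(rays: List[int]) -> Dict[int, Optional[str]]:
    """Original approach: near-field pass, store results, compact misses, far-field pass."""
    results = {ray: trace_near(ray) for ray in rays}        # pass 1, results stored
    missed = [ray for ray in rays if results[ray] is None]  # compaction step
    for ray in missed:                                      # pass 2, relaunched work
        results[ray] = trace_far(ray)
    return results


def single_pass_with_reorder(rays: List[int]) -> Dict[int, Optional[str]]:
    """SER-style approach: one pass, with threads regrouped by a 1-bit hit/miss key."""
    near = [(ray, trace_near(ray)) for ray in rays]
    # Stand-in for a reorder keyed on "did the near-field trace hit?".
    # sorted order is stable: misses (False) come first, hits (True) after.
    near.sort(key=lambda item: item[1] is not None)
    results = {}
    for ray, hit in near:
        results[ray] = hit if hit is not None else trace_far(ray)
    return results
```

Both functions produce the same results. The second avoids the separate store, compact and relaunch steps by keeping all of the work in one pass.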
Lastly, NVIDIA detailed a third (albeit more complex) use case with Lumen's hardware ray traced reflections. Two different ray tracing pipelines are typically at work here: one for near- and far-field tracing and another for hit lighting.
With SER enabled, the passes can be combined because separate compaction and sorting phases are no longer necessary. The pass roughly becomes: trace the near field; if that misses, trace the far field; if either trace hits, use the hit object to evaluate the material and perform lighting. This is possible because tracing and shading are decoupled.
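The combined reflection pass described above can be outlined in the same toy Python style. Everything here is hypothetical: the `HitObject` class, the `trace` and `shade` helpers, and the scene are all invented for illustration. The sketch only shows the control flow: trace first, keep the result in a hit object, reorder by material, and then shade.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HitObject:
    """Hypothetical hit record: what the trace found, kept separate from shading."""
    ray: int
    material: int


def trace(ray: int, field: str) -> Optional[HitObject]:
    # Hypothetical scene: the near field hits every 4th ray, the far field every even ray.
    if field == "near" and ray % 4 == 0:
        return HitObject(ray, material=ray % 3)
    if field == "far" and ray % 2 == 0:
        return HitObject(ray, material=ray % 5)
    return None


def shade(hit: HitObject) -> str:
    # Placeholder for material evaluation and lighting.
    return f"ray{hit.ray}/mat{hit.material}"


def combined_reflection_pass(rays: List[int]) -> Tuple[List[str], List[int]]:
    hits = []
    for ray in rays:
        hit = trace(ray, "near")      # 1. trace the near field
        if hit is None:
            hit = trace(ray, "far")   # 2. on a miss, trace the far field
        if hit is not None:
            hits.append(hit)
    # 3. Group threads that evaluate the same material
    #    (stand-in for a reorder keyed on the hit object's material).
    hits.sort(key=lambda h: h.material)
    # 4. Shade from the stored hit objects, now that tracing is finished.
    shaded = [shade(h) for h in hits]
    return shaded, [h.material for h in hits]
```

With rays 0 to 7, this shades rays 0, 4, 6 and 2 in material order, and the odd rays miss both fields.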
The implementation described above resulted in a 20-30% speed increase in Lumen reflections on the GPU, measured when profiling a typical workload in UE5 City Sample.
If you're a developer interested in adding SER support to your engine, you may want to look at the full whitepaper. It is currently unclear which games will support Shader Execution Reordering (or Opacity Micro-Maps and Displaced Micro-Mesh, for that matter) in the near future, but we'll investigate with both NVIDIA and game studios.