NVIDIA DLSS 2.0 Behind the Scenes – How the Magic Happens
NVIDIA announced DLSS 2.0 with much fanfare in late March, showcasing the revamped deep learning based image reconstruction technology in games like Control and MechWarrior 5: Mercenaries.
The first implementation of Deep Learning Super Sampling produced rather mixed results. While there were some good implementations, such as Control itself (which previously used a CUDA-based version of DLSS), plenty of others had issues, introducing blurriness for example.
DLSS 2.0, on the other hand, has been extremely well received so far in all four games where it's available (Deliver Us The Moon and Wolfenstein: Youngblood, in addition to the two mentioned above). What changed, then?
Luckily, for those who are really curious about what goes on behind the scenes, NVIDIA Senior Research Scientist Edward Liu explained exactly what makes NVIDIA DLSS 2.0 such a big upgrade over the previous version in a GTC 2020 talk that was recently published on NVIDIA's website. It's an extremely interesting presentation, though fairly long (almost 48 minutes). We've summarized the salient points here for your convenience.
There are two possible approaches to image super-resolution rendering: single-image and multi-frame. The former is perhaps more commonly used and known, as it is usually achieved through simple interpolation filters such as bilinear, bicubic, Lanczos, spline, or nearest neighbor. That's what happens with most video upscaling software.
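To make the interpolation idea concrete, here is a minimal sketch of single-image upscaling with a bilinear filter, the simplest of the filters mentioned above. It operates on a grayscale image stored as a list of rows; the function name and structure are illustrative, not taken from any particular library.

```python
def bilinear_upscale(img, scale):
    """Upscale a 2D grayscale image (list of rows) by a given factor."""
    h, w = len(img), len(img[0])
    out_h, out_w = int(h * scale), int(w * scale)
    out = []
    for y in range(out_h):
        # Map the output pixel back into source coordinates.
        sy = min(y / scale, h - 1)
        y0, fy = int(sy), sy - int(sy)
        y1 = min(y0 + 1, h - 1)
        row = []
        for x in range(out_w):
            sx = min(x / scale, w - 1)
            x0, fx = int(sx), sx - int(sx)
            x1 = min(x0 + 1, w - 1)
            # Blend the four nearest source pixels by their distances.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Because every output pixel is just a weighted average of nearby input pixels, no filter of this kind can create detail that was never in the low-resolution frame, which is exactly the limitation discussed next.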
Lately, though, deep neural networks have been used to 'hallucinate' new pixels after training. An example of this is ESRGAN, the Enhanced Super-Resolution Generative Adversarial Networks model on which several AI-enhanced texture packs for older games are based.
The images reconstructed through interpolation filters are, simply put, lacking too much detail compared to the native image. DNNs do a better job in this regard, but since they are 'hallucinating' the new pixels, the result may be inconsistent with the native image.
As you can see in the comparison below, the left image renders the trees' foliage vastly differently compared to the native image. NVIDIA deems this unacceptable for DLSS, as the goal is to stay as close as possible to the 'ground truth' image as well as the original creative vision of the game developer.
Another major issue is that the resulting image is often temporally unstable, featuring flickering for instance. In order to solve all these issues, NVIDIA opted for a multi-frame super-resolution approach when it comes to DLSS 2.0. This allows accumulating multiple low-resolution frames to craft a high-resolution frame, which makes restoring true details much easier.
This is what happens with spatio-temporal upsampling techniques like the now-ubiquitous Temporal Antialiasing and Checkerboard Rendering, the latter known to be a popular choice among developers targeting the PlayStation 4 Pro and Xbox One X consoles.
Since these reconstruction techniques use samples from multiple frames, they're much less likely to encounter temporal instability issues like flickering. Additionally, while for each frame the shading rate is kept low to achieve strong performance, the effective sampling rate is drastically increased due to the multi-frame reconstruction approach.
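The accumulation idea can be sketched as a simple running blend: each frame contributes a cheap, low-sample estimate of a pixel, and blending it into the history raises the effective sampling rate over time. This is a toy model of the general multi-frame principle, not NVIDIA's actual algorithm; the blend weight and names are illustrative.

```python
def accumulate(history, current, alpha=0.1):
    """Blend the current frame's pixel estimate into the running history."""
    if history is None:  # first frame: no history to blend with yet
        return current
    # An exponential moving average: old information decays, new samples
    # are folded in, so noise averages out across many frames.
    return (1 - alpha) * history + alpha * current
```

Feeding this noisy per-frame estimates of the same pixel makes the history converge toward the true value, which is why the effective sampling rate climbs even though each individual frame is shaded at a low rate.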
That is not to say there are no issues with TAA or CR. Since scene content changes continuously when rendering games in real time, naively assuming previous frames will still be correct can easily lead to artifacts like ghosting or lagging. Usually, these problems are handled with heuristics-based history rectification of the invalid samples from previous frames. That comes with issues of its own, though, such as the reintroduction of temporal instability, blurriness, and Moiré patterns.
One of the most commonly used heuristics is neighborhood clamping, which works by clamping samples from previous frames to the minimum and maximum of the neighboring current-frame samples. This does strike a decent balance to avoid the shortcomings mentioned above. However, it cannot prevent a rather significant loss of detail, as you can see below.
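The clamping heuristic, and the detail loss it causes, can be sketched in a few lines. This is a toy scalar version (real implementations clamp per pixel in a color space, often against a 3x3 neighborhood); the function name is ours.

```python
def clamp_history(history, neighborhood):
    """Clamp a reprojected history sample to the current frame's
    local value range, rejecting history that looks stale."""
    lo, hi = min(neighborhood), max(neighborhood)
    return max(lo, min(history, hi))
```

The trade-off is visible immediately: a bright detail of 0.9 carried in the history gets forced down to 0.3 if the current low-resolution neighborhood only spans 0.1 to 0.3, so a genuine feature is discarded along with the stale samples the heuristic was meant to reject.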
NVIDIA is solving this issue by exploiting the power of its supercomputer, on which the DLSS 2.0 neural network is trained offline using tens of thousands of extremely high-quality reference images.
Neural networks are simply much more suited to a task like this than handcrafted heuristics as they can find the optimal strategy to combine samples collected over multiple frames, delivering much higher quality reconstructions in the end result.
It's a data-driven approach, and one that allows DLSS 2.0 to successfully reconstruct even complex situations like those with Moiré patterns. The image comparisons below are thoroughly impressive, surpassing even native images more often than not while doing a 4x upscale in pixel count (from 540p to 1080p). None of this was possible with the previous DLSS model.
Of course, NVIDIA's ultimate goal with DLSS 2.0 is to deliver great performance. And it certainly does, with very minimal overhead: the added cost is just 1.5 milliseconds on an RTX 2080 Ti graphics card when rendering at 4K resolution.
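As a rough sanity check on how small that overhead is, we can compare it against common frame-time budgets (this is our own back-of-the-envelope arithmetic using the usual 1000 ms / target-fps figures, not numbers from NVIDIA's talk):

```python
DLSS_COST_MS = 1.5  # quoted overhead on an RTX 2080 Ti at 4K

for fps in (30, 60, 120):
    budget_ms = 1000 / fps          # total time available per frame
    share = DLSS_COST_MS / budget_ms * 100
    print(f"{fps} fps budget: {budget_ms:.1f} ms -> DLSS share ~{share:.1f}%")
```

At 60 fps that works out to roughly 9% of the frame budget, which the rendering-time savings from shading far fewer pixels can easily repay.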
Just as exciting is that adding DLSS 2.0 will be much easier for game developers. Not only because there's an Unreal Engine 4 branch now and the generalized neural model doesn't have to be trained per game anymore, but mostly because the multi-frame approach means it will be very easy to implement DLSS 2.0 in those games and engines that already support Temporal Antialiasing (TAA), and there are lots of them.
Needless to say, we can't wait to see where NVIDIA takes its incredibly promising Deep Learning Super Sampling image reconstruction technique in the future. They're already working on exposing the sharpness setting, for one thing, and having more options is always great.