A new paper published by Facebook researchers just ahead of SIGGRAPH 2020 introduces neural supersampling, a machine learning-based upsampling approach not unlike NVIDIA's Deep Learning Super Sampling (DLSS). Unlike DLSS, however, neural supersampling requires no proprietary hardware or software to run, and its results are impressive, as you can see in the example images; the researchers compare them to the quality we've come to expect from DLSS.
Closest to our work, Nvidia has recently released deep-learned supersampling (DLSS) [Edelsten et al. 2019] that upsamples low-resolution rendered content with a neural network in real-time.
In this paper, we introduce a method that is easy to integrate with modern game engines and requires no special hardware (e.g., eye tracking) or software (e.g., proprietary drivers for DLSS), making it applicable to a wider variety of existing software platforms, acceleration hardware and displays.
We observed that, for neural supersampling, the auxiliary information provided by motion vectors proved particularly impactful. Motion vectors define geometric correspondences between pixels in sequential frames: each motion vector points to a subpixel location where a surface point visible in one frame appeared in the previous frame. For photographic images these values are normally estimated by computer vision methods, but such optical flow estimation algorithms are prone to errors. In contrast, a rendering engine can produce dense motion vectors directly, giving neural supersampling a reliable, rich input when applied to rendered content.
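To make the role of motion vectors concrete, here is a minimal sketch of backward warping: for each pixel of the current frame, the motion vector gives the (possibly subpixel) location of the same surface point in the previous frame, which is sampled with bilinear interpolation. This is an illustrative NumPy implementation, not the paper's actual warping layer; the function name and the `(dy, dx)` vector convention are assumptions.

```python
import numpy as np

def warp_previous_frame(prev_frame, motion_vectors):
    """Backward-warp the previous frame toward the current frame.

    prev_frame:     (H, W, C) image from the previous frame.
    motion_vectors: (H, W, 2) per-pixel (dy, dx) offsets pointing from
                    each current-frame pixel to where that surface point
                    appeared in the previous frame (assumed convention).
    """
    h, w = prev_frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Subpixel source coordinates in the previous frame, clamped to bounds.
    sy = np.clip(ys + motion_vectors[..., 0], 0, h - 1)
    sx = np.clip(xs + motion_vectors[..., 1], 0, w - 1)

    # Integer corners and fractional weights for bilinear interpolation.
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = (sy - y0)[..., None]
    wx = (sx - x0)[..., None]

    # Blend the four neighbouring pixels.
    top = prev_frame[y0, x0] * (1 - wx) + prev_frame[y0, x1] * wx
    bot = prev_frame[y1, x0] * (1 - wx) + prev_frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

With rendered motion vectors this warp is dense and reliable; with estimated optical flow the same operation inherits the estimator's errors, which is exactly the advantage the researchers point out.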
Our method is built upon the above observations, and combines the additional auxiliary information with a novel spatio-temporal neural network design that is aimed at maximizing the image and video quality while delivering real-time performance.
At inference time, our neural network takes as input the rendering attributes (color, depth map and dense motion vectors per frame) of both the current frame and multiple previous frames, all rendered at a low resolution. The output of the network is a high-resolution color image corresponding to the current frame. The network is trained with supervised learning: each low-resolution input frame is paired with a reference image rendered at high resolution with anti-aliasing, which serves as the target for the training optimization.
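The input/output contract described above can be sketched as follows. This is a hedged illustration, not the paper's code: the channel packing, the number of history frames, and the L1 training loss are all assumptions made for clarity.

```python
import numpy as np

def assemble_network_input(frames):
    """Stack per-frame rendering attributes into one channels-first tensor.

    `frames` is a list (oldest to newest, current frame last) of dicts:
      'color'  -- (H, W, 3) low-resolution RGB
      'depth'  -- (H, W)    depth map
      'motion' -- (H, W, 2) dense motion vectors
    Returns an array of shape (6 * len(frames), H, W).
    """
    channels = []
    for f in frames:
        channels.append(np.moveaxis(f["color"], -1, 0))   # 3 channels
        channels.append(f["depth"][None])                 # 1 channel
        channels.append(np.moveaxis(f["motion"], -1, 0))  # 2 channels
    return np.concatenate(channels, axis=0)

def supervised_loss(predicted_hr, target_hr):
    """Training objective sketch: mean absolute error between the network's
    high-resolution output and the anti-aliased high-resolution reference.
    (The paper's actual loss may differ; L1 is assumed here.)"""
    return np.abs(predicted_hr - target_hr).mean()
```

At inference only `assemble_network_input` is needed; the target images and the loss exist solely during training, which is what makes the approach supervised.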
Unsurprisingly, Facebook has highlighted the potential of neural supersampling for AR and VR applications on its Oculus platform. However, we see no reason why this promising DLSS alternative couldn't also be used in regular 3D games, and it's certainly something we'll keep an eye on as it develops.