NVIDIA Creates Interactive World with Its Deep Learning-Based AI Model: ‘It Wouldn’t Have Been Possible Before Tensor Cores’

Alessio Palumbo

Today's big NVIDIA announcement was the TITAN RTX GPU. However, the company also put out another interesting press release, providing the first look at an interactive, AI-rendered virtual world based on a deep learning model.

A team of researchers at NVIDIA used a neural network, previously trained on real-world videos, to render synthetic three-dimensional environments in real time. The result is a basic driving game; the full demo will be showcased at the NeurIPS conference in Montreal, Canada.

Bryan Catanzaro, vice president of Applied Deep Learning Research at NVIDIA and leader of the research team, said:

NVIDIA has been inventing new ways to generate interactive graphics for 25 years, and this is the first time we can do so with a neural network. Neural networks — specifically generative models — will change how graphics are created. This will enable developers to create new scenes at a fraction of the traditional cost.

One of the main obstacles developers face when creating virtual worlds, whether for game development, telepresence, or other applications is that creating the content is expensive. This method allows artists and developers to create at a much lower cost, by using AI that learns from the real world. Before Tensor Cores, this demo would not have been possible.

Clearly, the potential here is huge. Creating massive virtual worlds is the basis of modern gaming, and it is a highly time- and resource-consuming process. Being able to speed it up through AI would do wonders for game developers, but NVIDIA also expects applications in fields like virtual reality, automotive, robotics, and architecture.

Those of you who aren't afraid to get really technical can dive into the entire research paper, available here. Anyone else can read the summary below, where the researchers have outlined a couple of current limitations of their model and how they could be overcome.

Conclusion

We present a general video-to-video synthesis framework based on conditional GANs. Through carefully-designed generator and discriminator networks as well as a spatiotemporal adversarial objective, we can synthesize high-resolution, photorealistic, and temporally consistent videos. Extensive experiments demonstrate that our results are significantly better than the results by state-of-the-art methods. Its extension to the future video prediction task also compares favorably against the competing approaches.

Limitations and future work

Although our approach outperforms previous methods, our model still fails in a couple of situations. For example, our model struggles in synthesizing turning cars due to insufficient information in label maps. We speculate that this could be potentially addressed by adding additional 3D cues, such as depth maps.

Furthermore, our model still can not guarantee that an object has a consistent appearance across the whole video. Occasionally, a car may change its color gradually. This issue might be alleviated if object tracking information is used to enforce that the same object shares the same appearance throughout the entire video. Finally, when we perform semantic manipulations such as turning trees into buildings, visible artifacts occasionally appear as building and trees have different label shapes. This might be resolved if we train our model with coarser semantic labels, as the trained model would be less sensitive to label shapes.
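
As a rough illustration of the last suggestion in the excerpt, coarser semantic labels could be produced by merging fine classes into broader groups before training. The sketch below is an assumption for illustration only: the class names, the grouping, and the `coarsen_label_map` helper are hypothetical and do not come from the paper.

```python
# Hypothetical fine-to-coarse grouping; these class names are illustrative
# and are not taken from the paper or from any specific dataset.
FINE_TO_COARSE = {
    "tree": "static_scenery",
    "building": "static_scenery",
    "car": "vehicle",
    "truck": "vehicle",
    "road": "ground",
    "sidewalk": "ground",
}


def coarsen_label_map(label_map):
    """Return a copy of a 2D label map with fine labels replaced by coarse groups.

    Labels that have no entry in FINE_TO_COARSE are kept unchanged.
    """
    return [[FINE_TO_COARSE.get(label, label) for label in row] for row in label_map]


# A tiny 2x3 "frame" of per-pixel labels, standing in for a real label map.
fine_frame = [
    ["tree", "tree", "building"],
    ["road", "car", "sidewalk"],
]
coarse_frame = coarsen_label_map(fine_frame)
```

In this toy grouping, trees and buildings end up in the same class, so swapping one for the other no longer changes the shape of the labelled region, which is the reduced sensitivity to label shapes that the researchers speculate about.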

About the author: With over two decades of experience in gaming journalism, Alessio Palumbo has led the gaming vertical at Wccftech since August 2015. He started working at a young age for Italian websites like Everyeye.it, Gamestar.it, Nextgame.it, and Multiplayer.it before kickstarting the indie English-language publication Worlds Factory as its founder and Editor in Chief. In the last decade, he has coordinated the overall output of Wccftech's gaming section, managed PR relations, assigned reviews, produced daily news coverage, edited gaming content as needed, and delivered game reviews. Arguably, his trademark content is the long series of exclusive developer interviews that have been cited by Wikipedia and by the biggest news media and gaming publications. His passion for technology also makes him knowledgeable when it comes to gaming hardware and tech. His favorite genres include RPGs, MMORPGs, and action/adventure games.
