NVIDIA Calls Cosmos 3 The World’s First Fully Open Omnimodel, As Robots And Autonomous Vehicles Get A Powerful Brain Grounded In Physics

Hassan Mujtaba

NVIDIA has just announced its Cosmos 3 world model at the ongoing GTC Taipei, offering a glimpse of what it calls the world's first "fully open omnimodel": a model capable of vision-based reasoning that also supports multimodal output in the form of text, images, video, and ambient sound.

NVIDIA's Cosmos 3 "pairs a reasoning transformer with an expert generation transformer," allowing the model to grasp physical interactions before generating video and action content that leverages those interactions.

At its heart, Cosmos 3 tackles the challenge of helping robots, autonomous vehicles (AVs), and vision agents understand their surroundings when training data is limited and simulation stacks remain fragmented.

NVIDIA's Cosmos 3 is an open omnimodel, which means it is able to "natively understand and generate text, images, video, ambient sound and actions with leading physics accuracy."

Its unique strength lies in its architecture, which pairs a reasoning transformer with one geared toward generation, "enabling Cosmos 3 to understand object interactions, motion and spatial-temporal relationships before generating video and action trajectories."

For the benefit of those who might not be aware, a transformer is a deep learning neural network architecture that tracks relationships and context within sequential data, such as the words in a sentence. Instead of working through a sequence piece by piece, as older recurrent networks do, a transformer analyzes all elements of the sequence simultaneously. This allows for parallel processing, which substantially speeds up training and the processing of inputs.
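To make the "relationships and context" idea a bit more concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation inside a transformer, written in plain Python. It is a generic textbook illustration, not NVIDIA's Cosmos 3 implementation: the toy token vectors are made up, and the learned query/key/value projections that real models use are deliberately left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with queries = keys = values = x.

    x is a list of token vectors (lists of floats) of equal length.
    Real transformers first multiply x by learned weight matrices to get
    separate queries, keys and values; that step is omitted here.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    # Each position's output depends only on the input x, never on another
    # position's output, so all positions could be computed in parallel.
    for q in x:
        # How strongly this token relates to every token in the sequence.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        weights = softmax(scores)
        # Output = attention-weighted mix of all token vectors (its "context").
        out = [sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)]
        outputs.append(out)
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens)
for row in result:
    print([round(val, 3) for val in row])
```

Because every token attends to the whole sequence at once, each output row blends information from all tokens. That independence between positions is what lets transformers run on parallel hardware such as GPUs.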

According to NVIDIA, Cosmos 3 can be used as a:

  1. Vision language model
  2. World model that simulates physical environments and predicts future world states
  3. Foundation for other world models

Finally, Cosmos 3 Super, which offers the highest-fidelity responses, and Cosmos 3 Nano are available now, while Cosmos 3 Edge, designed for real-time inference on edge devices, is coming soon.

About the author: A software engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.