NVIDIA and OpenUSD at CES 2025

NVIDIA made a number of announcements at the recent CES 2025, and the OpenUSD team at NVIDIA posted a great round-up of the ones relevant to 3D aficionados. While my personal interest is more in animation, technologies related to 3D modeling always interest me for their potential applications. For example, can robot animation techniques be used in animated cartoons?

One of the big projects announced at CES 2025 was NVIDIA COSMOS, a collection of world foundation models and related tools for accelerating the development of AI for physical-world applications such as robotics and self-driving cars.

For example, given a few frames of a video, COSMOS can project forward in time and generate potential future videos consistent with the starting frames. Why is this useful? Why not just record more real-world video? As a first example, consider training self-driving cars.

  • Imagine you have a video of a car accident and you want to train your self-driving car to avoid that accident in the future. You only have one video of the real accident, and it is not feasible to stage tens of real-world variations of it just to train your car’s AI. So instead, take random snippets from the original video and generate many variations moving forward from each point (see the augmentation sketch after this list). The variations won’t all be identical, and that is the point: you end up with more training data without the cost of trying to capture it in real life.
  • Going a step further, imagine inserting random events into the generated content, such as a child running out onto the road in front of the car. Generative AI can be used to generate such cases to improve the safety of self-driving cars.
  • There are also cool tokenization tools that convert images into compact tokens, which can then be stored and processed instead of starting from raw video each time (a toy tokenization sketch also follows below). Not necessarily sexy, but very useful in practice.
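
To make the augmentation idea concrete, here is a minimal Python sketch of the workflow from the first bullet. Note that `continue_video` is a hypothetical stand-in for a world foundation model call; the actual COSMOS inference API will look different, so treat this as a sketch of the idea rather than working integration code.

```python
import random

def continue_video(snippet, num_frames):
    """Hypothetical stand-in for a world foundation model call.
    The real COSMOS inference API will differ; this stub only marks
    where the generated continuation would come from."""
    return [f"generated_frame_{i}" for i in range(num_frames)]

def augment(video, snippet_len=16, variations=10, horizon=48):
    """Turn one real video into many plausible futures by seeding the
    model from random snippets of the original footage."""
    dataset = []
    for _ in range(variations):
        start = random.randrange(len(video) - snippet_len)
        snippet = video[start:start + snippet_len]
        dataset.append(snippet + continue_video(snippet, horizon))
    return dataset

# One real accident video (placeholder frame labels) becomes ten variants.
real_video = [f"real_frame_{i}" for i in range(120)]
variants = augment(real_video)
print(len(variants), len(variants[0]))  # 10 videos, 64 frames each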
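
And to give a flavor of the tokenization point, here is a toy vector-quantization-style sketch in plain NumPy. It is purely illustrative (the codebook is random rather than learned, and the real COSMOS tokenizer is far more sophisticated), but it shows why integer tokens are cheaper to store and process than raw pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.random((64, 64))                 # stand-in for a grayscale frame
patch = 8
codebook = rng.random((256, patch * patch))  # stand-in for learned codes

# Cut the frame into flattened 8x8 patches.
patches = (frame.reshape(64 // patch, patch, 64 // patch, patch)
                .swapaxes(1, 2)
                .reshape(-1, patch * patch))

# Nearest-neighbour lookup: each patch becomes one integer token.
dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = dists.argmin(axis=1)                # 64 token ids replace 4096 pixels

# Reconstruct an approximation of the frame from the tokens alone.
recon = (codebook[tokens]
         .reshape(64 // patch, 64 // patch, patch, patch)
         .swapaxes(1, 2)
         .reshape(64, 64))
print(tokens[:8], recon.shape)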

Another scenario is in a factory. How do you train robots to cope with unexpected events (such as spills or falling boxes) while ensuring the safety of the human workers around them? The factory may not even exist yet (it may still be on the drawing board). This is where tools such as NVIDIA Omniverse come in. Omniverse has a high-quality rendering engine driven from OpenUSD 3D assets: you can create a factory environment using OpenUSD and then render it to simulate the forward camera on a forklift as it travels around the factory floor.
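
As a taste of how simple basic OpenUSD authoring is, here is a minimal sketch using the standard pxr Python API. It creates a toy stage with a floor, a shelf, and a camera standing in for the forklift’s forward view; the scene contents are invented for illustration, and a real factory layout would of course be far richer.

```python
from pxr import Gf, Usd, UsdGeom

# Create a new stage for our toy factory.
stage = Usd.Stage.CreateNew("factory.usda")
UsdGeom.SetStageUpAxis(stage, UsdGeom.Tokens.z)

# A large flattened cube standing in for the factory floor.
floor = UsdGeom.Cube.Define(stage, "/Factory/Floor")
floor.AddScaleOp().Set(Gf.Vec3f(50.0, 50.0, 0.1))

# A shelf unit somewhere on the floor.
shelf = UsdGeom.Cube.Define(stage, "/Factory/Shelf")
shelf.AddTranslateOp().Set(Gf.Vec3d(5.0, 0.0, 1.0))

# A camera standing in for the forklift's forward-facing view;
# a simulator would animate this transform along the forklift's path.
cam = UsdGeom.Camera.Define(stage, "/Factory/ForkliftCam")
cam.AddTranslateOp().Set(Gf.Vec3d(0.0, -10.0, 1.5))

stage.GetRootLayer().Save()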

So what is special about OpenUSD?

  • OpenUSD natively supports variants, making it possible to have a single 3D model with many different renderings. This could be as simple as changing a color or texture, or more advanced variations where options are added to or removed from a physical product (see the sketch after this list).
  • Omniverse can be used to create high-quality renders of scenes for the robots to learn from.
  • Further, because Omniverse renders at high quality in real time, its output can also be fed into simulated robots in real time.
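
Here is the variant sketch promised above: a minimal example using the standard pxr Python API that gives a cube a "color" variant set with two variants. Switching the selection changes how the same model renders, without duplicating the asset.

```python
from pxr import Gf, Usd, UsdGeom

stage = Usd.Stage.CreateNew("widget.usda")
cube = UsdGeom.Cube.Define(stage, "/Widget/Body")
prim = cube.GetPrim()

# Author a "color" variant set with two variants on the same model.
colorSet = prim.GetVariantSets().AddVariantSet("color")
for name, rgb in [("red", (1.0, 0.0, 0.0)), ("blue", (0.0, 0.0, 1.0))]:
    colorSet.AddVariant(name)
    colorSet.SetVariantSelection(name)
    # Edits inside this context are stored under the selected variant.
    with colorSet.GetVariantEditContext():
        cube.CreateDisplayColorAttr([Gf.Vec3f(*rgb)])

# Pick which variant the model presents by default.
colorSet.SetVariantSelection("red")
stage.GetRootLayer().Save()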

Hopefully it is clear that while training AI systems on real-life video is ideal, there are very real benefits in generating many varied examples so the AI system can learn from more data at lower cost.

So, if you are a developer in the physical AI space, you might find it beneficial to learn more about OpenUSD, especially if you want to use NVIDIA tools. This is where the Learn OpenUSD training series from NVIDIA may come in handy: a series of free training materials on OpenUSD. A number of lessons have already been developed, and more are in the pipeline.

Is OpenUSD so complex that it needs a whole series of lessons? Well, no… and yes! You can use OpenUSD very simply to create scenes: just grab 3D objects and position them in a scene. That does not take long to learn. But OpenUSD has other features, like native support for variants. Want a set of human characters walking around a factory wearing different clothes to give the AI more to learn from? You don’t need to create a new character model each time; instead, create one model with many variants (each set of clothes could be its own variant), allowing many more combinations to be generated with relative ease, as sketched below.
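
As a rough sketch of how that plays out in practice, the snippet below assumes a hypothetical character.usda asset whose /Character prim already has variant sets (such as "clothes" or "hair") authored on it, and enumerates every combination of variant selections. That is exactly the kind of cheap combinatorial variety a training pipeline wants.

```python
from itertools import product
from pxr import Usd

# "character.usda" and "/Character" are hypothetical; substitute a real
# asset that has variant sets authored on it.
stage = Usd.Stage.Open("character.usda")
prim = stage.GetPrimAtPath("/Character")
vsets = prim.GetVariantSets()

set_names = vsets.GetNames()
choices = [vsets.GetVariantSet(n).GetVariantNames() for n in set_names]

# Walk every combination of variant selections: one model, many looks.
for combo in product(*choices):
    for set_name, variant in zip(set_names, combo):
        vsets.GetVariantSet(set_name).SetVariantSelection(variant)
    # ... render or export this configuration for the training set ...
    print(dict(zip(set_names, combo)))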

NVIDIA believes Physical AI will be the next big thing in AI. They are not alone in this belief, but I don’t think it will have the same type of impact as Large Language Models (LLMs). Anyone who can speak can potentially benefit from direct interaction with an LLM: it can help write an email, summarize a report, or perform sentiment analysis of your brand in social media posts. Physical AI, I think, will have a different kind of impact. Fewer developers will touch it directly, but it may have even more profound effects on society, touching transportation, menial labor, and potentially dangerous physical jobs. Interesting days ahead for sure!

Hopefully you find the OpenUSD learning materials useful, and sorry for not posting for a while. My promise / New Year’s resolution for you… I’ll be back!

