NVIDIA has introduced DreamGen, a new engine designed to scale robot learning using synthetic video data instead of relying on large fleets of human operators. The technology generates large volumes of neural trajectories (photorealistic robot videos paired with motor action labels) to train robots on new tasks and environments without physical trials.
DreamGen works by fine-tuning state-of-the-art video generative models on a specific robot, prompting them with language commands to simulate new scenarios, extracting pseudo-actions from the generated videos, and then training robot policies on this augmented dataset.
What if robots could dream inside a video generative model? Introducing DreamGen, a new engine that scales up robot learning not with fleets of human operators, but with digital dreams in pixels. DreamGen produces massive volumes of neural trajectories – photorealistic robot… pic.twitter.com/EqAScBuh2N
— Jim Fan (@DrJimFan) May 20, 2025
According to Jim Fan, co-lead of Project GR00T and the GEAR Lab at NVIDIA, DreamGen enables robots “to dream inside a video generative model,” producing “massive volumes of neural trajectories” that unlock “strong generalisation to new nouns, verbs, and environments.”
The engine has taught a humanoid robot 22 new behaviours, including pouring, folding, and hammering, without direct physical training on these tasks. It also showed zero-to-one generalisation, achieving success rates above 43% on novel tasks and 28% in unseen environments.
Unlike traditional graphics engines, DreamGen handles complex physical interactions such as deformable objects and fluids simply by running a diffusion neural network.
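The article does not reproduce the model internals, but “running a diffusion neural network” generally means a denoising loop of the following shape. This is a generic DDPM-style sampling sketch, with the denoiser network and noise schedule assumed purely for illustration; it is not NVIDIA's implementation.

    import torch

    @torch.no_grad()
    def sample_video(denoiser, shape, num_steps=50):
        """Generic DDPM-style sampling loop (hypothetical sketch).

        `denoiser(x_t, t)` is assumed to predict the noise present at
        step t; `shape` is (frames, channels, height, width) for a clip.
        """
        betas = torch.linspace(1e-4, 0.02, num_steps)   # assumed noise schedule
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)

        x = torch.randn(shape)                          # start from pure noise
        for t in reversed(range(num_steps)):
            eps = denoiser(x, t)                        # predicted noise
            # Standard DDPM posterior-mean update for x_{t-1} given x_t
            x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
            if t > 0:
                x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
        return x                                        # denoised video tensor

Because the physics lives in the learned denoiser rather than in hand-coded solvers, the same loop can render cloth folding or liquid pouring that a conventional graphics engine would need specialised simulation for.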
Joel Jang, research scientist at NVIDIA's GEAR Lab, noted that current robot data scaling relies heavily on human labour, often requiring physical deployment in numerous homes. DreamGen bypasses this by generating “Dreams”, or synthetic training data, through a “simple 4-step pipeline.”
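The pipeline's code is not public in this piece, but the four steps can be outlined in rough pseudocode. Everything below, including every function and parameter name, is a hypothetical sketch of the recipe as described, not NVIDIA's actual API.

    def dreamgen_pipeline(video_model, robot_demos, prompts, idm, policy):
        """Hypothetical outline of DreamGen's 4-step recipe.

        All names here are illustrative stand-ins.
        """
        # Step 1: fine-tune a pretrained video generative model on the target robot
        video_model.finetune(robot_demos)

        # Step 2: prompt the fine-tuned model with language to "dream" new scenarios
        dreams = [video_model.generate(prompt) for prompt in prompts]

        # Step 3: extract pseudo-actions from the dreamed videos, e.g. with an
        # inverse-dynamics model, yielding (video, action) neural trajectories
        neural_trajectories = [(clip, idm.infer_actions(clip)) for clip in dreams]

        # Step 4: train the downstream robot policy on the augmented dataset
        policy.train(robot_demos + neural_trajectories)
        return policy

Step 3 is what turns raw dreamed pixels into the action-labelled neural trajectories that the policy can actually learn from.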
He said the technology generates neural trajectories for “22 new verbs in 10 unseen environments” and trains robots to perform these tasks “zero-shot”, including tasks that are hard to simulate, like folding or scooping.
Through DreamGen, we generate “Dreams” or Neural Trajectories of 22 new verbs in 10 unseen environments, and train robots to perform these tasks “zero-shot”. pic.twitter.com/4y97jkPLOS
— Joel Jang (@jang_yoel) May 20, 2025
The research, carried out over eight months, was a collaboration between NVIDIA's GEAR Lab, the Cosmos team, and the University of Washington. DreamGen also introduces a new benchmark, DreamGen Bench, aimed at video model researchers who want to advance robotics without access to physical robots.
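The benchmark's exact scoring protocol is in the paper; purely as an illustration, a harness of this kind might generate one clip per prompt and average automated scores for instruction-following and physical plausibility. All names below are assumed stand-ins, not DreamGen Bench's real interface.

    def evaluate_video_model(video_model, benchmark_prompts,
                             instruction_scorer, physics_scorer):
        """Hypothetical harness in the spirit of DreamGen Bench.

        Scores a video generative model on how well its clips follow
        the language instruction and respect basic physics; all of the
        callables passed in are assumed for illustration.
        """
        instruction_scores, physics_scores = [], []
        for prompt in benchmark_prompts:
            clip = video_model.generate(prompt)
            instruction_scores.append(instruction_scorer(clip, prompt))
            physics_scores.append(physics_scorer(clip))
        n = len(benchmark_prompts)
        return {
            "instruction_following": sum(instruction_scores) / n,
            "physics_alignment": sum(physics_scores) / n,
        }

A benchmark shaped like this lets video-model researchers compare candidates offline, since no physical robot is needed to compute either score.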
DreamGen marks a significant step toward robots learning diverse skills from digital data, potentially reducing the need for costly and time-consuming manual data collection.