Google’s Veo 3 Just Did to Video What ChatGPT Did to Text


No one is cooking up innovations quite like Google. At I/O 2025, the search giant dropped a slew of announcements that left everyone stunned and questioning whether what they had witnessed was even real. 

Google CEO Sundar Pichai and DeepMind CEO Demis Hassabis showed no mercy to their rivals, firmly securing Google’s lead in the AGI race.

The biggest buzz is around Google’s new video generation model, Veo 3. Not only does it create high-quality videos, but it also generates synchronised audio natively, a capability no mainstream rival has offered so far. OpenAI’s Sora lacks it, as do Runway ML Gen-4, Meta’s MovieGen, Pika Labs, and Stability AI’s Stable Video 4D 2.0.

Veo 3 can generate the sound of traffic in the background of a city street scene, birds singing in a park, and even dialogue between characters.

“Veo 3 is the AGI moment for AI video,” wrote AI influencer Ashutosh Shrivastava on X.

Social media platforms are flooded with clips generated by Veo 3, and the excitement shows no sign of slowing down. The model is surprisingly good at capturing real-world physics, from the noise and movement of water to the look and sound of walking in snow. It even handles lip-syncing with impressive accuracy.

One user on X posted a video imagining how Greek philosopher Pythagoras might have explained the Pythagorean theorem in ancient Greece. Another user shared a clip of a man performing a stand-up set, which, surprisingly, was actually funny.

Veo 3 is now available to Ultra subscribers in the US through the Gemini app and Flow, as well as to enterprise users via Vertex AI.

Filmmaking is Slated to Change Completely

The tech giant has introduced a new tool called Flow for filmmakers. This tool allows users to generate cinematic clips and scenes, integrate assets across shots, and reference creative elements in plain language. 

According to Google, Flow is inspired by the creative state in which time seems to slow down and creation feels effortless, iterative, and full of possibility.

For decades, Steven Spielberg has been the gold standard in cinematic storytelling, known for blending emotional depth with visual spectacle in films like E.T., Jurassic Park, and Schindler’s List. If Veo 3 had existed in his early days, he might have been one of its early users.

Flow includes features such as camera controls, a scene builder for editing and extending existing shots, and asset management tools. A showcase section called Flow TV provides access to clips and channels generated with Veo, along with the exact prompts and techniques used, allowing users to “learn and adapt new styles”.

Experts and users alike are already imagining the future impact of Veo 3. 

Derya Unutmaz, professor at The Jackson Laboratory, believes AI could soon bring feature-length films to life at a fraction of the cost and time. “Soon we’ll have Toy Story quality feature-length films created with AI, possibly even using Veo 3 or near-future versions, in just a matter of days and for a few thousand dollars,” he said, adding that Toy Story originally cost $30 million and took four years to produce.

Meanwhile, a user on X called Google’s Veo 3 “more than crazy”, predicting that within two years, movies may start using AI instead of traditional CGI for shorter scenes. They added that this shift could accelerate quickly, potentially resulting in a big-budget film made almost entirely with AI, with humans still guiding the creative process.

Separately, Google DeepMind is partnering with Primordial Soup, a new storytelling venture founded by director Darren Aronofsky. The goal is to explore how advanced video generation models can support more creative and emotionally rich storytelling.

As part of the partnership, Primordial Soup will produce three short films using DeepMind’s generative AI tools, including Veo. Each film will be directed by an emerging filmmaker, with Aronofsky providing mentorship and DeepMind’s research team offering technical support.

Google is also expanding access to Lyria 2, its music generation model, giving musicians more tools to create music.

Bye Bye Ghibli 

Google wasn’t finished yet. It also introduced Imagen 4, the latest version of its text-to-image model, which combines speed with precision to produce strikingly detailed visuals.

The new image generation model delivers remarkable clarity in fine textures like intricate fabrics, water droplets, and animal fur, while handling both photorealistic and abstract styles with ease.

Imagen 4 supports a wide range of aspect ratios and can generate images at up to 2K resolution, making it ideal for printing and presentations. It also shows significant improvements in spelling and typography, opening up new use cases like personalised greeting cards, posters, and comics.

The model is available today in the Gemini app, Whisk, Vertex AI and across Slides, Vids, Docs and more in Workspace. It will compete directly with OpenAI’s image generation model, which went viral recently after users flooded social media with Ghibli-style images.





