Google outlines new methods for training robots with video and large language models

2024 is going to be a huge year for the intersection of generative AI/large foundation models and robotics. There’s a lot of excitement swirling around the potential for various applications, ranging from learning to product design. Google’s DeepMind Robotics researchers are among a number of teams exploring the space’s potential. In a blog post today, the team is highlighting ongoing research designed to give robots a better understanding of precisely what it is we humans want out of them.

Traditionally, robots have focused on doing a single task repeatedly over the course of their lives. Single-purpose robots tend to be very good at that one thing, but even they run into difficulty when changes or errors are unintentionally introduced to the proceedings.

The newly announced AutoRT is designed to harness large foundation models to a number of different ends. In a standard example given by the DeepMind team, the system begins by leveraging a Visual Language Model (VLM) for better situational awareness. AutoRT is capable of managing a fleet of robots working in tandem, each equipped with cameras to get a layout of its environment and the objects within it.

A large language model, meanwhile, suggests tasks that can be accomplished by the hardware, including its end effector. LLMs are understood by many to be the key to unlocking robotics that effectively understand more natural language commands, reducing the need for hard-coding skills.
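The division of labor described above (a VLM that reports what the cameras see, an LLM that proposes tasks the hardware can attempt) can be illustrated with a toy sketch. This is not DeepMind's code: `describe_scene` and `propose_tasks` are hypothetical stand-ins for the two models, and the `can_grasp` flag is an assumed, simplified way of representing what a robot's end effector allows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Robot:
    name: str
    can_grasp: bool  # hypothetical capability flag for the end effector


def describe_scene(frame_label: str) -> List[str]:
    """Stand-in for a VLM: maps a camera frame to the objects in view."""
    known_scenes = {"kitchen_counter": ["sponge", "cup"], "desk": ["pen"]}
    return known_scenes.get(frame_label, [])


def propose_tasks(objects: List[str]) -> List[str]:
    """Stand-in for an LLM: suggests candidate tasks for the visible objects."""
    tasks = []
    for obj in objects:
        tasks.append(f"pick up the {obj}")
        tasks.append(f"push the {obj}")
    return tasks


def feasible(task: str, robot: Robot) -> bool:
    """Reject tasks the robot's end effector cannot perform."""
    return robot.can_grasp or not task.startswith("pick up")


def plan_for_fleet(fleet: List[Robot], frames: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Assign each robot the first feasible task suggested for its own scene."""
    plan: Dict[str, Optional[str]] = {}
    for robot in fleet:
        objects = describe_scene(frames[robot.name])
        candidates = [t for t in propose_tasks(objects) if feasible(t, robot)]
        plan[robot.name] = candidates[0] if candidates else None
    return plan


fleet = [Robot("arm_a", can_grasp=True), Robot("arm_b", can_grasp=False)]
frames = {"arm_a": "kitchen_counter", "arm_b": "desk"}
print(plan_for_fleet(fleet, frames))
```

In this sketch, the robot without a gripper is only ever handed a pushing task, which mirrors the idea that suggested tasks should match what the hardware can actually do.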

The system has already been tested quite a bit over the past seven or so months. AutoRT is capable of orchestrating up to 20 robots at once and a total of 52 different devices. All told, DeepMind has collected some 77,000 trials, including more than 6,000 tasks.

Also new from the team is RT-Trajectory, which leverages video input for robotic learning. Plenty of teams are exploring the use of YouTube videos as a method to train robots at scale, but RT-Trajectory adds an interesting layer, overlaying a two-dimensional sketch of the arm in action over the video.

The team notes, “these trajectories, in the form of RGB images, provide low-level, practical visual hints to the model as it learns its robot-control policies.”
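To make the idea of a trajectory rendered as an RGB image concrete, here is a minimal sketch that draws a 2D path onto a frame. It is only an illustration under assumptions: the frame is a plain nested list of RGB tuples, the waypoints are hypothetical pixel coordinates, and the straight-line drawing is a generic technique, not the rendering DeepMind uses.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def blank_image(width: int, height: int, color: Pixel = (0, 0, 0)) -> Image:
    """Create a height x width frame filled with a single color."""
    return [[color for _ in range(width)] for _ in range(height)]


def overlay_trajectory(
    image: Image,
    waypoints: List[Tuple[int, int]],
    color: Pixel = (255, 0, 0),
) -> Image:
    """Return a copy of the frame with segments drawn between (x, y) waypoints."""
    out = [row[:] for row in image]  # leave the input frame untouched
    if not out or not out[0]:
        return out
    height, width = len(out), len(out[0])
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        # Sample enough points that consecutive samples are at most one pixel apart.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                out[y][x] = color
    return out


frame = blank_image(5, 5)
sketch = overlay_trajectory(frame, [(0, 0), (4, 0), (4, 4)])
for row in sketch:
    print("".join("#" if px != (0, 0, 0) else "." for px in row))
```

The resulting image carries both the scene and the intended motion, which is the kind of low-level visual hint the quote describes.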


DeepMind says the training had more than double the success rate of its RT-2 training, at 63% compared to 29%, while testing 41 tasks.

“RT-Trajectory makes use of the rich robotic-motion information that is present in all robot datasets, but currently under-utilized,” the team notes. “RT-Trajectory not only represents another step along the road to building robots able to move with efficient accuracy in novel situations, but also unlocking knowledge from existing datasets.”

Topics

AI, DeepMind, Generative AI, Google, google deepmind robotics, llm, Robotics

Brian Heater

Brian Heater was the Hardware Editor at TechCrunch until early 2025. He has worked for a number of leading tech publications, including Engadget, PCMag, Laptop, and Tech Times, where he served as the Managing Editor. His writing has appeared in Spin, Wired, Playboy, Entertainment Weekly, The Onion, Boing Boing, Publishers Weekly, The Daily Beast and various other publications. He hosts the weekly Boing Boing interview podcast RiYL, has appeared as a regular NPR contributor and shares his Queens apartment with a rabbit named Juniper.
