Modern AI models are strikingly human-like in their ability to generate text, audio, and video in response to prompts. But so far, these algorithms have mostly remained in the digital world rather than the physical, three-dimensional world we live in. When we try to apply these models to the real world, even the most sophisticated algorithms struggle to perform properly. Consider, for example, how difficult it has been to build safe and reliable self-driving cars. These systems are powered by artificial intelligence, yet they simply don't understand physics, and they often hallucinate and make inexplicable mistakes.
This year, however, AI will finally make the leap from the digital world into the physical world we live in. Extending AI beyond its digital boundaries requires reimagining how machines think, fusing the digital intelligence of AI with the mechanical capabilities of robotics. This is what I call "physical intelligence": a new form of intelligent machine that can understand dynamic environments, cope with unpredictability, and make decisions in real time. Unlike the models used in standard AI, physical intelligence is rooted in physics, in an understanding of fundamental principles of the real world, such as cause and effect.
These capabilities allow physical intelligence models to interact with, and adapt to, different environments. My research group at MIT is developing a form of physical intelligence called a liquid network. In one experiment, for example, we trained two drones, one operated by a standard AI model and the other by a liquid network, to locate objects in a forest during the summer, using data captured by human pilots. Both drones performed equally well when asked to do exactly what they had been trained to do. But when asked to locate objects in different circumstances, during the winter or in an urban setting, only the liquid network drone successfully completed its task. The experiment showed that, unlike traditional AI systems that stop evolving after their initial training phase, liquid networks continue to learn and adapt from experience, just as humans do.
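For readers curious about the mechanics, here is a minimal, hypothetical Python sketch of the core idea behind liquid networks: neurons whose effective time constants are modulated by the input they receive, so the dynamics keep adjusting as conditions change. It is a toy illustration under my own assumptions, not the model used in these experiments; all names, sizes, and constants are placeholders.

```python
# Toy sketch of a liquid time-constant (LTC) style neuron layer.
# Illustrative only; not the MIT implementation. All parameters are hypothetical.
import numpy as np

class LTCLayer:
    def __init__(self, n_inputs, n_neurons, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.5, (n_neurons, n_inputs))    # input weights
        self.W_rec = rng.normal(0.0, 0.5, (n_neurons, n_neurons))  # recurrent weights
        self.bias = np.zeros(n_neurons)
        self.tau = np.ones(n_neurons)     # base time constants
        self.A = np.ones(n_neurons)       # equilibrium targets
        self.state = np.zeros(n_neurons)  # hidden state x(t)

    def step(self, u, dt=0.05):
        """One Euler step of dx/dt = -(1/tau + f) * x + f * A, where the gate f
        depends on the current input and state, so the effective time constant
        of each neuron changes with what the network is seeing."""
        pre = self.W_in @ u + self.W_rec @ self.state + self.bias
        f = 1.0 / (1.0 + np.exp(-pre))                  # input-dependent gate
        dxdt = -(1.0 / self.tau + f) * self.state + f * self.A
        self.state = self.state + dt * dxdt
        return self.state

# Example: feed a short synthetic sensor stream through the layer.
layer = LTCLayer(n_inputs=3, n_neurons=8)
for t in range(100):
    u = np.array([np.sin(0.1 * t), np.cos(0.1 * t), 1.0])  # stand-in for camera features
    h = layer.step(u)
print("final hidden state:", np.round(h, 3))
```

The point of the sketch is that the network's internal dynamics are not frozen after training: because the gate depends on the live input, the same weights produce different time constants, and thus different behavior, as the environment changes.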
Physical intelligence can also interpret and physically execute complex commands derived from text or images, bridging the gap between digital instructions and real-world execution. In my lab, for example, we have developed physically intelligent systems that can iteratively design and then 3D-print small robots, based on prompts such as "a robot that can walk forward" or "a robot that can pick up objects," in less than a minute.
Other labs are making significant progress as well. For example, Covariant, a robotics startup founded by UC Berkeley researcher Pieter Abbeel, is developing a chatbot, similar to ChatGPT, that can control robotic arms based on commands. The company has already raised over $222 million to develop and deploy sorting robots in warehouses around the world. A team at Carnegie Mellon University also recently demonstrated that a robot with just one camera and imprecise actuation can perform dynamic and complex parkour moves, including jumping onto obstacles twice its height and across gaps twice its length, using a single neural network trained through reinforcement learning.
If 2023 was the year of text-to-image and 2024 was the year of text-to-video, then 2025 will be the era of physical intelligence, with a new generation of devices, not only robots but everything from power grids to smart homes, that can interpret what we tell them and carry out tasks in the real world.