Meta made several major announcements in robotics and embodied AI this week, including the release of benchmarks and artifacts for better understanding and interacting with the physical world. The three research artifacts released by Meta, Sparsh, Digit 360, and Digit Plexus, focus on touch perception, robot dexterity, and human-robot interaction. Meta is also releasing PARTNR, a new benchmark for evaluating planning and reasoning in human-robot collaboration.
The releases come as advances in foundation models have rekindled interest in robotics, and AI companies are gradually expanding their competition from the digital realm into the physical world.
There is renewed hope in the industry that, with the help of foundation models such as large language models (LLMs) and vision-language models (VLMs), robots will be able to perform more complex tasks that require reasoning and planning.
Touch perception
Sparsh is a family of encoder models for vision-based tactile sensing, created in collaboration with the University of Washington and Carnegie Mellon University. It is designed to give robots touch-perception capabilities. Touch perception is critical for robotics tasks, such as determining how much pressure can be applied to an object without damaging it.
The classic approach to incorporating vision-based tactile sensors into robotic tasks is to train custom models on labeled data to predict useful states. That approach does not generalize across different sensors and tasks.
Meta describes Sparsh as a general-purpose family of models that can be applied to different types of vision-based tactile sensors and different tasks. To overcome the challenges faced by previous generations of touch-perception models, the researchers trained the Sparsh models through self-supervised learning (SSL), which eliminates the need for labeled data. The models were trained on more than 460,000 tactile images aggregated from various datasets. In the researchers' experiments, Sparsh achieved an average improvement of 95.1% over task- and sensor-specific end-to-end models under a limited labeled-data budget. The researchers created different versions of Sparsh based on different architectures, such as Meta's I-JEPA and DINO models.
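To make that pattern concrete, here is a minimal, hypothetical PyTorch sketch of the encoder-plus-small-head approach: a frozen backbone stands in for the SSL-pretrained Sparsh encoder (the toy CNN, the force-regression task, and the synthetic data are all illustrative assumptions, not Meta's actual API), and only a small task head is trained on the limited labeled budget.

```python
import torch
import torch.nn as nn

# Stand-in for a frozen, SSL-pretrained tactile encoder; a real pipeline
# would load Sparsh weights here instead of this toy CNN.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in encoder.parameters():
    p.requires_grad = False  # SSL features stay frozen; only the head learns

# Small task-specific head, e.g. regressing applied force from a tactile image.
head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Tiny synthetic "labeled budget": 64 tactile images with force labels.
images, forces = torch.randn(64, 3, 64, 64), torch.rand(64, 1)

for _ in range(10):  # a few passes over the small labeled set
    features = encoder(images)              # generic touch representation
    loss = loss_fn(head(features), forces)  # supervise only the small head
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the design is that swapping in a different sensor or task only means retraining the small head, not the backbone.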
Touch sensors
In addition to leveraging existing data, Meta is also releasing hardware to collect rich tactile information from physical objects. Digit 360 is an artificial fingertip-shaped tactile sensor with more than 18 sensing features. The sensor is equipped with over 8 million taxels to capture granular, omnidirectional deformations on the fingertip surface. Digit 360 captures a variety of sensing modalities to give a richer understanding of how objects interact with the environment.
Digit 360 also has an on-device AI model that reduces dependence on cloud-based servers. This allows it to process information locally and respond to touch with minimal delay, similar to reflex arcs in humans and animals.
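The low-latency claim comes down to keeping both the sensor read and the inference on the device. Below is a minimal sketch of that "reflex loop"; every name here (read_taxel_frame, adjust_grip, the slip threshold) is an illustrative stand-in, not Digit 360's actual firmware interface.

```python
import random
import time

SLIP_THRESHOLD = 0.8

def read_taxel_frame():
    # Stand-in for reading one frame of taxel pressures from the fingertip.
    return [random.random() for _ in range(16)]

def on_device_score(frame):
    # Stand-in for the embedded model: estimate slip likelihood from pressures.
    return max(frame)

def adjust_grip():
    print("grip adjusted locally -- no cloud round-trip in the loop")

for _ in range(100):                        # poll the fingertip at ~1 kHz
    frame = read_taxel_frame()              # local sensor read, no network hop
    if on_device_score(frame) > SLIP_THRESHOLD:
        adjust_grip()                       # immediate corrective action
    time.sleep(0.001)
```

A cloud round-trip would add tens of milliseconds per decision; keeping the whole loop on the fingertip is what makes the reflex-arc analogy apt.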
“Beyond advances in robotic dexterity, this breakthrough sensor has important potential applications ranging from medicine and prosthetics to virtual reality and telepresence,” Meta researchers wrote.
Meta is releasing the code and designs for Digit 360 to foster community-driven research and innovation in touch perception. But as with any of its open releases, Meta also stands to gain from the adoption of the hardware and models. The researchers say the information captured by Digit 360 could help develop more realistic virtual environments, which could matter a great deal to Meta's metaverse projects in the future.
Meta is also releasing Digit Plexus, a hardware-software platform aimed at facilitating the development of robotic applications. Digit Plexus integrates various fingertip and skin tactile sensors into a single robotic hand, encodes the tactile data collected from the sensors, and transmits it to a host computer over a single cable. Meta is releasing the code and design for Digit Plexus so that researchers can build on the platform and advance research into robot dexterity.
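Carrying many sensors over one cable typically means multiplexing framed packets that the host demultiplexes by sensor ID. Here is a sketch of a host-side decoder under an invented wire format (sensor ID, timestamp, payload length, payload); this format is an assumption for illustration, not Digit Plexus's actual protocol.

```python
import struct

# Invented wire format: sensor_id (u8), timestamp_us (u32), payload length (u16).
HEADER = struct.Struct("<BIH")

def decode_stream(buf: bytes):
    """Yield (sensor_id, timestamp_us, payload) from one multiplexed byte stream."""
    offset = 0
    while offset + HEADER.size <= len(buf):
        sensor_id, ts_us, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        payload = buf[offset:offset + length]
        offset += length
        yield sensor_id, ts_us, payload

# Example: packets from two different sensors arriving on the same cable.
stream = (HEADER.pack(1, 1000, 4) + b"\x01\x02\x03\x04" +
          HEADER.pack(5, 1002, 2) + b"\xaa\xbb")
for sensor_id, ts_us, payload in decode_stream(stream):
    print(f"sensor {sensor_id} @ {ts_us}us: {payload.hex()}")
```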
Meta plans to partner with tactile sensor maker GelSight Inc. to manufacture Digit 360. It will also partner with Korean robotics company Wonik Robotics to develop a fully integrated robotic hand with tactile sensors on the Digit Plexus platform.
Evaluation of human-robot collaboration
Meta is also releasing Planning And Reasoning Tasks in humaN-Robot collaboration (PARTNR), a benchmark for evaluating how effectively AI models collaborate with humans on household tasks.
PARTNR is built on top of Habitat, Meta's simulation environment. It comprises 100,000 natural-language tasks spanning 60 houses and more than 5,800 unique objects. The benchmark is designed to evaluate the performance of LLMs and VLMs when following instructions from humans.
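Benchmarks of this kind usually follow a single evaluation loop: give the model a natural-language task, let it plan step by step, execute each step in the simulator, and record success. A minimal sketch of that loop follows, with toy stand-ins for the simulator and the LLM planner; none of these names come from the actual PARTNR or Habitat APIs.

```python
class ToySimulator:
    """Stand-in for a Habitat-style environment (illustrative only)."""
    def reset(self, task):
        self.steps = 0
        return "initial observation"
    def step(self, action):
        self.steps += 1
        return f"observation after {action}"
    def task_complete(self):
        return self.steps >= 3  # pretend the task takes three steps

def toy_planner(task, observation):
    # Stand-in for an LLM call mapping (task, observation) -> next action.
    return "pick_up(mug)"

def evaluate(tasks, simulator, plan_next_step, max_steps=20):
    successes = 0
    for task in tasks:  # e.g. "Put the mug from the desk into the dishwasher"
        observation = simulator.reset(task)
        for _ in range(max_steps):
            action = plan_next_step(task, observation)  # LLM as planning module
            observation = simulator.step(action)        # execute in simulation
            if simulator.task_complete():
                successes += 1
                break
    return successes / len(tasks)  # benchmark reports a success rate

print(evaluate(["Put the mug from the desk into the dishwasher"],
               ToySimulator(), toy_planner))
```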
Meta's new benchmark joins a growing number of projects exploring the use of LLMs and VLMs in robotics and embodied AI settings. Over the past year, these models have shown great promise in serving as planning and reasoning modules for robots performing complex tasks. Startups such as Figure and Covariant have developed prototypes that use foundation models for planning. At the same time, AI labs are working to create better foundation models for robotics. One example is Google DeepMind's RT-X project, which brings together datasets from different robots to train a vision-language-action (VLA) model that generalizes across robot morphologies and tasks.