Social media company Meta has introduced V-JEPA 2, a new open-source AI model and the successor to the V-JEPA model it released last year. The new model was trained on over one million hours of video footage, and Meta says it runs 30 times faster than Nvidia’s Cosmos model.
V-JEPA 2, described as a “world model,” aims to let AI learn, plan, and make decisions in a manner closer to human cognition. Such systems build an internal simulation of reality grounded in the regularities of the physical world. Meta said in an official statement that the model can infer that a ball rolling off a surface will fall due to gravity, or that an object moved out of sight has not ceased to exist.
“Today, we’re excited to share V-JEPA 2, our state-of-the-art world model, trained on video, that enables robots and other AI agents to understand the physical world and predict how it will respond to their actions. These capabilities are essential to building AI agents that can think before they act, and V-JEPA 2 represents meaningful progress toward our ultimate goal of developing advanced machine intelligence (AMI),” the company announced in its statement.
“As humans, we have the ability to predict how the physical world will evolve in response to our actions or the actions of others. For example, you know that if you toss a tennis ball into the air, gravity will pull it back down. When you walk through an unfamiliar crowded area, you’re making moves toward your destination while also trying not to bump into people or obstacles along the path. When playing hockey, you skate to where the puck is going, not where it currently is. We achieve this physical intuition by observing the world around us and developing an internal model of it, which we can use to predict the outcomes of hypothetical actions,” it added.
This is potentially a major advance for robotics. Traditional robotics often requires vast amounts of meticulously labeled data and repetitive physical training to perform even relatively simple tasks. Because V-JEPA 2 reasons in “latent” space, building an internal simulation of reality, robots could learn tasks from less real-world data and plan actions by predicting their outcomes, as sketched below. That could drastically reduce development costs and accelerate the deployment of robots into new, unfamiliar environments. A robot assistant that can predict how a human might move, or how an object might fall, will be safer and more helpful in shared environments.
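To make “reasoning in latent space” concrete, here is a minimal sketch of the joint-embedding predictive idea behind the JEPA family: an encoder compresses observations into embeddings, and a predictor learns to anticipate the embedding of a future observation rather than its raw pixels. The module sizes, names, and training step below are illustrative assumptions, not Meta’s actual architecture, which is a large video transformer.

```python
# A minimal sketch of joint-embedding prediction in latent space.
# All shapes and module names are illustrative assumptions, not V-JEPA 2's.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an observation (e.g. a video frame) to a latent embedding."""
    def __init__(self, obs_dim: int = 256, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class Predictor(nn.Module):
    """Predicts the next latent state from the current latent state and an action."""
    def __init__(self, latent_dim: int = 64, action_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim)
        )

    def forward(self, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, action], dim=-1))

encoder, predictor = Encoder(), Predictor()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4
)

# One illustrative training step on a random (observation, action, next
# observation) batch. The key JEPA idea: the loss is computed between
# *embeddings*, never between raw pixels.
obs, action, next_obs = torch.randn(32, 256), torch.randn(32, 8), torch.randn(32, 256)
z_pred = predictor(encoder(obs), action)
with torch.no_grad():            # the target embedding is not backpropagated through
    z_target = encoder(next_obs)
loss = nn.functional.mse_loss(z_pred, z_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Predicting embeddings rather than pixels is what lets the model ignore unpredictable visual detail and focus on what actually matters for the outcome of an action.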
Yann LeCun, Meta’s chief AI scientist, discussed the model during a video presentation at the Viva Technology conference in Paris on Wednesday. “Allowing machines to understand the physical world is very different from allowing them to understand language,” LeCun stated. He further elaborated, “A world model is like an abstract digital twin of reality that an AI can reference to understand the world and predict consequences of its actions, and therefore it would be able to plan a course of action to accomplish a given task.”
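LeCun’s description hints at how such a model could be used for planning: imagine the consequences of candidate actions inside the latent “digital twin” and choose the sequence that best reaches a goal. The sketch below illustrates one generic way to do this (random-shooting model-predictive control); it reuses the hypothetical encoder and predictor from the previous sketch and is not Meta’s published planning procedure.

```python
# A hedged sketch of planning with a learned world model: sample candidate
# action sequences, roll each one forward in latent space with the predictor,
# and pick the sequence whose predicted final state lands closest to a goal
# embedding. Generic random-shooting MPC, not Meta's method; `encoder` and
# `predictor` are the illustrative modules defined in the previous sketch.
import torch

def plan(encoder, predictor, obs, goal_obs,
         horizon=5, num_candidates=256, action_dim=8):
    with torch.no_grad():
        z = encoder(obs.unsqueeze(0)).expand(num_candidates, -1)    # start state, repeated
        z_goal = encoder(goal_obs.unsqueeze(0))                     # desired end state
        actions = torch.randn(num_candidates, horizon, action_dim)  # candidate sequences
        for t in range(horizon):                                    # imagined rollout
            z = predictor(z, actions[:, t])
        cost = (z - z_goal).pow(2).sum(dim=-1)                      # distance to goal
    return actions[cost.argmin()]                                   # best action sequence

best_actions = plan(encoder, predictor, torch.randn(256), torch.randn(256))
print(best_actions.shape)  # (5, 8): one action per planning step
```

The point of the rollout is exactly what LeCun describes: the robot “thinks before it acts,” testing actions in simulation instead of in the real world.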
The concept of “world models” has garnered considerable attention in the AI research community, as researchers increasingly look to build AI systems with a deeper understanding of the physical world. Recent developments across the industry reflect this – in September 2024, AI researcher Fei-Fei Li secured $230 million in funding for a new venture named World Labs, which is developing “large world models” that can grasp the inherent structure of the physical world. Meanwhile, Google’s DeepMind unit has been developing its own world model, dubbed Genie, which can simulate games and three-dimensional environments in real time.