Meta AI releases joint embedding predictive world models (JEPA-WMs) for physical planning

According to an April 3 (UTC+8) news report, the Meta AI research team released JEPA-WMs, joint embedding predictive world models for physical planning, along with the accompanying research. The study investigates the key factors behind the models' success, and the release provides a complete PyTorch implementation, datasets, and pretrained models. Alongside the core JEPA-WM, the release includes DINO-WM and V-JEPA-2-AC(fixed) models as baselines, covering multiple robotic manipulation and navigation environments such as DROID & RoboCasa, Metaworld, Push-T, PointMaze, and Wall. The models use visual encoders such as DINOv3 ViT-L/16, DINOv2 ViT-S/14, and V-JEPA-2 ViT-G/16, with input image resolutions of mainly 224×224 or 256×256. The project also provides an optional VM2M decoder head for visualization and trajectory decoding, but emphasizes that this decoder is not necessary for training a world model or running planning evaluations. All resources have been made public on GitHub, Hugging Face, and arXiv. (Source: InfoQ)
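For readers unfamiliar with the setup, the general pattern behind such models is to encode images into a latent space with a frozen visual encoder (e.g. DINOv2), learn a dynamics model that predicts future latents from actions, and then plan by searching over action sequences entirely in latent space, which is why a pixel decoder is optional. The PyTorch sketch below illustrates that pattern only; it is not the released repository's API. The class and function names (`LatentDynamics`, `cem_plan`) are hypothetical, and the cross-entropy-method planner shown is one common choice for this setting, not necessarily the one used in the release.

```python
# Minimal sketch of latent-space planning with a learned world model.
# All names are illustrative; this does not reflect the JEPA-WMs codebase.
import torch
import torch.nn as nn


class LatentDynamics(nn.Module):
    """Hypothetical world model: predicts the next latent state from the
    current latent state and an action (a stand-in for a JEPA-style predictor)."""

    def __init__(self, latent_dim: int, action_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, a], dim=-1))


@torch.no_grad()
def cem_plan(model, z0, z_goal, action_dim, horizon=8,
             pop=256, elites=32, iters=5):
    """Cross-entropy-method planning in latent space: sample action sequences,
    roll them through the dynamics model, and refit the sampling distribution
    to the sequences whose final latent lands closest to the goal latent."""
    mean = torch.zeros(horizon, action_dim)
    std = torch.ones(horizon, action_dim)
    for _ in range(iters):
        actions = mean + std * torch.randn(pop, horizon, action_dim)
        z = z0.expand(pop, -1)
        for t in range(horizon):
            z = model(z, actions[:, t])
        cost = (z - z_goal).pow(2).sum(dim=-1)  # distance to goal latent
        elite = actions[cost.topk(elites, largest=False).indices]
        mean, std = elite.mean(dim=0), elite.std(dim=0) + 1e-6
    return mean  # planned action sequence, shape (horizon, action_dim)


if __name__ == "__main__":
    latent_dim, action_dim = 384, 4  # 384 matches DINOv2 ViT-S/14's embedding width
    model = LatentDynamics(latent_dim, action_dim)
    # In practice z0 / z_goal would come from encoding the current and goal
    # images with the frozen encoder; random tensors stand in here.
    z0, z_goal = torch.randn(1, latent_dim), torch.randn(1, latent_dim)
    plan = cem_plan(model, z0, z_goal, action_dim)
    print(plan.shape)  # torch.Size([8, 4])
```

Because both the cost and the rollout live in latent space, no decoder is ever invoked during planning, which matches the release's note that the VM2M decoder head is only needed for visualization.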
