WorldCompass: Reinforcement Learning for Long-Horizon World Models
Abstract
WorldCompass enhances long-horizon video-based world models through reinforcement learning post-training with clip-level rollouts, complementary rewards, and efficient RL algorithms.
This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for long-horizon, interactive video-based world models, enabling them to explore the world more accurately and consistently based on interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-Level Rollout Strategy: We generate and evaluate multiple samples for a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: We design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: We employ a negative-aware fine-tuning strategy coupled with various efficiency optimizations to efficiently and effectively enhance model capability. Evaluations on the state-of-the-art (SoTA) open-source world model, WorldPlay, demonstrate that WorldCompass significantly improves interaction accuracy and visual fidelity across various scenarios.
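For a concrete picture of the loop the abstract describes, the sketch below is one possible reading of it, not the paper's implementation: it samples several candidate clips for a single target clip, scores them with a combined interaction-following and visual-quality reward, and applies a simplified negative-aware weighting. Every name in it (`world_model.rollout_clip`, the reward callables, `neg_weight`) is a hypothetical placeholder, not the actual WorldCompass or WorldPlay interface.

```python
# Illustrative sketch only: all names below (rollout_clip, the reward
# callables, neg_weight) are hypothetical placeholders, not the
# WorldCompass / WorldPlay API.
import torch


def clip_level_rl_step(world_model, optimizer, context, action,
                       interaction_reward, visual_reward,
                       num_samples=4, neg_weight=0.5):
    """One post-training step: sample several candidate clips for a single
    target clip, score them with complementary rewards, and apply a
    negative-aware policy-gradient-style update."""
    clips, log_probs = [], []
    for _ in range(num_samples):
        # Autoregressive rollout of one clip conditioned on history + interaction.
        clip, logp = world_model.rollout_clip(context, action)
        clips.append(clip)
        log_probs.append(logp)

    # Complementary rewards: interaction-following accuracy plus visual quality.
    rewards = torch.tensor(
        [interaction_reward(c, action) + visual_reward(c) for c in clips]
    )

    # Group-relative advantage within this clip's sample group.
    advantages = rewards - rewards.mean()

    # Negative-aware weighting: reinforce positive samples, down-weight
    # (rather than symmetrically penalize) negative ones.
    weights = torch.where(advantages >= 0, advantages, neg_weight * advantages)

    loss = -(weights.detach() * torch.stack(log_probs)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The group-relative advantage and the asymmetric treatment of negative samples here are stand-ins meant to convey the idea of clip-level grouping and negative-aware fine-tuning; the paper's actual objective and efficiency optimizations may differ.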