DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation • Paper • 2601.22153 • Published 1 day ago • 49
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer • Paper • 2601.16515 • Published 8 days ago • 15
ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion • Paper • 2601.16148 • Published 9 days ago • 12
Rethinking Video Generation Model for the Embodied World • Paper • 2601.15282 • Published 10 days ago • 42
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning • Paper • 2601.16163 • Published 9 days ago • 13
PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models • Paper • 2601.11087 • Published 15 days ago • 11
Transition Matching Distillation for Fast Video Generation • Paper • 2601.09881 • Published 16 days ago • 32
Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering • Paper • 2601.09697 • Published 17 days ago • 8
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices • Paper • 2601.08303 • Published 18 days ago • 16
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction • Paper • 2601.05966 • Published 22 days ago • 23
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control • Paper • 2601.05138 • Published 23 days ago • 18
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams • Paper • 2601.02281 • Published 26 days ago • 33
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos • Paper • 2601.00393 • Published 30 days ago • 130
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation • Paper • 2601.02204 • Published 26 days ago • 61
Self-Evaluation Unlocks Any-Step Text-to-Image Generation • Paper • 2512.22374 • Published Dec 26, 2025 • 17
mHC: Manifold-Constrained Hyper-Connections • Paper • 2512.24880 • Published about 1 month ago • 291