HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing Paper • 2602.03560 • Published 3 days ago • 40
Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization Paper • 2601.12993 • Published 18 days ago • 75
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published Aug 7, 2025 • 64
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Paper • 2507.15597 • Published Jul 21, 2025 • 34
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning Paper • 2505.12081 • Published May 17, 2025 • 18
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation Paper • 2306.13460 • Published Jun 23, 2023 • 2
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective Paper • 2402.14545 • Published Feb 22, 2024
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Paper • 2503.06520 • Published Mar 9, 2025 • 11
Unveiling Visual Biases in Audio-Visual Localization Benchmarks Paper • 2409.06709 • Published Aug 25, 2024
TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM Paper • 2503.13377 • Published Mar 17, 2025 • 3
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12, 2025 • 82