CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17 • 93
NaVILA: Legged Robot Vision-Language-Action Model for Navigation Paper • 2412.04453 • Published Dec 5, 2024
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Paper • 2507.12440 • Published Jul 16
Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations Paper • 2508.18132 • Published Aug 25
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13 • 176
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning Paper • 2510.15110 • Published Oct 16 • 15
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models Paper • 2406.01584 • Published Jun 3, 2024
WorldModelBench: Judging Video Generation Models As World Models Paper • 2502.20694 • Published Feb 28
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published 14 days ago • 100
FasterViT: Fast Vision Transformers with Hierarchical Attention Paper • 2306.06189 • Published Jun 9, 2023 • 31
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks Paper • 2306.14306 • Published Jun 25, 2023
Global Vision Transformer Pruning with Hessian-Aware Saliency Paper • 2110.04869 • Published Oct 10, 2021
RegionGPT: Towards Region Understanding Vision Language Model Paper • 2403.02330 • Published Mar 4, 2024 • 2