Scaling Spatial Intelligence with Multimodal Foundation Models Paper • 2511.13719 • Published Nov 17 • 45
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models Paper • 2512.15713 • Published 9 days ago • 15
In Pursuit of Pixel Supervision for Visual Pre-training Paper • 2512.15715 • Published 9 days ago • 8