Towards Pixel-Level VLM Perception via Simple Points Prediction • Paper • 2601.19228 • Published 25 days ago • 18
One-step Latent-free Image Generation with Pixel Mean Flows • Paper • 2601.22158 • Published 22 days ago • 17
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders • Paper • 2601.16208 • Published 29 days ago • 52
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models • Paper • 2601.15165 • Published about 1 month ago • 72
Emu3.5: Native Multimodal Models are World Learners • Paper • 2510.26583 • Published Oct 30, 2025 • 111
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published Oct 13, 2025 • 166
google/siglip2-so400m-patch16-naflex • Zero-Shot Image Classification • 1B • Updated Feb 21, 2025 • 1.25M • 58