Visual instruction datasets for visual language models Collection Collections of multimodal (image+text) instruction fine-tuning datasets tailored for visual language models like LLaVA, Fuyu, or IDEFICS. • 5 items • Updated Nov 21, 2023 • 2
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 102
See the Text: From Tokenization to Visual Reading Paper • 2510.18840 • Published Oct 21, 2025 • 4 • 2
Vision-centric Token Compression in Large Language Model Paper • 2502.00791 • Published Feb 2, 2025 • 1
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published Apr 8, 2025 • 13
Chat With Janus-Pro-7B 🌍 A unified multimodal understanding and generation model. • 2.02k
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published Feb 11, 2025 • 45