PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection Paper • 2510.23594 • Published Oct 27 • 5
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published Oct 22 • 28
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing Paper • 2503.12652 • Published Mar 16
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing Paper • 2505.11493 • Published May 16 • 3
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts Paper • 2402.13220 • Published Feb 20, 2024 • 15 • 3
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs Paper • 2407.01509 • Published Jul 1, 2024
Understanding Alignment in Multimodal LLMs: A Comprehensive Study Paper • 2407.02477 • Published Jul 2, 2024 • 24
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 129