Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions Paper • 2507.05257 • Published Jul 7 • 14
ReasonMap Collection A fine-grained visual reasoning benchmark (We show more question types in the extension dataset.) • 3 items • Updated Oct 1 • 8
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning Paper • 2510.02240 • Published Oct 2 • 17
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 90
Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts Paper • 2506.10357 • Published Jun 12 • 21
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents Paper • 2506.14205 • Published Jun 17 • 7
Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search Paper • 2507.02652 • Published Jul 3 • 26
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24 • 25