GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published 20 days ago • 91
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation Paper • 2510.18701 • Published Oct 21 • 66
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 122
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13, 2024 • 53
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 98
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment Paper • 2412.04814 • Published Dec 6, 2024 • 47