Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2505.09568

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

Paper • 2504.17789 • Published Apr 24 • 23
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

Paper • 2505.17589 • Published May 23 • 4

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Paper • 2503.20756 • Published Mar 26 • 7
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22 • 139

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5 • 74
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4 • 48
MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4 • 80
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Paper • 2506.03147 • Published Jun 3 • 58

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Paper • 2505.11049 • Published May 16 • 60
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 134

Multimodal Approaches

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97

Image generation

Continuous Diffusion Model for Language Modeling

Paper • 2502.11564 • Published Feb 17 • 53
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Paper • 2503.07027 • Published Mar 10 • 29
Efficient Generative Model Training via Embedded Representation Warmup

Paper • 2504.10188 • Published Apr 14 • 12
Improving Editability in Image Generation with Layer-wise Memory

Paper • 2505.01079 • Published May 2 • 29

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

Paper • 2504.17789 • Published Apr 24 • 23
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5 • 74
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4 • 48
MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4 • 80
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Paper • 2506.03147 • Published Jun 3 • 58

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

Paper • 2505.17589 • Published May 23 • 4

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Paper • 2505.11049 • Published May 16 • 60
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 134

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97

Multimodal Approaches

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Paper • 2503.20756 • Published Mar 26 • 7
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22 • 139

Image generation

Continuous Diffusion Model for Language Modeling

Paper • 2502.11564 • Published Feb 17 • 53
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Paper • 2503.07027 • Published Mar 10 • 29
Efficient Generative Model Training via Embedded Representation Warmup

Paper • 2504.10188 • Published Apr 14 • 12
Improving Editability in Image Generation with Layer-wise Memory

Paper • 2505.01079 • Published May 2 • 29

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs