Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.00522

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 1 hour ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6, 2024 • 15
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices

Paper • 2312.16886 • Published Dec 28, 2023 • 22
Lenna: Language Enhanced Reasoning Detection Assistant

Paper • 2312.02433 • Published Dec 5, 2023 • 2

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5, 2024 • 18
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Paper • 2403.03003 • Published Mar 5, 2024 • 11
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Paper • 2403.01487 • Published Mar 3, 2024 • 16
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

VisionLLM-ResearchPaper

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Adversarial Diffusion Distillation

Paper • 2311.17042 • Published Nov 28, 2023 • 3

Video/Image/Gif/etc.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 88
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 195
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Paper • 2403.04692 • Published Mar 7, 2024 • 41

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Paper • 2402.19481 • Published Feb 29, 2024 • 22
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published Oct 10, 2024 • 52

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 1 hour ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Adversarial Diffusion Distillation

Paper • 2311.17042 • Published Nov 28, 2023 • 3

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6, 2024 • 15
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices

Paper • 2312.16886 • Published Dec 28, 2023 • 22
Lenna: Language Enhanced Reasoning Detection Assistant

Paper • 2312.02433 • Published Dec 5, 2023 • 2

Video/Image/Gif/etc.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 88
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 195
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Paper • 2403.04692 • Published Mar 7, 2024 • 41

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5, 2024 • 18
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Paper • 2403.03003 • Published Mar 5, 2024 • 11
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Paper • 2403.01487 • Published Mar 3, 2024 • 16
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

VisionLLM-ResearchPaper

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Paper • 2402.19481 • Published Feb 29, 2024 • 22
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published Oct 10, 2024 • 52

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs