-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Collections including paper arxiv:2503.05379
-
openai/whisper-large-v3-turbo
Automatic Speech Recognition • 0.8B • Updated • 2.77M • 2.76k -
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper • 2503.05379 • Published • 38 -
Dia 1.6B
👯 1.74k • Generate realistic dialogue from a script, using Dia!
-
lingtrain/chechen-russian
Viewer • Updated • 31.1k • 13 • 1
-
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 24 -
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Paper • 2412.05271 • Published • 159 -
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper • 2503.05379 • Published • 38
-
RuCCoD: Towards Automated ICD Coding in Russian
Paper • 2502.21263 • Published • 133 -
Unified Reward Model for Multimodal Understanding and Generation
Paper • 2503.05236 • Published • 122 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27
-
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Paper • 2503.00735 • Published • 23 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27 -
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper • 2503.05379 • Published • 38
-
Wan2.1
💻 1.96k • Wan: Open and Advanced Large-Scale Video Generative Models
-
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper • 2503.05379 • Published • 38 -
stabilityai/stable-virtual-camera
Image-to-Video • Updated • 13.2k • 222 -
3DAIGC/LHM
Image-to-3D • Updated • 17 • 21