Collections
Discover the best community collections!
Collections including paper arxiv:2411.10440
-
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 130 -
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
Paper • 2411.06469 • Published • 17 -
Sharingan: Extract User Action Sequence from Desktop Recordings
Paper • 2411.08768 • Published • 9 -
AnimateAnything: Consistent and Controllable Animation for Video Generation
Paper • 2411.10836 • Published • 24
-
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 130 -
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Paper • 2411.10640 • Published • 46 -
Knowledge Transfer Across Modalities with Natural Language Supervision
Paper • 2411.15611 • Published • 17 -
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability
Paper • 2411.19943 • Published • 63
-
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper • 2411.02959 • Published • 70 -
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
Paper • 2411.03047 • Published • 9 -
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper • 2411.02336 • Published • 24 -
GenXD: Generating Any 3D and 4D Scenes
Paper • 2411.02319 • Published • 20
-
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 130 -
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
Paper • 2411.06469 • Published • 17 -
Sharingan: Extract User Action Sequence from Desktop Recordings
Paper • 2411.08768 • Published • 9 -
AnimateAnything: Consistent and Controllable Animation for Video Generation
Paper • 2411.10836 • Published • 24
-
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper • 2411.02959 • Published • 70 -
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
Paper • 2411.03047 • Published • 9 -
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper • 2411.02336 • Published • 24 -
GenXD: Generating Any 3D and 4D Scenes
Paper • 2411.02319 • Published • 20
-
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 130 -
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Paper • 2411.10640 • Published • 46 -
Knowledge Transfer Across Modalities with Natural Language Supervision
Paper • 2411.15611 • Published • 17 -
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability
Paper • 2411.19943 • Published • 63