Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2412.19412

about 1 month ago

MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training

Paper • 2501.07556 • Published Jan 13 • 7
MINIMA: Modality Invariant Image Matching

Paper • 2412.19412 • Published Dec 27, 2024 • 4
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Paper • 2405.12979 • Published May 21, 2024 • 12

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 63
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Paper • 2501.09755 • Published Jan 16 • 36
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 71
MINIMA: Modality Invariant Image Matching

Paper • 2412.19412 • Published Dec 27, 2024 • 4

about 7 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

SuperPoint: Self-Supervised Interest Point Detection and Description

Paper • 1712.07629 • Published Dec 20, 2017
D2-Net: A Trainable CNN for Joint Detection and Description of Local Features

Paper • 1905.03561 • Published May 9, 2019
R2D2: Repeatable and Reliable Detector and Descriptor

Paper • 1906.06195 • Published Jun 14, 2019
SuperGlue: Learning Feature Matching with Graph Neural Networks

Paper • 1911.11763 • Published Nov 26, 2019

Image-Video General Tasks

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 47
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7 • 52
An Empirical Study of Autoregressive Pre-training from Videos

Paper • 2501.05453 • Published Jan 9 • 41
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training

Paper • 2501.07556 • Published Jan 13 • 7

about 1 month ago

MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training

Paper • 2501.07556 • Published Jan 13 • 7
MINIMA: Modality Invariant Image Matching

Paper • 2412.19412 • Published Dec 27, 2024 • 4
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Paper • 2405.12979 • Published May 21, 2024 • 12

SuperPoint: Self-Supervised Interest Point Detection and Description

Paper • 1712.07629 • Published Dec 20, 2017
D2-Net: A Trainable CNN for Joint Detection and Description of Local Features

Paper • 1905.03561 • Published May 9, 2019
R2D2: Repeatable and Reliable Detector and Descriptor

Paper • 1906.06195 • Published Jun 14, 2019
SuperGlue: Learning Feature Matching with Graph Neural Networks

Paper • 1911.11763 • Published Nov 26, 2019

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 63
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Paper • 2501.09755 • Published Jan 16 • 36
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 71
MINIMA: Modality Invariant Image Matching

Paper • 2412.19412 • Published Dec 27, 2024 • 4

Image-Video General Tasks

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 47
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7 • 52
An Empirical Study of Autoregressive Pre-training from Videos

Paper • 2501.05453 • Published Jan 9 • 41
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training

Paper • 2501.07556 • Published Jan 13 • 7

about 7 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs