- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 65
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
  Paper • 2408.12590 • Published • 36
- Real-Time Video Generation with Pyramid Attention Broadcast
  Paper • 2408.12588 • Published • 17
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 63
Collections including paper arxiv:2409.13591
- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 195
- VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
  Paper • 2312.01841 • Published • 1
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
  Paper • 2311.16498 • Published • 1
- GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
  Paper • 2312.02134 • Published • 2

- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
  Paper • 2401.09416 • Published • 11
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
  Paper • 2401.10171 • Published • 14
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
  Paper • 2311.09217 • Published • 22
- GALA: Generating Animatable Layered Assets from a Single Scan
  Paper • 2401.12979 • Published • 9

- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
  Paper • 2405.16537 • Published • 17
- ReVideo: Remake a Video with Motion and Content Control
  Paper • 2405.13865 • Published • 25
- FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
  Paper • 2406.16863 • Published • 11
- Portrait Video Editing Empowered by Multimodal Generative Priors
  Paper • 2409.13591 • Published • 17

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
  Paper • 2308.16582 • Published • 12
- DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
  Paper • 2310.13119 • Published • 13
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
  Paper • 2310.16818 • Published • 32
- Text-to-3D with classifier score distillation
  Paper • 2310.19415 • Published • 5