AI Math: Diffusion
Controllable Text Generation for Large Language Models: A Survey
Paper
• 2408.12599
• Published
• 65
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed
Representations
Paper
• 2408.12590
• Published
• 35
Real-Time Video Generation with Pyramid Attention Broadcast
Paper
• 2408.12588
• Published
• 17
Transfusion: Predict the Next Token and Diffuse Images with One
Multi-Modal Model
Paper
• 2408.11039
• Published
• 63
MegaFusion: Extend Diffusion Models towards Higher-resolution Image
Generation without Further Tuning
Paper
• 2408.11001
• Published
• 13
CODE: Confident Ordinary Differential Editing
Paper
• 2408.12418
• Published
• 4
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its
Teacher
Paper
• 2408.14176
• Published
• 62
Foundation Models for Music: A Survey
Paper
• 2408.14340
• Published
• 44
Diffusion Models Are Real-Time Game Engines
Paper
• 2408.14837
• Published
• 126
Distribution Backtracking Builds A Faster Convergence Trajectory for
One-step Diffusion Distillation
Paper
• 2408.15991
• Published
• 16
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion
Model
Paper
• 2408.16767
• Published
• 32
Discrete Diffusion Modeling by Estimating the Ratios of the Data
Distribution
Paper
• 2310.16834
• Published
• 5
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time
Series Forecasters
Paper
• 2408.17253
• Published
• 39
Paper
• 2409.00587
• Published
• 33
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world
Videos
Paper
• 2409.02095
• Published
• 37
Diffusion Policy Policy Optimization
Paper
• 2409.00588
• Published
• 20
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper
• 2409.02097
• Published
• 34
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion
Dependency
Paper
• 2409.02634
• Published
• 97
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with
Adversarial Conditional Diffusion Distillation
Paper
• 2409.02245
• Published
• 10
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free
Real Image Editing
Paper
• 2409.01322
• Published
• 96
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with
Image-Based Surface Representation
Paper
• 2409.03718
• Published
• 27
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens
for Text-to-Any-Task
Paper
• 2409.04005
• Published
• 19
SongCreator: Lyrics-based Universal Song Generation
Paper
• 2409.06029
• Published
• 22
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Paper
• 2409.06135
• Published
• 16
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video
Diffusion Models
Paper
• 2409.07452
• Published
• 21
Instant Facial Gaussians Translator for Relightable and Interactable
Facial Rendering
Paper
• 2409.07441
• Published
• 12
IFAdapter: Instance Feature Control for Grounded Text-to-Image
Generation
Paper
• 2409.08240
• Published
• 22
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with
Diffusion Priors
Paper
• 2409.08278
• Published
• 15
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
Paper
• 2409.08270
• Published
• 12
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric
Videos
Paper
• 2409.08353
• Published
• 12
InstantDrag: Improving Interactivity in Drag-based Image Editing
Paper
• 2409.08857
• Published
• 34
A Diffusion Approach to Radiance Field Relighting using
Multi-Illumination Synthesis
Paper
• 2409.08947
• Published
• 13
DrawingSpinUp: 3D Animation from Single Character Drawings
Paper
• 2409.08615
• Published
• 19
Seed-Music: A Unified Framework for High Quality and Controlled Music
Generation
Paper
• 2409.09214
• Published
• 53
Phidias: A Generative Model for Creating 3D Content from Text, Image,
and 3D Conditions with Reference-Augmented Diffusion
Paper
• 2409.11406
• Published
• 27
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper
• 2409.11355
• Published
• 30
OSV: One Step is Enough for High-Quality Image to Video Generation
Paper
• 2409.11367
• Published
• 14
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion
Transformer
Paper
• 2409.10819
• Published
• 18
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Paper
• 2409.11211
• Published
• 9
Single-Layer Learnable Activation for Implicit Neural Representation
(SL^{2}A-INR)
Paper
• 2409.10836
• Published
• 5
Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks
Paper
• 2409.09323
• Published
• 5
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Paper
• 2409.09401
• Published
• 7
Vista3D: Unravel the 3D Darkside of a Single Image
Paper
• 2409.12193
• Published
• 10
LVCD: Reference-based Lineart Video Colorization with Diffusion Models
Paper
• 2409.12960
• Published
• 24
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive
Diffusion
Paper
• 2409.12957
• Published
• 21
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Paper
• 2409.12892
• Published
• 5
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient
Video Latent Generation
Paper
• 2409.12532
• Published
• 5
FlexiTex: Enhancing Texture Generation with Visual Guidance
Paper
• 2409.12431
• Published
• 13
MIMO: Controllable Character Video Synthesis with Spatial Decomposed
Modeling
Paper
• 2409.16160
• Published
• 34
Tabular Data Generation using Binary Diffusion
Paper
• 2409.13882
• Published
• 3
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language
Instructions
Paper
• 2409.15278
• Published
• 24
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion
Priors
Paper
• 2409.15273
• Published
• 12
MaskedMimic: Unified Physics-Based Character Control Through Masked
Motion Inpainting
Paper
• 2409.14393
• Published
• 9
SpaceBlender: Creating Context-Rich Collaborative Spaces Through
Generative 3D Scene Blending
Paper
• 2409.13926
• Published
• 6
Self-Supervised Audio-Visual Soundscape Stylization
Paper
• 2409.14340
• Published
• 2
MuCodec: Ultra Low-Bitrate Music Codec
Paper
• 2409.13216
• Published
• 22
Portrait Video Editing Empowered by Multimodal Generative Priors
Paper
• 2409.13591
• Published
• 16
Colorful Diffuse Intrinsic Image Decomposition in the Wild
Paper
• 2409.13690
• Published
• 13
V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic
Gaussians
Paper
• 2409.13648
• Published
• 11
Temporally Aligned Audio for Video with Autoregression
Paper
• 2409.13689
• Published
• 9
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror
Reflections
Paper
• 2409.14677
• Published
• 15
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense
Prediction
Paper
• 2409.18124
• Published
• 33
Pixel-Space Post-Training of Latent Diffusion Models
Paper
• 2409.17565
• Published
• 20
Disco4D: Disentangled 4D Human Generation and Animation from a Single
Image
Paper
• 2409.17280
• Published
• 10
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D
Diffusion
Paper
• 2409.17145
• Published
• 14
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
Paper
• 2409.17058
• Published
• 13
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Paper
• 2409.18964
• Published
• 27
Image Copy Detection for Diffusion Models
Paper
• 2409.19952
• Published
• 13
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image
Restoration
Paper
• 2410.00418
• Published
• 10
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D
Semantic MPIs
Paper
• 2410.00337
• Published
• 11
DressRecon: Freeform 4D Human Reconstruction from Monocular Video
Paper
• 2409.20563
• Published
• 9
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model
And Input View Curation
Paper
• 2410.00890
• Published
• 21
Cottention: Linear Transformers With Cosine Attention
Paper
• 2409.18747
• Published
• 16
Addition is All You Need for Energy-efficient Language Models
Paper
• 2410.00907
• Published
• 151
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
Paper
• 2410.01731
• Published
• 16
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and
Box-Focused Sampling for 3D Object Detection
Paper
• 2410.01647
• Published
• 31
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache
in Diffusion Transformer Acceleration
Paper
• 2410.01723
• Published
• 4
Eliminating Oversaturation and Artifacts of High Guidance Scales in
Diffusion Models
Paper
• 2410.02416
• Published
• 34
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation
Paper
• 2410.01680
• Published
• 34
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Paper
• 2410.00316
• Published
• 7
VideoGuide: Improving Video Diffusion Models without Training Through a
Teacher's Guide
Paper
• 2410.04364
• Published
• 29
Presto! Distilling Steps and Layers for Accelerating Music Generation
Paper
• 2410.05167
• Published
• 18
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal
Instruction
Paper
• 2410.04932
• Published
• 9
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion
Models
Paper
• 2409.19989
• Published
• 18
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep
Approach
Paper
• 2410.03160
• Published
• 5
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
Paper
• 2410.05255
• Published
• 5
IterComp: Iterative Composition-Aware Feedback Learning from Model
Gallery for Text-to-Image Generation
Paper
• 2410.07171
• Published
• 43
Pyramidal Flow Matching for Efficient Video Generative Modeling
Paper
• 2410.05954
• Published
• 40
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based
Image/Video Generation
Paper
• 2410.05591
• Published
• 13
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for
Text-to-Image Diffusion Model Unlearning
Paper
• 2410.05664
• Published
• 9
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow
Matching
Paper
• 2410.06885
• Published
• 46
Diversity-Rewarded CFG Distillation
Paper
• 2410.06084
• Published
• 10
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial
Diffusion and Masked Generative Models
Paper
• 2410.08207
• Published
• 19
Semantic Score Distillation Sampling for Compositional Text-to-3D
Generation
Paper
• 2410.09009
• Published
• 15
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image
Generation
Paper
• 2410.08159
• Published
• 26
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
Paper
• 2410.07303
• Published
• 18
Progressive Autoregressive Video Diffusion Models
Paper
• 2410.08151
• Published
• 16
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional
Diffusion Sampler
Paper
• 2410.05651
• Published
• 12
Animate-X: Universal Character Image Animation with Enhanced Motion
Representation
Paper
• 2410.10306
• Published
• 56
Cavia: Camera-controllable Multi-view Video Diffusion with
View-Integrated Attention
Paper
• 2410.10774
• Published
• 25
Semantic Image Inversion and Editing using Rectified Stochastic
Differential Equations
Paper
• 2410.10792
• Published
• 31
Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies
Paper
• 2410.10803
• Published
• 7
Efficient Diffusion Models: A Comprehensive Survey from Principles to
Practices
Paper
• 2410.11795
• Published
• 18
Constant Acceleration Flow
Paper
• 2411.00322
• Published
• 24
In-Context LoRA for Diffusion Transformers
Paper
• 2410.23775
• Published
• 11
Minimum Entropy Coupling with Bottleneck
Paper
• 2410.21666
• Published
• 5
Task Vectors are Cross-Modal
Paper
• 2410.22330
• Published
• 11
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
Paper
• 2410.20280
• Published
• 23
Continuous Speech Synthesis using per-token Latent Diffusion
Paper
• 2410.16048
• Published
• 29
FasterCache: Training-Free Video Diffusion Model Acceleration with High
Quality
Paper
• 2410.19355
• Published
• 24
SMITE: Segment Me In TimE
Paper
• 2410.18538
• Published
• 16
Scaling Diffusion Language Models via Adaptation from Autoregressive
Models
Paper
• 2410.17891
• Published
• 16
DPLM-2: A Multimodal Diffusion Protein Language Model
Paper
• 2410.13782
• Published
• 22
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning
via Image-Guided Diffusion
Paper
• 2410.13674
• Published
• 17
DimensionX: Create Any 3D and 4D Scenes from a Single Image with
Controllable Video Diffusion
Paper
• 2411.04928
• Published
• 56
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion
Models
Paper
• 2411.05007
• Published
• 24
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
Paper
• 2411.04989
• Published
• 14
Controlling Language and Diffusion Models by Transporting Activations
Paper
• 2410.23054
• Published
• 18
DreamPolish: Domain Score Distillation With Progressive Geometry
Generation
Paper
• 2411.01602
• Published
• 11
Constrained Diffusion Implicit Models
Paper
• 2411.00359
• Published
• 6
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper
• 2411.02336
• Published
• 24
Scaling Properties of Diffusion Models for Perceptual Tasks
Paper
• 2411.08034
• Published
• 13
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model
with Compact Wavelet Encodings
Paper
• 2411.08017
• Published
• 11
Add-it: Training-Free Object Insertion in Images With Pretrained
Diffusion Models
Paper
• 2411.07232
• Published
• 68
Edify Image: High-Quality Image Generation with Pixel Space Laplacian
Diffusion Models
Paper
• 2411.07126
• Published
• 30
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D
Generation
Paper
• 2411.08033
• Published
• 25
Generative World Explorer
Paper
• 2411.11844
• Published
• 77
Stylecodes: Encoding Stylistic Information For Image Generation
Paper
• 2411.12811
• Published
• 12
Stable Flow: Vital Layers for Training-Free Image Editing
Paper
• 2411.14430
• Published
• 22
Style-Friendly SNR Sampler for Style-Driven Generation
Paper
• 2411.14793
• Published
• 39
Material Anything: Generating Materials for Any 3D Object via Diffusion
Paper
• 2411.15138
• Published
• 50
DreamRunner: Fine-Grained Storytelling Video Generation with
Retrieval-Augmented Motion Adaptation
Paper
• 2411.16657
• Published
• 19
One Diffusion to Generate Them All
Paper
• 2411.16318
• Published
• 28
OminiControl: Minimal and Universal Control for Diffusion Transformer
Paper
• 2411.15098
• Published
• 61
Novel View Extrapolation with Video Diffusion Priors
Paper
• 2411.14208
• Published
• 10
TEXGen: a Generative Diffusion Model for Mesh Textures
Paper
• 2411.14740
• Published
• 17
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Paper
• 2411.18613
• Published
• 59
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Paper
• 2411.17440
• Published
• 37
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous
Driving
Paper
• 2411.15139
• Published
• 15
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Paper
• 2411.18616
• Published
• 16
Omegance: A Single Parameter for Various Granularities in
Diffusion-Based Synthesis
Paper
• 2411.17769
• Published
• 8
Unified Continuous Generative Models
Paper
• 2505.07447
• Published
• 42
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Paper
• 2506.13759
• Published
• 43
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with
TriMap Video Diffusion
Paper
• 2507.02813
• Published
• 60
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language
Models
Paper
• 2508.00819
• Published
• 63
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed
Inference
Paper
• 2508.02193
• Published
• 136
Diffusion Language Models Know the Answer Before Decoding
Paper
• 2508.19982
• Published
• 27