# SV3D-diffusers
This repo provides scripts for:

- The spatio-temporal UNet (`SV3DUNetSpatioTemporalConditionModel`) and pipeline (`StableVideo3DDiffusionPipeline`), modified from SVD for SV3D in the diffusers convention.
- Converting Stability AI's SV3D-p UNet checkpoint to the diffusers convention.
- Inferring with the SV3D-p model via the diffusers library to synthesize a 21-frame orbital video around a 3D object from a single-view image (preprocessed by removing the background and centering the object first).

Converted SV3D-p checkpoints have been uploaded to HuggingFace 🤗: `chenguolin/sv3d-diffusers`.
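Programmatic use of the converted checkpoint might look like the sketch below. The `StableVideo3DDiffusionPipeline` class name and the `chenguolin/sv3d-diffusers` repo id come from this README, but the `sv3d.pipeline` import path and the pipeline's call signature (`elevation`, `.frames`) are assumptions, not a verified API — check the repo's `infer.py` for the exact usage.

```python
def synthesize_orbit(image_path: str, elevation: float = 10.0):
    """Sketch: load the converted SV3D-p pipeline and render a 21-frame orbit.

    Assumptions: the import path `sv3d.pipeline` and the argument names
    `elevation` / `.frames` are hypothetical; only the checkpoint repo id
    and pipeline class name are taken from this README.
    """
    import torch
    from PIL import Image
    from sv3d.pipeline import StableVideo3DDiffusionPipeline  # assumed path

    pipe = StableVideo3DDiffusionPipeline.from_pretrained(
        "chenguolin/sv3d-diffusers", torch_dtype=torch.float16
    ).to("cuda")
    # Input should already have its background removed and object centered.
    image = Image.open(image_path).convert("RGB")
    frames = pipe(image, elevation=elevation).frames[0]  # argument names assumed
    return frames
```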
## 🔥 See Also
You may also be interested in our works:
- [ICLR 2025] DiffSplat: generate 3D objects in 3DGS directly by fine-tuning a text-to-image model.
- [NeurIPS 2024] HumanSplat: SV3D is fine-tuned on human datasets for single-view human reconstruction.
## 🚀 Usage
```bash
git clone https://github.com/chenguolin/sv3d-diffusers.git
# Please install PyTorch first according to your CUDA version
pip3 install -r requirements.txt
# If you can't access HuggingFace 🤗, try:
# export HF_ENDPOINT=https://hf-mirror.com
python3 infer.py --output_dir out/ --image_path assets/images/sculpture.png --elevation 10 --half_precision --seed -1
```
The synthesized video will be saved to `out/` as a `.gif` file.
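The input image is expected to have its background removed and the object centered. A minimal sketch of that centering step is shown below; the 576×576 target size and the white fill are assumptions for illustration (the repo's own preprocessing may differ), and the image is assumed to contain at least one opaque pixel.

```python
import numpy as np

def center_object(rgba: np.ndarray, size: int = 576) -> np.ndarray:
    """Crop an RGBA image to its alpha bounding box and paste it centered
    on a white square canvas. Target size and white fill are assumptions."""
    alpha = rgba[..., 3]
    ys, xs = np.nonzero(alpha)  # assumes at least one opaque pixel
    crop = rgba[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape[:2]
    canvas = np.full((size, size, 4), 255, dtype=np.uint8)  # opaque white
    top, left = (size - h) // 2, (size - w) // 2
    # Alpha-composite the crop over the white background
    a = crop[..., 3:4].astype(np.float32) / 255.0
    bg = canvas[top:top + h, left:left + w, :3].astype(np.float32)
    canvas[top:top + h, left:left + w, :3] = (
        a * crop[..., :3] + (1.0 - a) * bg
    ).astype(np.uint8)
    return canvas
```

For example, a small opaque object anywhere in the frame ends up centered on the square canvas, ready to be passed to `infer.py`.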
## 📸 Results
Image preprocessing and random seeds differ across implementations, so these results are provided for reference only.
## 📚 Citation
If you find this repo helpful, please consider giving it a star 🌟 and citing the original SV3D paper.
```bibtex
@inproceedings{voleti2024sv3d,
    author={Voleti, Vikram and Yao, Chun-Han and Boss, Mark and Letts, Adam and Pankratz, David and Tochilkin, Dmitrii and Laforte, Christian and Rombach, Robin and Jampani, Varun},
    title={{SV3D}: Novel Multi-view Synthesis and {3D} Generation from a Single Image using Latent Video Diffusion},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2024},
}
```






