---
title: MIMO - Controllable Character Video Synthesis
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app_hf_spaces.py
pinned: false
license: apache-2.0
hardware: t4-medium
---

# MIMO - Complete Character Video Synthesis

**🎬 Full Implementation Matching the Research Paper**

Transform character images into animated videos with controllable motion and advanced video editing capabilities.

## Features

### 🎭 Character Animation Mode

- **Based on:** `run_animate.py` from the original repository
- **Function:** Animate static character images with motion templates
- **Use cases:** Create character animations, bring photos to life
- **Quality:** Optimized for HuggingFace GPUs (512x512, 20 steps)

### 🎬 Video Character Editing Mode

- **Based on:** `run_edit.py` from the original repository
- **Function:** Advanced video editing with background preservation
- **Features:** Human segmentation, occlusion handling, seamless blending
- **Quality:** Higher resolution (784x784, 25 steps) for professional results

## Available Motion Templates

### Sports Templates
- `sports_basketball_gym` - Basketball court actions
- `sports_nba_dunk` - Professional basketball dunking
- `sports_nba_pass` - Basketball passing motions
- `syn_football_10_05` - Football/soccer movements

### Action Templates
- `shorts_kungfu_desert1` - Martial arts in a desert setting
- `shorts_kungfu_match1` - Fighting sequences
- `parkour_climbing` - Parkour and climbing actions
- `movie_BruceLee1` - Classic martial arts moves

### Dance Templates
- `dance_indoor_1` - Indoor dance choreography
- `syn_dancing2_00093_irish_dance` - Irish dance movements

### Synthetic Templates
- `syn_basketball_06_13` - Synthetic basketball motions
- `syn_dancing2_00093_irish_dance` - Synthetic dance sequences

## Technical Specifications

### Model Architecture
- **Base Model:** Stable Diffusion v1.5 with temporal modules
- **Components:** 3D UNet, Pose Guider, CLIP Image Encoder
- **Human Segmentation:** TensorFlow-based matting model
- **Scheduler:** DDIM with v-prediction parameterization

### Performance Optimizations
- **Auto GPU Detection:** T4/A10G/A100 support with FP16/FP32
- **Memory Management:** Efficient model loading and caching
- **Progressive Download:** Models are downloaded on first use
- **Quality vs. Speed:** Balanced settings for web deployment

### Technical Details
- **Input Resolution:** Any size (auto-processed to optimal dimensions)
- **Output Resolution:** 512x512 (Animation), 784x784 (Editing)
- **Frame Count:** Up to 150 frames (memory limited)
- **Processing Time:** 2-5 minutes, depending on template length

## Usage Instructions

1. **Set Up Models** (one-time, ~8GB download)
2. **Upload a Character Image** (clear, front-facing images work best)
3. **Select a Generation Mode:**
   - Animation: Faster, simpler character animation
   - Editing: Advanced editing with background blending
4. **Choose a Motion Template** from the available options
5. **Generate the Video** and wait for processing

## Model Credits

- **Original Paper:** [MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling](https://arxiv.org/abs/2409.16160)
- **Authors:** Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo (Alibaba Group)
- **Conference:** CVPR 2025
- **Code:** [GitHub Repository](https://github.com/menyifang/MIMO)

## Acknowledgments

Built upon:
- [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)
- [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone)
- [SAM](https://github.com/facebookresearch/segment-anything)
- [4D-Humans](https://github.com/shubham-goel/4D-Humans)
- [ProPainter](https://github.com/sczhou/ProPainter)

---

**⚠️ Note:** This is a complete implementation of the MIMO research paper, providing both simple animation and advanced video editing capabilities as described in the original work.
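The auto GPU detection with FP16/FP32 mentioned under Performance Optimizations can be sketched as a small precision-picking helper. The function name, the GPU list, and the string-matching rule are illustrative assumptions, not the app's actual code.

```python
# Hypothetical sketch of FP16/FP32 selection for the supported GPUs.
# Assumption: all listed datacenter GPUs have fast FP16 tensor cores.

def pick_dtype(gpu_name: str) -> str:
    """Choose an inference precision from a detected GPU name string."""
    fp16_gpus = ("T4", "A10G", "A100")  # GPUs named in this README
    if any(tag in gpu_name for tag in fp16_gpus):
        return "fp16"  # halves activation memory, faster on tensor cores
    return "fp32"      # safe fallback for unknown or older GPUs

print(pick_dtype("Tesla T4"))     # fp16
print(pick_dtype("GTX 1080 Ti"))  # fp32
```

In a real app the name would come from the CUDA runtime (e.g. `torch.cuda.get_device_name()`), and the returned string would map to the corresponding torch dtype.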
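The "auto-processed to optimal dimensions" behavior in Technical Details can be sketched as follows. The snap-to-multiple-of-8 rule reflects Stable Diffusion's latent downsampling; the helper itself (name, rounding direction) is an assumption, not the app's actual preprocessing.

```python
# Illustrative sketch: map an arbitrary input size to the mode's working
# resolution (512 for Animation, 784 for Editing), preserving aspect ratio.

def fit_resolution(w: int, h: int, target: int = 512, multiple: int = 8):
    """Scale (w, h) so the longer side equals `target`, then snap both
    sides down to the nearest multiple of `multiple` (a Stable Diffusion
    requirement from its 8x latent downsampling)."""
    scale = target / max(w, h)
    new_w = max(multiple, int(w * scale) // multiple * multiple)
    new_h = max(multiple, int(h * scale) // multiple * multiple)
    return new_w, new_h

print(fit_resolution(1920, 1080))     # (512, 288)
print(fit_resolution(600, 800, 784))  # (584, 784)
```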
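The 150-frame memory cap from Technical Details amounts to clamping the requested clip length against both the template length and a hard ceiling; this sketch is a guess at that logic, with a hypothetical helper name.

```python
# Hypothetical sketch of the frame-count cap described in this README.

MAX_FRAMES = 150  # memory-limited ceiling stated in Technical Details

def clamp_frames(requested: int, template_frames: int) -> int:
    """Number of frames actually generated for a run: never more than
    the motion template provides, never more than the memory cap."""
    return max(1, min(requested, template_frames, MAX_FRAMES))

print(clamp_frames(300, 240))  # 150 - capped by the memory limit
print(clamp_frames(90, 60))    # 60  - capped by the template length
```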