---
title: MIMO - Character Video Synthesis
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
python_version: "3.10"
---

# MIMO - Controllable Character Video Synthesis

**🎬 Complete Implementation - Optimized for HuggingFace Spaces**

Transform character images into animated videos with controllable motion and advanced video editing capabilities.

## 🚀 Quick Start

1. **Setup Models**: Click the "Setup Models" button (downloads the required models)
2. **Load Model**: Click the "Load Model" button (initializes the MIMO pipeline)
3. **Upload Image**: Provide a character image (person, anime, cartoon, etc.)
4. **Choose Template** (Optional): Select a motion template, or use the reference image only
5. **Generate**: Create the animated video

> **Note on Templates**: Video templates are optional. See [TEMPLATES_SETUP.md](TEMPLATES_SETUP.md) for adding custom templates.

## ⚡ Why This Approach?

To avoid build timeouts on HuggingFace Spaces, we use **progressive loading**:

- **Minimal dependencies** at startup (fast build)
- **Runtime installation** of heavy packages (TensorFlow, OpenCV)
- **Full features** available after a one-time setup

## Features

### 🎭 Character Animation Mode

- Simple character animation with motion templates
- Based on `run_animate.py` from the original repository
- Fast generation (512x512, 20 steps)

### 🎬 Video Character Editing Mode

- Advanced editing with background preservation
- Human segmentation and occlusion handling
- Based on `run_edit.py` from the original repository
- High quality (784x784, 25 steps)

## Available Templates

- **Sports:** basketball_gym, nba_dunk, nba_pass, football
- **Action:** kungfu_desert, kungfu_match, parkour, BruceLee
- **Dance:** dance_indoor, irish_dance
- **Synthetic:** syn_basketball, syn_dancing

## Technical Details

- **Models:** Stable Diffusion v1.5 + 3D UNet + Pose Guider
- **GPU:** Auto-detection (T4/A10G/A100) with FP16/FP32
- **Resolution:** 512x512 (Animation), 784x784 (Editing)
- **Processing:** 2-5 minutes, depending on the template
- **Video I/O:** PyAV (`av` pip package) for frame decoding/encoding

## Credits

- **Paper:** [MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling](https://arxiv.org/abs/2409.16160)
- **Authors:** Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo (Alibaba Group)
- **Conference:** CVPR 2025
- **Code:** [GitHub](https://github.com/menyifang/MIMO)
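
## Appendix: Progressive Loading Sketch

The progressive loading described under "Why This Approach?" can be illustrated with a small helper that imports a package and installs it only if the import fails. This is a minimal sketch under stated assumptions, not the code in `app.py`: the names `ensure_package` and `pip_install` are hypothetical, and the actual Space may install its heavy dependencies differently.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import Callable, Optional


def pip_install(requirement: str) -> None:
    """Install one requirement into the running interpreter's environment."""
    # Hypothetical helper: calls pip through the current Python executable.
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])


def ensure_package(
    import_name: str,
    requirement: Optional[str] = None,
    installer: Callable[[str], None] = pip_install,
) -> ModuleType:
    """Import a module, installing it on first use if it is missing.

    import_name: the name used in `import ...`
    requirement: the pip requirement string, if it differs from import_name
    installer:   callable that installs a requirement (swappable for testing)
    """
    try:
        # Fast path: the package is already available, so nothing is installed.
        return importlib.import_module(import_name)
    except ImportError:
        # Slow path: install at runtime, then retry the import once.
        installer(requirement or import_name)
        importlib.invalidate_caches()
        return importlib.import_module(import_name)
```

With a helper like this, a setup step could call `ensure_package("cv2", "opencv-python-headless")` or `ensure_package("tensorflow")` when the user clicks "Setup Models", which keeps those packages out of the build-time requirements. If the install does not make the module importable, the second import raises `ImportError`, so the caller can report the failure in the UI.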