---
title: MIMO - Controllable Character Video Synthesis
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app_hf_spaces.py
pinned: false
license: apache-2.0
hardware: t4-medium
---

# MIMO - Complete Character Video Synthesis

**🎬 Full Implementation Matching the Research Paper**

Transform character images into animated videos with controllable motion and advanced video editing capabilities.

## Features

### 🎭 Character Animation Mode

- **Based on:** `run_animate.py` from the original repository
- **Function:** Animate static character images with motion templates
- **Use cases:** Create character animations, bring photos to life
- **Quality:** Optimized for HuggingFace GPUs (512x512, 20 steps)

### 🎬 Video Character Editing Mode

- **Based on:** `run_edit.py` from the original repository
- **Function:** Advanced video editing with background preservation
- **Features:** Human segmentation, occlusion handling, seamless blending
- **Quality:** Higher resolution (784x784, 25 steps) for professional results

## Available Motion Templates

### Sports Templates
- `sports_basketball_gym` - Basketball court actions
- `sports_nba_dunk` - Professional basketball dunking
- `sports_nba_pass` - Basketball passing motions
- `syn_football_10_05` - Football/soccer movements

### Action Templates
- `shorts_kungfu_desert1` - Martial arts in a desert setting
- `shorts_kungfu_match1` - Fighting sequences
- `parkour_climbing` - Parkour and climbing actions
- `movie_BruceLee1` - Classic martial arts moves

### Dance Templates
- `dance_indoor_1` - Indoor dance choreography
- `syn_dancing2_00093_irish_dance` - Irish dance movements

### Synthetic Templates
- `syn_basketball_06_13` - Synthetic basketball motions
- `syn_dancing2_00093_irish_dance` - Synthetic dance sequences

## Technical Specifications

### Model Architecture
- **Base Model:** Stable Diffusion v1.5 with temporal modules
- **Components:** 3D UNet, Pose Guider, CLIP Image Encoder
- **Human Segmentation:** TensorFlow-based matting model
- **Scheduler:** DDIM with v-prediction parameterization

### Performance Optimizations
- **Auto GPU Detection:** T4/A10G/A100 support with FP16/FP32
- **Memory Management:** Efficient model loading and caching
- **Progressive Download:** Models are downloaded on first use
- **Quality vs. Speed:** Balanced settings for web deployment

### Technical Details
- **Input Resolution:** Any size (auto-processed to optimal dimensions)
- **Output Resolution:** 512x512 (Animation), 784x784 (Editing)
- **Frame Count:** Up to 150 frames (memory limited)
- **Processing Time:** 2-5 minutes, depending on template length

## Usage Instructions

1. **Set Up Models** (one-time, ~8GB download)
2. **Upload a Character Image** (clear, front-facing images work best)
3. **Select a Generation Mode:**
   - Animation: Faster, simpler character animation
   - Editing: Advanced editing with background blending
4. **Choose a Motion Template** from the available options
5. **Generate the Video** and wait for processing

## Model Credits

- **Original Paper:** [MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling](https://arxiv.org/abs/2409.16160)
- **Authors:** Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo (Alibaba Group)
- **Conference:** CVPR 2025
- **Code:** [GitHub Repository](https://github.com/menyifang/MIMO)

## Acknowledgments

Built upon:
- [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)
- [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone)
- [SAM](https://github.com/facebookresearch/segment-anything)
- [4D-Humans](https://github.com/shubham-goel/4D-Humans)
- [ProPainter](https://github.com/sczhou/ProPainter)

---

**⚠️ Note:** This is a complete implementation of the MIMO research paper, providing both simple animation and advanced video editing capabilities as described in the original work.
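The auto GPU detection with FP16/FP32 mentioned under Performance Optimizations can be sketched as a small precision-picking helper. The function name, the GPU list, and the string-matching rule are illustrative assumptions, not the app's actual code.

```python
# Hypothetical sketch of FP16/FP32 selection for the supported GPUs.
# Assumption: all listed datacenter GPUs have fast FP16 tensor cores.

def pick_dtype(gpu_name: str) -> str:
    """Choose an inference precision from a detected GPU name string."""
    fp16_gpus = ("T4", "A10G", "A100")  # GPUs named in this README
    if any(tag in gpu_name for tag in fp16_gpus):
        return "fp16"  # halves activation memory, faster on tensor cores
    return "fp32"      # safe fallback for unknown or older GPUs

print(pick_dtype("Tesla T4"))     # fp16
print(pick_dtype("GTX 1080 Ti"))  # fp32
```

In a real app the name would come from the CUDA runtime (e.g. `torch.cuda.get_device_name()`), and the returned string would map to the corresponding torch dtype.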
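The "auto-processed to optimal dimensions" behavior in Technical Details can be sketched as follows. The snap-to-multiple-of-8 rule reflects Stable Diffusion's latent downsampling; the helper itself (name, rounding direction) is an assumption, not the app's actual preprocessing.

```python
# Illustrative sketch: map an arbitrary input size to the mode's working
# resolution (512 for Animation, 784 for Editing), preserving aspect ratio.

def fit_resolution(w: int, h: int, target: int = 512, multiple: int = 8):
    """Scale (w, h) so the longer side equals `target`, then snap both
    sides down to the nearest multiple of `multiple` (a Stable Diffusion
    requirement from its 8x latent downsampling)."""
    scale = target / max(w, h)
    new_w = max(multiple, int(w * scale) // multiple * multiple)
    new_h = max(multiple, int(h * scale) // multiple * multiple)
    return new_w, new_h

print(fit_resolution(1920, 1080))     # (512, 288)
print(fit_resolution(600, 800, 784))  # (584, 784)
```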
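The 150-frame memory cap from Technical Details amounts to clamping the requested clip length against both the template length and a hard ceiling; this sketch is a guess at that logic, with a hypothetical helper name.

```python
# Hypothetical sketch of the frame-count cap described in this README.

MAX_FRAMES = 150  # memory-limited ceiling stated in Technical Details

def clamp_frames(requested: int, template_frames: int) -> int:
    """Number of frames actually generated for a run: never more than
    the motion template provides, never more than the memory cap."""
    return max(1, min(requested, template_frames, MAX_FRAMES))

print(clamp_frames(300, 240))  # 150 - capped by the memory limit
print(clamp_frames(90, 60))    # 60  - capped by the template length
```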