metadata
title: MIMO - Controllable Character Video Synthesis
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app_hf_spaces.py
pinned: false
license: apache-2.0
hardware: t4-medium
MIMO - Complete Character Video Synthesis
🎬 Full Implementation Matching Research Paper
Transform character images into animated videos with controllable motion and advanced video editing capabilities.
Features
Character Animation Mode
- Based on: run_animate.py from the original repository
- Function: Animate static character images with motion templates
- Use cases: Create character animations, bring photos to life
- Quality: Optimized for Hugging Face GPU hardware (512x512, 20 steps)
🎬 Video Character Editing Mode
- Based on: run_edit.py from the original repository
- Function: Advanced video editing with background preservation
- Features: Human segmentation, occlusion handling, seamless blending
- Quality: Higher resolution (784x784, 25 steps) for professional results
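The two modes differ mainly in output resolution and step count. A minimal sketch of how those settings could be kept in one place is shown here. The `ModePreset` class, the `PRESETS` table, and `get_preset` are hypothetical names for illustration, not identifiers from the original repository; only the numbers come from the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePreset:
    """Generation settings for one mode (hypothetical container)."""
    resolution: int  # side length of the square output, in pixels
    steps: int       # number of sampling steps


# Numbers taken from the feature lists above; the keys are illustrative.
PRESETS = {
    "animation": ModePreset(resolution=512, steps=20),
    "editing": ModePreset(resolution=784, steps=25),
}


def get_preset(mode: str) -> ModePreset:
    """Look up a preset by mode name, ignoring case and outer whitespace."""
    key = mode.strip().lower()
    if key not in PRESETS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(PRESETS)}")
    return PRESETS[key]
```

Keeping the presets in a frozen dataclass means a UI callback can pass a single object around instead of separate resolution and step arguments.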
Available Motion Templates
Sports Templates
- sports_basketball_gym - Basketball court actions
- sports_nba_dunk - Professional basketball dunking
- sports_nba_pass - Basketball passing motions
- syn_football_10_05 - Football/soccer movements
Action Templates
- shorts_kungfu_desert1 - Martial arts in desert setting
- shorts_kungfu_match1 - Fighting sequences
- parkour_climbing - Parkour and climbing actions
- movie_BruceLee1 - Classic martial arts moves
Dance Templates
- dance_indoor_1 - Indoor dance choreography
- syn_dancing2_00093_irish_dance - Irish dance movements
Synthetic Templates
- syn_basketball_06_13 - Synthetic basketball motions
- syn_dancing2_00093_irish_dance - Synthetic dance sequences
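The template names above can be grouped in a small lookup table, for example to populate a dropdown. This is only a sketch: the `MOTION_TEMPLATES` dictionary and both helper functions are hypothetical, while the template names and categories are copied from the lists above. One name, `syn_dancing2_00093_irish_dance`, is listed under two categories.

```python
from typing import Dict, List

# Category -> template names, copied from the lists above.
MOTION_TEMPLATES: Dict[str, List[str]] = {
    "Sports": [
        "sports_basketball_gym",
        "sports_nba_dunk",
        "sports_nba_pass",
        "syn_football_10_05",
    ],
    "Action": [
        "shorts_kungfu_desert1",
        "shorts_kungfu_match1",
        "parkour_climbing",
        "movie_BruceLee1",
    ],
    "Dance": ["dance_indoor_1", "syn_dancing2_00093_irish_dance"],
    "Synthetic": ["syn_basketball_06_13", "syn_dancing2_00093_irish_dance"],
}


def all_templates() -> List[str]:
    """Return every template name once, sorted alphabetically."""
    return sorted({name for names in MOTION_TEMPLATES.values() for name in names})


def categories_for(template: str) -> List[str]:
    """Return the categories that list the given template (may be empty)."""
    return [cat for cat, names in MOTION_TEMPLATES.items() if template in names]
```

De-duplicating in `all_templates` keeps the shared Irish dance template from appearing twice in a flat selection list.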
Technical Specifications
Model Architecture
- Base Model: Stable Diffusion v1.5 with temporal modules
- Components: 3D UNet, Pose Guider, CLIP Image Encoder
- Human Segmentation: TensorFlow-based matting model
- Scheduler: DDIM with v-prediction parameterization
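For readers unfamiliar with v-prediction, the sketch below shows the standard definition on scalar values: the network target is a mix of noise and clean signal, and both can be recovered from it. This is the textbook parameterization, not code from the MIMO repository. The function names and the use of plain floats instead of tensors are assumptions made for illustration.

```python
import math


def add_noise(x0: float, eps: float, alpha_bar: float) -> float:
    """Forward diffusion: blend clean signal x0 with noise eps."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def v_target(x0: float, eps: float, alpha_bar: float) -> float:
    """v-prediction target: v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0."""
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0


def recover_x0(x_t: float, v: float, alpha_bar: float) -> float:
    """Recover the clean signal from the noisy sample and v."""
    return math.sqrt(alpha_bar) * x_t - math.sqrt(1.0 - alpha_bar) * v


def recover_eps(x_t: float, v: float, alpha_bar: float) -> float:
    """Recover the noise from the noisy sample and v."""
    return math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * x_t
```

In a real pipeline the same arithmetic is applied elementwise to latent tensors, with `alpha_bar` taken from the scheduler's cumulative noise schedule at the current timestep.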
Performance Optimizations
- Auto GPU Detection: T4/A10G/A100 support with FP16/FP32
- Memory Management: Efficient model loading and caching
- Progressive Download: Models downloaded on first use
- Quality vs Speed: Balanced settings for web deployment
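One way the GPU detection and load-on-first-use behaviour could look is sketched below. The rule "FP16 on the listed GPUs, FP32 otherwise" is an assumption, as are the names `pick_precision`, `load_model`, and `LOAD_LOG`. The loader is a stand-in that records calls instead of downloading anything.

```python
from functools import lru_cache
from typing import List, Optional

# GPU families named in the list above; the matching rule is an assumption.
FP16_GPUS = ("T4", "A10G", "A100")

# Records each real (uncached) load, so caching can be observed.
LOAD_LOG: List[str] = []


def pick_precision(gpu_name: Optional[str]) -> str:
    """Return "fp16" for a recognised GPU, "fp32" otherwise (including no GPU)."""
    if gpu_name is None:
        return "fp32"
    name = gpu_name.upper()
    return "fp16" if any(tag in name for tag in FP16_GPUS) else "fp32"


@lru_cache(maxsize=None)
def load_model(name: str) -> dict:
    """Hypothetical loader: the body runs once per name, later calls hit the cache."""
    LOAD_LOG.append(name)  # a real loader would download and build the model here
    return {"name": name}
```

With an actual PyTorch setup, the GPU name would typically come from `torch.cuda.get_device_name(0)` when `torch.cuda.is_available()` returns true.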
Technical Details
- Input Resolution: Any size (auto-processed to optimal dimensions)
- Output Resolution: 512x512 (Animation), 784x784 (Editing)
- Frame Count: Up to 150 frames (memory limited)
- Processing Time: 2-5 minutes depending on template length
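The limits above imply two small preprocessing decisions: how to fit an arbitrary input to a square output, and how to cap the frame count. The sketch below assumes a center crop to a square, which is one plausible reading of "auto-processed" and not necessarily what the app does. The function names and the `MAX_FRAMES` constant are hypothetical; only the 150-frame limit comes from the list above.

```python
from typing import Tuple

MAX_FRAMES = 150  # frame cap stated in the list above


def center_square_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def clamp_frames(n_frames: int, limit: int = MAX_FRAMES) -> int:
    """Limit the number of frames to process to at most `limit`."""
    if n_frames < 0:
        raise ValueError("n_frames must be non-negative")
    return min(n_frames, limit)
```

The crop box uses the same (left, top, right, bottom) convention as Pillow's `Image.crop`, so the result can be passed to it directly before resizing to 512x512 or 784x784.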
Usage Instructions
- Setup Models (one-time, ~8GB download)
- Upload Character Image (clear, front-facing works best)
- Select Generation Mode:
  - Animation: Faster, simpler character animation
  - Editing: Advanced with background blending
- Choose Motion Template from available options
- Generate Video and wait for processing
Model Credits
- Original Paper: MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
- Authors: Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo (Alibaba Group)
- Conference: CVPR 2025
- Code: GitHub Repository
Acknowledgments
Built upon:
⚠️ Note: This is a complete implementation of the MIMO research paper, providing both simple animation and advanced video editing capabilities as described in the original work.