---
title: MIMO - Controllable Character Video Synthesis
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app_hf_spaces.py
pinned: false
license: apache-2.0
hardware: t4-medium
---

# MIMO - Complete Character Video Synthesis

## 🎬 Full Implementation Matching Research Paper

Transform character images into animated videos with controllable motion and advanced video editing capabilities.

## Features

### 🎭 Character Animation Mode

- **Based on:** `run_animate.py` from the original repository
- **Function:** Animate static character images with motion templates
- **Use cases:** Create character animations, bring photos to life
- **Quality:** Optimized for HuggingFace GPU (512x512, 20 steps)

### 🎬 Video Character Editing Mode

- **Based on:** `run_edit.py` from the original repository
- **Function:** Advanced video editing with background preservation
- **Features:** Human segmentation, occlusion handling, seamless blending
- **Quality:** Higher resolution (784x784, 25 steps) for professional results
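The per-mode quality settings above can be captured in a small config object. A minimal sketch (resolutions and step counts come from this README; the names and structure are illustrative, not the app's actual code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeConfig:
    """Per-mode generation settings (values as documented in this README)."""
    resolution: int          # square output size in pixels
    num_inference_steps: int # diffusion denoising steps

# Illustrative registry; the actual app may organize these differently.
MODE_CONFIGS = {
    "animation": ModeConfig(resolution=512, num_inference_steps=20),
    "editing": ModeConfig(resolution=784, num_inference_steps=25),
}
```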

## Available Motion Templates

### Sports Templates

- `sports_basketball_gym` - Basketball court actions
- `sports_nba_dunk` - Professional basketball dunking
- `sports_nba_pass` - Basketball passing motions
- `syn_football_10_05` - Football/soccer movements

### Action Templates

- `shorts_kungfu_desert1` - Martial arts in a desert setting
- `shorts_kungfu_match1` - Fighting sequences
- `parkour_climbing` - Parkour and climbing actions
- `movie_BruceLee1` - Classic martial arts moves

### Dance Templates

- `dance_indoor_1` - Indoor dance choreography
- `syn_dancing2_00093_irish_dance` - Irish dance movements

### Synthetic Templates

- `syn_basketball_06_13` - Synthetic basketball motions
- `syn_dancing2_00093_irish_dance` - Synthetic dance sequences
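The catalog above could be kept as a simple registry for validating a user's template selection. A hedged sketch (the grouping is taken from the lists above; the dict layout and helper name are illustrative, not the app's actual code):

```python
# Illustrative registry of the motion templates listed above, keyed by
# category. The real app may store or validate these differently.
MOTION_TEMPLATES = {
    "sports": ["sports_basketball_gym", "sports_nba_dunk",
               "sports_nba_pass", "syn_football_10_05"],
    "action": ["shorts_kungfu_desert1", "shorts_kungfu_match1",
               "parkour_climbing", "movie_BruceLee1"],
    "dance": ["dance_indoor_1", "syn_dancing2_00093_irish_dance"],
    "synthetic": ["syn_basketball_06_13", "syn_dancing2_00093_irish_dance"],
}

def is_valid_template(name: str) -> bool:
    """Return True if `name` appears in any template category."""
    return any(name in names for names in MOTION_TEMPLATES.values())
```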

## Technical Specifications

### Model Architecture

- **Base Model:** Stable Diffusion v1.5 with temporal modules
- **Components:** 3D UNet, Pose Guider, CLIP Image Encoder
- **Human Segmentation:** TensorFlow-based matting model
- **Scheduler:** DDIM with v-prediction parameterization
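Under the v-prediction parameterization mentioned above, the network's target is `v = α·ε − σ·x₀` for a noisy sample `x_t = α·x₀ + σ·ε` (with `α² + σ² = 1`), so a perfect prediction recovers the clean sample as `x₀ = α·x_t − σ·v`. A minimal scalar check of that identity (pure Python, illustrative only, not the app's scheduler code):

```python
import math

def make_noisy(x0, eps, alpha, sigma):
    # Forward process: x_t = alpha * x0 + sigma * eps
    return alpha * x0 + sigma * eps

def v_target(x0, eps, alpha, sigma):
    # v-prediction target: v = alpha * eps - sigma * x0
    return alpha * eps - sigma * x0

def predict_x0(x_t, v, alpha, sigma):
    # Recover the clean sample from x_t and a (perfect) v prediction.
    return alpha * x_t - sigma * v

# Toy scalar example at an arbitrary timestep.
alpha, sigma = math.cos(0.3), math.sin(0.3)  # alpha^2 + sigma^2 = 1
x0, eps = 0.8, -0.25
x_t = make_noisy(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)
assert abs(predict_x0(x_t, v, alpha, sigma) - x0) < 1e-9
```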

### Performance Optimizations

- **Auto GPU Detection:** T4/A10G/A100 support with FP16/FP32
- **Memory Management:** Efficient model loading and caching
- **Progressive Download:** Models downloaded on first use
- **Quality vs. Speed:** Balanced settings for web deployment
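The auto GPU detection above amounts to picking an inference precision from the detected device. A hedged sketch of one such policy (the function name and rules are illustrative; the real app's logic may differ — in practice the result would feed something like a `torch_dtype` argument at model-load time):

```python
def pick_precision(cuda_available: bool, device_name: str) -> str:
    """Choose an inference dtype string from the detected hardware.

    Illustrative policy only: T4, A10G, and A100 all handle FP16 well,
    so those devices get "fp16"; CPU and unknown GPUs fall back to
    the safe "fp32" default.
    """
    if not cuda_available:
        return "fp32"
    known_fp16 = ("T4", "A10G", "A100")
    if any(tag in device_name for tag in known_fp16):
        return "fp16"
    return "fp32"  # conservative default for unrecognized GPUs
```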

### Technical Details

- **Input Resolution:** Any size (auto-processed to optimal dimensions)
- **Output Resolution:** 512x512 (animation), 784x784 (editing)
- **Frame Count:** Up to 150 frames (memory limited)
- **Processing Time:** 2-5 minutes depending on template length
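"Any size, auto-processed to optimal dimensions" typically means scaling the input to fit the target resolution while keeping each side a multiple of 8, a common constraint for latent-diffusion UNets. A sketch of one such resize rule (illustrative only; the app's actual preprocessing may crop or pad instead):

```python
def fit_to_output(width: int, height: int, target: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the longer side fits `target`,
    rounding each side down to a multiple of 8.

    Hypothetical helper, not the app's actual code.
    """
    scale = target / max(width, height)
    w = max(8, int(width * scale) // 8 * 8)
    h = max(8, int(height * scale) // 8 * 8)
    return w, h
```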

## Usage Instructions

1. **Setup Models** (one-time, ~8GB download)
2. **Upload a Character Image** (clear, front-facing works best)
3. **Select a Generation Mode:**
   - **Animation:** faster, simpler character animation
   - **Editing:** advanced, with background blending
4. **Choose a Motion Template** from the available options
5. **Generate the Video** and wait for processing

## Model Credits

### Acknowledgments

Built upon:

⚠️ **Note:** This is a complete implementation of the MIMO research paper, providing both simple animation and advanced video editing capabilities as described in the original work.