---
title: MIMO - Controllable Character Video Synthesis
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app_hf_spaces.py
pinned: false
license: apache-2.0
hardware: t4-medium
---

# MIMO - Complete Character Video Synthesis

## 🎬 Full Implementation Matching Research Paper

Transform character images into animated videos with controllable motion and advanced video editing capabilities.

## Features

### 🎭 Character Animation Mode

- **Based on:** `run_animate.py` from the original repository
- **Function:** Animate static character images with motion templates
- **Use cases:** Create character animations, bring photos to life
- **Quality:** Optimized for HuggingFace GPU (512x512, 20 steps)

### 🎬 Video Character Editing Mode

- **Based on:** `run_edit.py` from the original repository
- **Function:** Advanced video editing with background preservation
- **Features:** Human segmentation, occlusion handling, seamless blending
- **Quality:** Higher resolution (784x784, 25 steps) for professional results
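The per-mode quality settings above can be captured in a small config object. A minimal sketch (resolutions and step counts come from this README; the names and structure are illustrative, not the app's actual code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeConfig:
    """Per-mode generation settings (values as documented in this README)."""
    resolution: int          # square output size in pixels
    num_inference_steps: int # diffusion denoising steps

# Illustrative registry; the actual app may organize these differently.
MODE_CONFIGS = {
    "animation": ModeConfig(resolution=512, num_inference_steps=20),
    "editing": ModeConfig(resolution=784, num_inference_steps=25),
}
```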

## Available Motion Templates

### Sports Templates

- `sports_basketball_gym` - Basketball court actions
- `sports_nba_dunk` - Professional basketball dunking
- `sports_nba_pass` - Basketball passing motions
- `syn_football_10_05` - Football/soccer movements

### Action Templates

- `shorts_kungfu_desert1` - Martial arts in a desert setting
- `shorts_kungfu_match1` - Fighting sequences
- `parkour_climbing` - Parkour and climbing actions
- `movie_BruceLee1` - Classic martial arts moves

### Dance Templates

- `dance_indoor_1` - Indoor dance choreography
- `syn_dancing2_00093_irish_dance` - Irish dance movements

### Synthetic Templates

- `syn_basketball_06_13` - Synthetic basketball motions
- `syn_dancing2_00093_irish_dance` - Synthetic dance sequences
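The catalog above could be kept as a simple registry for validating a user's template selection. A hedged sketch (the grouping is taken from the lists above; the dict layout and helper name are illustrative, not the app's actual code):

```python
# Illustrative registry of the motion templates listed above, keyed by
# category. The real app may store or validate these differently.
MOTION_TEMPLATES = {
    "sports": ["sports_basketball_gym", "sports_nba_dunk",
               "sports_nba_pass", "syn_football_10_05"],
    "action": ["shorts_kungfu_desert1", "shorts_kungfu_match1",
               "parkour_climbing", "movie_BruceLee1"],
    "dance": ["dance_indoor_1", "syn_dancing2_00093_irish_dance"],
    "synthetic": ["syn_basketball_06_13", "syn_dancing2_00093_irish_dance"],
}

def is_valid_template(name: str) -> bool:
    """Return True if `name` appears in any template category."""
    return any(name in names for names in MOTION_TEMPLATES.values())
```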

## Technical Specifications

### Model Architecture

- **Base Model:** Stable Diffusion v1.5 with temporal modules
- **Components:** 3D UNet, Pose Guider, CLIP Image Encoder
- **Human Segmentation:** TensorFlow-based matting model
- **Scheduler:** DDIM with v-prediction parameterization
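Under the v-prediction parameterization mentioned above, the network's target is `v = α·ε − σ·x₀` for a noisy sample `x_t = α·x₀ + σ·ε` (with `α² + σ² = 1`), so a perfect prediction recovers the clean sample as `x₀ = α·x_t − σ·v`. A minimal scalar check of that identity (pure Python, illustrative only, not the app's scheduler code):

```python
import math

def make_noisy(x0, eps, alpha, sigma):
    # Forward process: x_t = alpha * x0 + sigma * eps
    return alpha * x0 + sigma * eps

def v_target(x0, eps, alpha, sigma):
    # v-prediction target: v = alpha * eps - sigma * x0
    return alpha * eps - sigma * x0

def predict_x0(x_t, v, alpha, sigma):
    # Recover the clean sample from x_t and a (perfect) v prediction.
    return alpha * x_t - sigma * v

# Toy scalar example at an arbitrary timestep.
alpha, sigma = math.cos(0.3), math.sin(0.3)  # alpha^2 + sigma^2 = 1
x0, eps = 0.8, -0.25
x_t = make_noisy(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)
assert abs(predict_x0(x_t, v, alpha, sigma) - x0) < 1e-9
```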

### Performance Optimizations

- **Auto GPU Detection:** T4/A10G/A100 support with FP16/FP32
- **Memory Management:** Efficient model loading and caching
- **Progressive Download:** Models downloaded on first use
- **Quality vs. Speed:** Balanced settings for web deployment
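The auto GPU detection above amounts to picking an inference precision from the detected device. A hedged sketch of one such policy (the function name and rules are illustrative; the real app's logic may differ — in practice the result would feed something like a `torch_dtype` argument at model-load time):

```python
def pick_precision(cuda_available: bool, device_name: str) -> str:
    """Choose an inference dtype string from the detected hardware.

    Illustrative policy only: T4, A10G, and A100 all handle FP16 well,
    so those devices get "fp16"; CPU and unknown GPUs fall back to
    the safe "fp32" default.
    """
    if not cuda_available:
        return "fp32"
    known_fp16 = ("T4", "A10G", "A100")
    if any(tag in device_name for tag in known_fp16):
        return "fp16"
    return "fp32"  # conservative default for unrecognized GPUs
```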

### Technical Details

- **Input Resolution:** Any size (auto-processed to optimal dimensions)
- **Output Resolution:** 512x512 (animation), 784x784 (editing)
- **Frame Count:** Up to 150 frames (memory limited)
- **Processing Time:** 2-5 minutes depending on template length
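"Any size, auto-processed to optimal dimensions" typically means scaling the input to fit the target resolution while keeping each side a multiple of 8, a common constraint for latent-diffusion UNets. A sketch of one such resize rule (illustrative only; the app's actual preprocessing may crop or pad instead):

```python
def fit_to_output(width: int, height: int, target: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the longer side fits `target`,
    rounding each side down to a multiple of 8.

    Hypothetical helper, not the app's actual code.
    """
    scale = target / max(width, height)
    w = max(8, int(width * scale) // 8 * 8)
    h = max(8, int(height * scale) // 8 * 8)
    return w, h
```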

## Usage Instructions

1. **Setup Models** (one-time, ~8GB download)
2. **Upload a Character Image** (clear, front-facing works best)
3. **Select a Generation Mode:**
   - **Animation:** faster, simpler character animation
   - **Editing:** advanced, with background blending
4. **Choose a Motion Template** from the available options
5. **Generate the Video** and wait for processing

## Model Credits

### Acknowledgments

Built upon:

⚠️ **Note:** This is a complete implementation of the MIMO research paper, providing both simple animation and advanced video editing capabilities as described in the original work.