fffiloni committed on
Commit 6e607c7
• 1 Parent(s): 914d005

Update README.md

Files changed (1):
  1. README.md +96 -2
---
title: MimicMotion
emoji: 🤸‍♀️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.39.0
app_file: app.py
pinned: false
suggested_hardware: a10g-large
---
# MimicMotion

<a href='http://tencent.github.io/MimicMotion'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2406.19680'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> [![Replicate](https://replicate.com/zsxkib/mimic-motion/badge)](https://replicate.com/zsxkib/mimic-motion)

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
<br/>
*Yuang Zhang<sup>1,2</sup>, Jiaxi Gu<sup>1</sup>, Li-Wen Wang<sup>1</sup>, Han Wang<sup>1,2</sup>, Junqi Cheng<sup>1</sup>, Yuefeng Zhu<sup>1</sup>, Fangyuan Zou<sup>1</sup>*
<br/>
[<sup>1</sup>Tencent; <sup>2</sup>Shanghai Jiao Tong University]

<p align="center">
  <img src="assets/figures/preview_1.gif" width="100" />
  <img src="assets/figures/preview_2.gif" width="100" />
  <img src="assets/figures/preview_3.gif" width="100" />
  <img src="assets/figures/preview_4.gif" width="100" />
  <img src="assets/figures/preview_5.gif" width="100" />
  <img src="assets/figures/preview_6.gif" width="100" />
  <br/>
  <span>Highlights: <b>rich details</b>, <b>good temporal smoothness</b>, and <b>long video length</b>.</span>
</p>
32
+
33
+ ## Overview
34
+
35
+ <p align="center">
36
+ <img src="assets/figures/model_structure.png" alt="model architecture" width="640"/>
37
+ </br>
38
+ <i>An overview of the framework of MimicMotion.</i>
39
+ </p>
40
+
41
+ In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in various aspects such as controllability, video length, and richness of details, which hinder the application and popularization of this technology. In this work, we propose a controllable video generation framework, dubbed *MimicMotion*, which can generate high-quality videos of arbitrary length with any motion guidance. Comparing with previous methods, our approach has several highlights. Firstly, with confidence-aware pose guidance, temporal smoothness can be achieved so model robustness can be enhanced with large-scale training data. Secondly, regional loss amplification based on pose confidence significantly eases the distortion of image significantly. Lastly, for generating long smooth videos, a progressive latent fusion strategy is proposed. By this means, videos of arbitrary length can be generated with acceptable resource consumption. With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in multiple aspects.
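
The regional loss amplification idea can be illustrated with a toy NumPy sketch: pixels whose pose confidence is high get a larger loss weight. The function name, threshold, and weighting scheme below are our own illustration, not the paper's exact loss.

```python
import numpy as np

def confidence_weighted_mse(pred, target, confidence, amp=2.0, thresh=0.5):
    """Per-pixel MSE where pixels with pose confidence above `thresh`
    are amplified by `amp` -- a toy sketch of regional loss amplification."""
    per_pixel = (pred - target) ** 2
    weights = np.where(confidence > thresh, amp, 1.0)
    return float((per_pixel * weights).mean())
```

With a confident, badly reconstructed region, this loss exceeds the unweighted MSE, pushing the model to fix exactly the areas where the pose estimator is reliable.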

## Quickstart

### Environment setup

Python 3 with PyTorch 2.x is recommended; the setup has been validated on an NVIDIA V100 GPU. Run the commands below to install the Python dependencies:

```
conda env create -f environment.yaml
conda activate mimicmotion
```
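
Before moving on, it may help to confirm the environment is usable. The small check below is a hypothetical convenience script, not part of the MimicMotion codebase:

```python
import importlib.util

def torch_env_ready():
    """Report whether PyTorch is importable and whether a CUDA GPU is
    visible. Hypothetical sanity check; run inside the activated env."""
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed in this environment.")
        return False
    import torch
    print(f"torch {torch.__version__} - CUDA available: {torch.cuda.is_available()}")
    return True

if __name__ == "__main__":
    torch_env_ready()
```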

### Download weights

Please download the weights manually. First create a `models/` directory inside the repository:
```
cd MimicMotion/
mkdir models
```
1. Download the SVD model [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
```
git lfs install
git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1
mkdir -p models/SVD
mv stable-video-diffusion-img2vid-xt-1-1 models/SVD/
```
2. Download the DWPose pretrained models: [dwpose](https://huggingface.co/yzd-v/DWPose/tree/main)
```
git lfs install
git clone https://huggingface.co/yzd-v/DWPose
mv DWPose models/
```
3. Download the pre-trained MimicMotion checkpoint from [Hugging Face](https://huggingface.co/ixaac/MimicMotion):
```
curl -o models/MimicMotion.pth https://huggingface.co/ixaac/MimicMotion/resolve/main/MimicMotion.pth
```

Finally, all the weights should be organized under `models/` as follows:

```
models/
├── DWPose
│   ├── dw-ll_ucoco_384.onnx
│   └── yolox_l.onnx
├── SVD
│   └── stable-video-diffusion-img2vid-xt-1-1
└── MimicMotion.pth
```
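
A quick way to verify the layout above is a small stdlib script. This is a hypothetical helper (not shipped with the repo); the expected paths mirror the tree shown above:

```python
from pathlib import Path

# Relative paths expected under models/ after the download steps above.
EXPECTED_WEIGHTS = [
    "DWPose/dw-ll_ucoco_384.onnx",
    "DWPose/yolox_l.onnx",
    "SVD/stable-video-diffusion-img2vid-xt-1-1",
    "MimicMotion.pth",
]

def missing_weights(models_dir="models"):
    """Return the expected weight paths that are absent from `models_dir`."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_WEIGHTS if not (root / rel).exists()]
```

Running `missing_weights()` from the repository root should return an empty list once every file is in place.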

### Model inference

We provide an inference script:
```
python inference.py --inference_config configs/test.yaml
```
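
For long videos, the paper proposes progressive latent fusion: overlapping latent segments are blended so the output stays smooth across segment boundaries. The NumPy sketch below is our simplified cross-fading illustration of that idea, not the repository's actual implementation:

```python
import numpy as np

def fuse_segments(segments, overlap):
    """Concatenate latent segments (arrays of shape [T, ...]) that share
    `overlap` frames, linearly cross-fading within each overlapping region.
    Simplified illustration of progressive latent fusion."""
    fused = segments[0].astype(float)
    for seg in segments[1:]:
        seg = seg.astype(float)
        # ramp goes 0 -> 1 over the overlap, broadcast over latent dims
        ramp = np.linspace(0.0, 1.0, overlap).reshape(-1, *([1] * (seg.ndim - 1)))
        blended = fused[-overlap:] * (1.0 - ramp) + seg[:overlap] * ramp
        fused = np.concatenate([fused[:-overlap], blended, seg[overlap:]], axis=0)
    return fused
```

Each fused pair contributes `T1 + T2 - overlap` frames, so arbitrary-length videos can be assembled segment by segment at bounded memory cost.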

## Citation
```bib
@article{mimicmotion2024,
  title={MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance},
  author={Yuang Zhang and Jiaxi Gu and Li-Wen Wang and Han Wang and Junqi Cheng and Yuefeng Zhu and Fangyuan Zou},
  journal={arXiv preprint arXiv:2406.19680},
  year={2024}
}
```