nielsr (HF staff) committed
Commit 293d50c · verified · 1 Parent(s): 334562a

Add model card


This PR adds a model card, linking the model to the paper and adding the library name and pipeline tag.

Files changed (1): README.md (+134, -9)
README.md CHANGED
@@ -1,3 +1,8 @@
 <p align="center">

 <h2 align="center">Animate-X: Universal Character Image Animation with Enhanced Motion Representation</h2>
@@ -31,30 +36,150 @@
 </p>
 </p>

- ## This repo include the checkpoints for Animate-X:

- - "checkpoints/dw-ll_ucoco_384.onnx": the checkpoint for dwpose extraction.

- - "checkpoints/open_clip_pytorch_model.bin": the checkpoint for clip embedding.

- - "checkpoints/animate-x_ckpt.pth": the checkpoint for X-character image animation in Animate-X (32 frames).

- - "checkpoints/yolox_l.onnx": the checkpoint for dwpose extraction.

- - "checkpoints/v2-1_512-ema-pruned.ckpt": the checkpoint for Stable Diffusion.

- ## BibTeX

- If this repo is useful to you, please cite our corresponding technical paper.

- ```bibtex
 @article{AnimateX2025,
   title={Animate-X: Universal Character Image Animation with Enhanced Motion Representation},
   author={Tan, Shuai and Gong, Biao and Wang, Xiang and Zhang, Shiwei and Zheng, Dandan and Zheng, Ruobing and Zheng, Kecheng and Chen, Jingdong and Yang, Ming},
   journal={arXiv preprint arXiv:2410.10306},
   year={2025}
 }
 ```
+ ---
+ license: apache-2.0
+ pipeline_tag: image-to-video
+ ---
+
 <p align="center">

 <h2 align="center">Animate-X: Universal Character Image Animation with Enhanced Motion Representation</h2>

 </p>
 </p>

+ This repository is the official implementation of the paper "Animate-X: Universal Character Image Animation with Enhanced Motion Representation". Animate-X is a universal animation framework based on latent diffusion models for various character types (collectively named X), including anthropomorphic characters.
+ <table align="center">
+ <tr>
+ <td>
+ <img src="https://github.com/user-attachments/assets/fb2f4396-341f-4206-8d70-44d8b034f810">
+ </td>
+ </tr>
+ </table>
+
+
+ ## &#x1F4CC; Updates
+ * [2024.12.20] 🔥 We release our [Animate-X](https://github.com/antgroup/animate-x) inference code.
+ * [2024.12.10] 🔥 We release our [Animate-X CKPT](https://huggingface.co/Shuaishuai0219/Animate-X) checkpoints.
+ * [2024.10.14] 🔥 Our [paper](https://arxiv.org/abs/2410.10306) is publicly available on arXiv.
+
+ <!-- <video controls loop src="https://cloud.video.taobao.com/vod/vs4L24EAm6IQ5zM3SbN5AyHCSqZIXwmuobrzqNztMRM.mp4" muted="false"></video> -->
+
+ ## &#x1F304; Gallery
+ ### Introduction
+ <table class="center">
+ <tr>
+ <td width=47% style="border: none">
+ <video controls loop src="https://github.com/user-attachments/assets/085b70c4-cb68-4ac1-b45f-ed7f1c75bd5c" muted="false"></video>
+ </td>
+ <td width=53% style="border: none">
+ <video controls loop src="https://github.com/user-attachments/assets/f6275c0d-fbca-43b4-b6d6-cf095723729e" muted="false"></video>
+ </td>
+ </tr>
+ </table>
+
+ ### Animations produced by Animate-X
+ <table class="center">
+ <tr>
+ <td width=50% style="border: none">
+ <video controls loop src="https://github.com/user-attachments/assets/732a3445-2054-4e7b-9c2d-9db21c39771e" muted="false"></video>
+ </td>
+ <td width=50% style="border: none">
+ <video controls loop src="https://github.com/user-attachments/assets/f25af02c-e5be-4cab-ae64-c9e0b392643a" muted="false"></video>
+ </td>
+ </tr>
+ </table>
+

+ ## &#x1F680; Installation
+ Install with `conda`:
+ ```bash
+ conda env create -f environment.yaml
+ conda activate animate-x
+ ```

+ ## &#x1F680; Download Checkpoints
+ Download the Animate-X [checkpoints](https://huggingface.co/Shuaishuai0219/Animate-X) and put all files in the `checkpoints` directory, which should look like:
+ ```
+ ./checkpoints/
+ |---- animate-x.pth
+ |---- dw-ll_ucoco_384.onnx
+ |---- open_clip_pytorch_model.bin
+ |---- v2-1_512-ema-pruned.ckpt
+ └---- yolox_l.onnx
+ ```
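
For convenience, the same files can also be fetched programmatically. The snippet below is a minimal sketch, not part of the repository's own instructions: it assumes the `huggingface_hub` package is installed and simply mirrors the checkpoint repository linked above into `./checkpoints`.

```python
# Illustrative only: download all Animate-X checkpoint files into ./checkpoints.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Shuaishuai0219/Animate-X",  # checkpoint repository linked above
    local_dir="checkpoints",             # directory layout expected by the configs
)
```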

+ ## &#x1F4A1; Inference

+ The default inputs are an image (.jpg) and a dance video (.mp4). The default output is a 32-frame video (.mp4) at 768x512 resolution, which will be saved in the `./results` directory.

+ 1. Pre-process the video.
+ ```bash
+ python process_data.py \
+     --source_video_paths data/videos \
+     --saved_pose_dir data/saved_pkl \
+     --saved_pose data/saved_pose \
+     --saved_frame_dir data/saved_frames
+ ```
+ 2. Run Animate-X.
+ ```bash
+ python inference.py --cfg configs/Animate_X_infer.yaml
+ ```

+ Some key parameters in the `.yaml` configuration file are described below. For example, users can adjust `max_frames` or the sampling interval of the dance video to generate videos of varying durations or speeds.
+ - `max_frames`: Number of frames (default: 32) in the generated video (fps: 8).
+   - If you want to generate a longer video with more frames, you should modify
+     - `max_frames` to the desired number of frames
+     - `seq_len` in `UNet` to the number of frames + 1
+   - Taking 96 frames as an example, the config should be:
+   ```python
+   {
+       max_frames: 96  # 1. modify `max_frames` to the number of frames (e.g. 96)
+       ......
+       UNet: {
+           ......
+           'use_sim_mask': False,
+           'seq_len': 97,  # 2. modify `seq_len` in `UNet` to the number of frames + 1 (e.g. 97 = 96 + 1)
+       }
+   }
+   ```
+ - `round`: The number of times each test case is generated.
+ - `test_list_path`: The input paths for all test cases, for example:
+   ```python
+   [
+       [2, "data/images/1.jpg", "data/saved_pose/dance_1", "data/saved_frames/dance_1", "data/saved_pkl/dance_1.pkl", 14],
+       [2, "data/images/4.png", "data/saved_pose/dance_1", "data/saved_frames/dance_1", "data/saved_pkl/dance_1.pkl", 14],
+       ......
+   ]
+   ```
+   - `2` indicates that 1 frame is sampled from every 2 frames of the reference dance video and used as input to the model.
+   - `"data/images/1.jpg"` is the path to the reference image.
+   - `"data/saved_pose/dance_1"` is the path to the saved pose images (output by `process_data.py`, $I^p$, keypoint visualizations).
+   - `"data/saved_frames/dance_1"` is the path to the saved frames of the driving video (output by `process_data.py`).
+   - `"data/saved_pkl/dance_1.pkl"` is the path to the saved pose keypoints (output by `process_data.py`, $p^d$, DWPose).
+   - `14` is the random seed.
+ - `log_dir`: Path to the generated animation videos, e.g., `./results`.

+ **&#10004; Some tips**:
+
+ > Although Animate-X does not rely on strict pose alignment and we did not perform any manual alignment for the results in the paper, we cannot guarantee that every case is perfect. Users can therefore align poses by hand, e.g., applying an overall x/y translation and scaling to the pose skeleton of each frame so that it matches the position of the subject in the reference image, and place the results in `data/saved_pose`.
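
To make the alignment tip concrete, here is a minimal, illustrative sketch. The function name, the `(num_frames, num_joints, 2)` array layout, and the example offset/scale values are assumptions made for illustration; the actual keypoint format written by `process_data.py` may differ.

```python
import numpy as np

def align_pose(keypoints: np.ndarray, scale: float, dx: float, dy: float) -> np.ndarray:
    """Apply a global scale and x/y translation to pose keypoints.

    Assumes `keypoints` has shape (num_frames, num_joints, 2) with (x, y)
    coordinates; adapt to the actual format produced by process_data.py.
    """
    aligned = keypoints.astype(np.float32).copy()
    aligned[..., 0] = aligned[..., 0] * scale + dx  # x: scale, then shift horizontally
    aligned[..., 1] = aligned[..., 1] * scale + dy  # y: scale, then shift vertically
    return aligned

# Example: shrink the skeleton slightly and nudge it toward the subject's position.
# pose = align_pose(pose, scale=0.9, dx=15.0, dy=-10.0)
```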

+ ## &#x1F4E7; Acknowledgement
+ Our implementation is based on [UniAnimate](https://github.com/ali-vilab/UniAnimate), [MimicMotion](https://github.com/Tencent/MimicMotion), and [MusePose](https://github.com/TMElyralab/MusePose). Thanks for their remarkable contributions and released code! If we have missed any open-source projects or related articles, we will update the acknowledgement immediately.
+
+ ## &#x2696; License
+ This repository is released under the Apache-2.0 license as found in the [LICENSE](LICENSE) file.
+
+ ## &#x1F4DA; Citation
+ If you find this codebase useful for your research, please cite the following entries.
+ ```BibTeX
 @article{AnimateX2025,
   title={Animate-X: Universal Character Image Animation with Enhanced Motion Representation},
   author={Tan, Shuai and Gong, Biao and Wang, Xiang and Zhang, Shiwei and Zheng, Dandan and Zheng, Ruobing and Zheng, Kecheng and Chen, Jingdong and Yang, Ming},
   journal={arXiv preprint arXiv:2410.10306},
   year={2025}
 }
+
+ @article{Mimir2025,
+   title={Mimir: Improving Video Diffusion Models for Precise Text Understanding},
+   author={Tan, Shuai and Gong, Biao and Feng, Yutong and Zheng, Kecheng and Zheng, Dandan and Shi, Shuwei and Shen, Yujun and Chen, Jingdong and Yang, Ming},
+   journal={arXiv preprint arXiv:2412.03085},
+   year={2025}
+ }
 ```