noaltian committed
Commit ec975f9 · verified · 1 Parent(s): 26ff5b5

Update README.md

Files changed (1)
  1. README.md +96 -29
README.md CHANGED
@@ -5,12 +5,16 @@ license_link: LICENSE
  ---
  <!-- ## **HunyuanVideo** -->
 
  <p align="center">
  <img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo-I2V/refs/heads/main/assets/logo.png" height=100>
  </p>
 
  # **HunyuanVideo-I2V** 🌅
 
  Following the successful open-sourcing of our [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), we proudly present [HunyuanVideo-I2V](https://github.com/Tencent/HunyuanVideo-I2V), a new image-to-video generation framework to accelerate open-source community exploration!
 
  This repo contains official PyTorch model definitions, pre-trained weights, and inference/sampling code. You can find more visualizations on our [project page](https://aivideo.hunyuan.tencent.com). We have also released LoRA training code for customizable special effects, which can be used to create more interesting video effects.
@@ -20,15 +24,48 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
 
 
  ## 🔥🔥🔥 News!!
  * Mar 06, 2025: 👋 We release the inference code and model weights of HunyuanVideo-I2V. [Download](https://github.com/Tencent/HunyuanVideo-I2V/blob/main/ckpts/README.md).
 
 
  ## 📑 Open-source Plan
  - HunyuanVideo-I2V (Image-to-Video Model)
- - [x] LoRA training scripts
  - [x] Inference
  - [x] Checkpoints
  - [x] ComfyUI
  - [ ] Multi-GPU Sequence Parallel inference (faster inference speed on more GPUs)
  - [ ] Diffusers
  - [ ] FP8 Quantized weights
@@ -44,20 +81,15 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
  - [Installation Guide for Linux](#installation-guide-for-linux)
  - [🧱 Download Pretrained Models](#-download-pretrained-models)
  - [🔑 Single-gpu Inference](#-single-gpu-inference)
  - [Using Command Line](#using-command-line)
  - [More Configurations](#more-configurations)
- - [🎉 Customizable I2V LoRA effects training](#-customizable-i2v-lora-effects-training)
- - [Requirements](#requirements)
- - [Environment](#environment)
- - [Training data construction](#training-data-construction)
- - [Training](#training)
- - [Inference](#inference)
  - [🔗 BibTeX](#-bibtex)
  - [Acknowledgements](#acknowledgements)
  ---
 
  ## **HunyuanVideo-I2V Overall Architecture**
- Leveraging the advanced video generation capabilities of [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), we have extended its application to image-to-video generation tasks. To achieve this, we employ an image latent concatenation technique to effectively reconstruct and incorporate reference image information into the video generation process.
 
  Since we utilize a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder, we can significantly enhance the model's ability to comprehend the semantic content of the input image and to seamlessly integrate information from both the image and its associated caption. Specifically, the input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data.
 
@@ -135,7 +167,7 @@ docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyua
 
  ## 🧱 Download Pretrained Models
 
- Details on downloading the pretrained models are shown [here](https://github.com/Tencent/HunyuanVideo-I2V/blob/main/ckpts/README.md).
 
 
 
@@ -143,6 +175,17 @@ The details of download pretrained models are shown [here](https://github.com/Te
 
  Similar to [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), HunyuanVideo-I2V supports high-resolution video generation, with resolution up to 720P and video length up to 129 frames (5 seconds).
 
  ### Using Command Line
 
  <!-- ### Run a Gradio Server
@@ -152,44 +195,68 @@ python3 gradio_server.py --flow-reverse
  # set SERVER_NAME and SERVER_PORT manually
  # SERVER_NAME=0.0.0.0 SERVER_PORT=8081 python3 gradio_server.py --flow-reverse
  ``` -->
  ```bash
  cd HunyuanVideo-I2V
 
  python3 sample_image2video.py \
  --model HYVideo-T/2 \
- --prompt "A man with short gray hair plays a red electric guitar." \
  --i2v-mode \
- --i2v-image-path ./assets/demo/i2v/imgs/0.png \
  --i2v-resolution 720p \
  --video-length 129 \
  --infer-steps 50 \
  --flow-reverse \
  --flow-shift 17.0 \
  --seed 0 \
  --use-cpu-offload \
- --save-path ./results
  ```
  ### More Configurations
 
  We list some more useful configurations for easy usage:
 
- | Argument | Default | Description |
- |:----------------------:|:-----------------------------:|:------------------------------------------------------------:|
- | `--prompt` | None | The text prompt for video generation. |
- | `--model` | HYVideo-T/2-cfgdistill | Here we use HYVideo-T/2 for I2V; HYVideo-T/2-cfgdistill is used for T2V mode. |
- | `--i2v-mode` | False | Whether to enable i2v mode. |
- | `--i2v-image-path` | ./assets/demo/i2v/imgs/0.png | The reference image for video generation. |
- | `--i2v-resolution` | 720p | The resolution for the generated video. |
- | `--video-length` | 129 | The length of the generated video. |
- | `--infer-steps` | 50 | The number of steps for sampling. |
- | `--flow-shift` | 7.0 | Shift factor for flow matching schedulers. |
- | `--flow-reverse` | False | If reverse, learning/sampling from t=1 -> t=0. |
- | `--seed` | None | The random seed for generating the video; if None, we initialize a random seed. |
- | `--use-cpu-offload` | False | Use CPU offload for the model load to save more memory, necessary for high-res video generation. |
- | `--save-path` | ./results | Path to save the generated video. |
 
 
- ## 🎉 Customizable I2V LoRA effects training
 
  ### Requirements
 
@@ -216,7 +283,7 @@ Prompt description: The trigger word is written directly in the video caption. I
 
  For example, for an AI hair growth effect, the trigger word is rapid_hair_growth, and the caption becomes "rapid_hair_growth, The hair of the characters in the video is growing rapidly." + original prompt.
 
- Once you have the training video and prompt pairs, refer to [here](https://github.com/Tencent/HunyuanVideo-I2V/blob/main/hyvideo/hyvae_extract/README.md) for training data construction.
 
 
  ### Training
@@ -259,7 +326,7 @@ We list some lora specific configurations for easy usage:
  |:-------------------:|:-------:|:----------------------------:|
  | `--use-lora` | False | Whether to enable LoRA mode. |
  | `--lora-scale` | 1.0 | Fusion scale for the LoRA model. |
- | `--lora-path` | "" | Weight path for the LoRA model. |
 
 
  ## 🔗 BibTeX
 
  ---
  <!-- ## **HunyuanVideo** -->
 
+ [中文阅读](./README_zh.md)
+
  <p align="center">
  <img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo-I2V/refs/heads/main/assets/logo.png" height=100>
  </p>
 
  # **HunyuanVideo-I2V** 🌅
 
+ -----
+
  Following the successful open-sourcing of our [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), we proudly present [HunyuanVideo-I2V](https://github.com/Tencent/HunyuanVideo-I2V), a new image-to-video generation framework to accelerate open-source community exploration!
 
  This repo contains official PyTorch model definitions, pre-trained weights, and inference/sampling code. You can find more visualizations on our [project page](https://aivideo.hunyuan.tencent.com). We have also released LoRA training code for customizable special effects, which can be used to create more interesting video effects.
 
 
  ## 🔥🔥🔥 News!!
+ * Mar 07, 2025: 🔥 We have fixed a bug in our open-source version that caused ID changes. Please try the new model weights of [HunyuanVideo-I2V](https://huggingface.co/tencent/HunyuanVideo-I2V) to ensure full visual consistency in the first frame and produce higher-quality videos.
  * Mar 06, 2025: 👋 We release the inference code and model weights of HunyuanVideo-I2V. [Download](https://github.com/Tencent/HunyuanVideo-I2V/blob/main/ckpts/README.md).
 
 
+ ### First Frame Consistency Demo
+ | Reference Image | Generated Video |
+ |:----------------:|:----------------:|
+ | <img src="https://github.com/user-attachments/assets/83e7a097-ffca-40db-9c72-be01d866aa7d" width="80%"> | <video src="https://github.com/user-attachments/assets/f81d2c88-bb1a-43f8-b40f-1ccc20774563" width="100%"> </video> |
+ | <img src="https://github.com/user-attachments/assets/c385a11f-60c7-4919-b0f1-bc5e715f673c" width="80%"> | <video src="https://github.com/user-attachments/assets/0c29ede9-0481-4d40-9c67-a4b6267fdc2d" width="100%"> </video> |
+ | <img src="https://github.com/user-attachments/assets/5763f5eb-0be5-4b36-866a-5199e31c5802" width="95%"> | <video src="https://github.com/user-attachments/assets/a8da0a1b-ba7d-45a4-a901-5d213ceaf50e" width="100%"> </video> |
+
+ <!-- ### Customizable I2V LoRA Demo
+
+ | I2V LoRA Effect | Reference Image | Generated Video |
+ |:---------------:|:--------------------------------:|:----------------:|
+ | Hair growth | <img src="./assets/demo/i2v_lora/imgs/hair_growth.png" width="40%"> | <video src="https://github.com/user-attachments/assets/06b998ae-bbde-4c1f-96cb-a25a9197d5cb" width="100%"> </video> |
+ | Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%" > </video> |
+ <!-- | Hair growth | <img src="./assets/demo/i2v_lora/imgs/hair_growth.png" width="40%"> | <video src="https://github.com/user-attachments/assets/06b998ae-bbde-4c1f-96cb-a25a9197d5cb" width="100%" poster="./assets/demo/i2v_lora/imgs/hair_growth.png"> </video> |
+ | Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%" poster="./assets/demo/i2v_lora/imgs/hair_growth.png"> </video> | -->
+
+ <!-- ## 🧩 Community Contributions -->
+
+ <!-- If you develop/use HunyuanVideo-I2V in your projects, welcome to let us know. -->
+
+ <!-- - ComfyUI-Kijai (FP8 Inference, V2V and IP2V Generation): [ComfyUI-HunyuanVideoWrapper](https://github.com/kijai/ComfyUI-HunyuanVideoWrapper) by [Kijai](https://github.com/kijai) -->
+ <!-- - ComfyUI-Native (Native Support): [ComfyUI-HunyuanVideo](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/) by [ComfyUI Official](https://github.com/comfyanonymous/ComfyUI) -->
+
+ <!-- - FastVideo (Consistency Distilled Model and Sliding Tile Attention): [FastVideo](https://github.com/hao-ai-lab/FastVideo) and [Sliding Tile Attention](https://hao-ai-lab.github.io/blogs/sta/) by [Hao AI Lab](https://hao-ai-lab.github.io/)
+ - HunyuanVideo-gguf (GGUF Version and Quantization): [HunyuanVideo-gguf](https://huggingface.co/city96/HunyuanVideo-gguf) by [city96](https://huggingface.co/city96)
+ - Enhance-A-Video (Better Generated Video for Free): [Enhance-A-Video](https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video) by [NUS-HPC-AI-Lab](https://ai.comp.nus.edu.sg/)
+ - TeaCache (Cache-based Accelerate): [TeaCache](https://github.com/LiewFeng/TeaCache) by [Feng Liu](https://github.com/LiewFeng)
+ - HunyuanVideoGP (GPU Poor version): [HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP) by [DeepBeepMeep](https://github.com/deepbeepmeep)
+ -->
+
+
+
  ## 📑 Open-source Plan
  - HunyuanVideo-I2V (Image-to-Video Model)
  - [x] Inference
  - [x] Checkpoints
  - [x] ComfyUI
+ - [ ] LoRA training scripts
  - [ ] Multi-GPU Sequence Parallel inference (faster inference speed on more GPUs)
  - [ ] Diffusers
  - [ ] FP8 Quantized weights
 
  - [Installation Guide for Linux](#installation-guide-for-linux)
  - [🧱 Download Pretrained Models](#-download-pretrained-models)
  - [🔑 Single-gpu Inference](#-single-gpu-inference)
+ - [Tips for Using Image-to-Video Models](#tips-for-using-image-to-video-models)
  - [Using Command Line](#using-command-line)
  - [More Configurations](#more-configurations)
  - [🔗 BibTeX](#-bibtex)
  - [Acknowledgements](#acknowledgements)
  ---
 
  ## **HunyuanVideo-I2V Overall Architecture**
+ Leveraging the advanced video generation capabilities of [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), we have extended its application to image-to-video generation tasks. To achieve this, we employ a token replace technique to effectively reconstruct and incorporate reference image information into the video generation process.
 
  Since we utilize a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder, we can significantly enhance the model's ability to comprehend the semantic content of the input image and to seamlessly integrate information from both the image and its associated caption. Specifically, the input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data.
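To make the token-concatenation idea above concrete, here is a minimal, illustrative PyTorch-style sketch. All tensor shapes, token counts, and module choices are hypothetical placeholders chosen for readability; this is not the repository's actual implementation.

```python
# Illustrative sketch only -- hypothetical toy sizes, not HunyuanVideo-I2V's real code.
import torch
import torch.nn as nn

dim, n_image_tokens, n_video_tokens = 64, 16, 128  # toy sizes for demonstration

# Stand-in for the semantic image tokens produced by the MLLM from the reference image.
image_tokens = torch.randn(1, n_image_tokens, dim)
# Stand-in for the video latent tokens.
video_tokens = torch.randn(1, n_video_tokens, dim)

# Concatenate along the sequence dimension so full attention can mix
# reference-image semantics with the video latents.
tokens = torch.cat([image_tokens, video_tokens], dim=1)

# Full (unmasked) self-attention over the combined sequence.
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
out, _ = attn(tokens, tokens, tokens, need_weights=False)
print(out.shape)  # torch.Size([1, 144, 64])
```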
 
 
  ## 🧱 Download Pretrained Models
 
+ Details on downloading the pretrained models are shown [here](ckpts/README.md).
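The weights can also be fetched programmatically. The sketch below is illustrative only: it assumes the Hugging Face repository `tencent/HunyuanVideo-I2V` linked in the News section, and the target directory is a guess; follow ckpts/README.md for the layout the inference scripts actually expect.

```python
# Illustrative download sketch -- the local_dir below is an assumption,
# not the layout mandated by ckpts/README.md.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="tencent/HunyuanVideo-I2V",
    local_dir="./ckpts",  # hypothetical destination
)
```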
 
  Similar to [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), HunyuanVideo-I2V supports high-resolution video generation, with resolution up to 720P and video length up to 129 frames (5 seconds).
 
+ ### Tips for Using Image-to-Video Models
+ - **Use Concise Prompts**: To effectively guide the model's generation, keep your prompts short and to the point.
+ - **Include Key Elements**: A well-structured prompt should cover (see the short sketch after this list):
+   - **Main Subject**: Specify the primary focus of the video.
+   - **Action**: Describe the main movement or activity taking place.
+   - **Background (Optional)**: Set the scene for the video.
+   - **Camera Angle (Optional)**: Indicate the perspective or viewpoint.
+ - **Avoid Overly Detailed Prompts**: Lengthy or highly detailed prompts can lead to unnecessary transitions in the video output.
+
+ <!-- **For image-to-video models, we recommend using concise prompts to guide the model's generation process. A good prompt should include elements such as background, main subject, action, and camera angle. Overly long or excessively detailed prompts may introduce unnecessary transitions.** -->
+
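As a purely illustrative sketch of the structure above, the elements can be assembled into one concise prompt. The helper name is hypothetical and not part of the repository; the example values reuse the prompt from the commands below.

```python
# Illustrative only: compose a concise I2V prompt from the recommended elements.
def build_prompt(subject: str, action: str, background: str = "", camera_angle: str = "") -> str:
    parts = [subject, action, background, camera_angle]
    return " ".join(part.strip() for part in parts if part.strip())

prompt = build_prompt(
    subject="An Asian man with short hair in black tactical uniform and white clothes",
    action="waves a firework stick.",
)
print(prompt)
# An Asian man with short hair in black tactical uniform and white clothes waves a firework stick.
```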
  ### Using Command Line
 
  <!-- ### Run a Gradio Server
  # set SERVER_NAME and SERVER_PORT manually
  # SERVER_NAME=0.0.0.0 SERVER_PORT=8081 python3 gradio_server.py --flow-reverse
  ``` -->
+ If you want to generate a more **stable** video, you can set `--i2v-stability` and `--flow-shift 7.0`. Execute the command as follows:
  ```bash
  cd HunyuanVideo-I2V
 
  python3 sample_image2video.py \
+ --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
+ --i2v-image-path ./demo/imgs/0.jpg \
  --model HYVideo-T/2 \
  --i2v-mode \
  --i2v-resolution 720p \
+ --i2v-stability \
+ --infer-steps 50 \
  --video-length 129 \
+ --flow-reverse \
+ --flow-shift 7.0 \
+ --seed 0 \
+ --embedded-cfg-scale 6.0 \
+ --use-cpu-offload \
+ --save-path ./results
+ ```
+ If you want to generate a more **dynamic** video, you can **unset** `--i2v-stability` and set `--flow-shift 17.0`. Execute the command as follows:
+ ```bash
+ cd HunyuanVideo-I2V
+
+ python3 sample_image2video.py \
+ --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
+ --i2v-image-path ./demo/imgs/0.jpg \
+ --model HYVideo-T/2 \
+ --i2v-mode \
+ --i2v-resolution 720p \
  --infer-steps 50 \
+ --video-length 129 \
  --flow-reverse \
  --flow-shift 17.0 \
  --seed 0 \
+ --embedded-cfg-scale 6.0 \
  --use-cpu-offload \
+ --save-path ./results
  ```
  ### More Configurations
 
  We list some more useful configurations for easy usage:
 
+ | Argument | Default | Description |
+ |:----------------------:|:----------------------------:|:--------------------------------------------------------------------------------------------------:|
+ | `--prompt` | None | The text prompt for video generation. |
+ | `--model` | HYVideo-T/2-cfgdistill | Here we use HYVideo-T/2 for I2V; HYVideo-T/2-cfgdistill is used for T2V mode. |
+ | `--i2v-mode` | False | Whether to enable i2v mode. |
+ | `--i2v-image-path` | ./assets/demo/i2v/imgs/0.jpg | The reference image for video generation. |
+ | `--i2v-resolution` | 720p | The resolution for the generated video. |
+ | `--i2v-stability` | False | Whether to use stable mode for i2v inference. |
+ | `--video-length` | 129 | The length of the generated video. |
+ | `--infer-steps` | 50 | The number of steps for sampling. |
+ | `--flow-shift` | 7.0 | Shift factor for flow matching schedulers. We recommend 7 with `--i2v-stability` switched on for a more stable video, and 17 with `--i2v-stability` switched off for a more dynamic video. |
+ | `--flow-reverse` | False | If reverse, learning/sampling from t=1 -> t=0. |
+ | `--seed` | None | The random seed for generating the video; if None, we initialize a random seed. |
+ | `--use-cpu-offload` | False | Use CPU offload for the model load to save more memory, necessary for high-res video generation. |
+ | `--save-path` | ./results | Path to save the generated video. |
+
 
 
+ <!-- ## 🎉 Customizable I2V LoRA effects training
 
  ### Requirements
 
 
  For example, for an AI hair growth effect, the trigger word is rapid_hair_growth, and the caption becomes "rapid_hair_growth, The hair of the characters in the video is growing rapidly." + original prompt.
 
+ Once you have the training video and prompt pairs, refer to [here](hyvideo/hyvae_extract/README.md) for training data construction.
 
 
  ### Training
 
  |:-------------------:|:-------:|:----------------------------:|
  | `--use-lora` | False | Whether to enable LoRA mode. |
  | `--lora-scale` | 1.0 | Fusion scale for the LoRA model. |
+ | `--lora-path` | "" | Weight path for the LoRA model. | -->
 
 
  ## 🔗 BibTeX