HunyuanDiT
Diffusers
Safetensors
English
Chinese
Tencent-Hunyuan committed
Commit 68cf368 • 1 Parent(s): 274f706

Update README.md

Files changed (1): README.md +34 -21
README.md CHANGED
@@ -14,9 +14,10 @@ language:
 # Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
 
 
+
 This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our [project page](https://dit.hunyuan.tencent.com/).
 
-> [**Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://tencent.github.io/HunyuanDiT/asset/Hunyuan_DiT_Tech_Report_05140553.pdf) <br>
+> [**Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://arxiv.org/abs/2405.08748) <br>
 > Zhimin Li*, Jianwei Zhang*, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, JianChen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin Lu†
 > <br>Tencent Hunyuan<br>
 
@@ -24,11 +25,11 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
 > Minbin Huang*, Yanxin Long*, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu†, Wei Liu
 > <br>Chinese University of Hong Kong, Tencent Hunyuan, Shenzhen Campus of Sun Yat-sen University<br>
 
----
 
-## 🔥🔥🔥 Hunyuan Assistant
-
-Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experience our products! Simply enter the suggested prompts or any other **creative prompts with the drawing intention words** to activate the hunyuan text2image generation function.
+
+## 🔥🔥🔥 Tencent Hunyuan Bot
+
+Welcome to [Tencent Hunyuan Bot](https://hunyuan.tencent.com/bot/chat), where you can explore our innovative products! Just input the suggested prompts below or any other **imaginative prompts containing drawing-related keywords** to activate the Hunyuan text-to-image generation feature. You can use **simple prompts** as well as **multi-turn language interactions** to create the picture. Unleash your creativity and create any picture you desire, **all for free!**
 > 画一只穿着西装的猪
 >
 > draw a pig in a suit
@@ -45,7 +46,7 @@ Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experien
   - [ ] Distillation Version (Coming soon ⏩️)
   - [ ] TensorRT Version (Coming soon ⏩️)
   - [ ] Training (Coming later ⏩️)
-- DialogGen (Prompt Enhancement Model)
+- [DialogGen](https://github.com/Centaurusalpha/DialogGen) (Prompt Enhancement Model)
   - [x] Inference
 - [X] Web Demo (Gradio)
 - [X] Cli Demo
@@ -53,19 +54,19 @@ Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experien
 ## Contents
 - [Hunyuan-DiT](#hunyuan-dit--a-powerful-multi-resolution-diffusion-transformer-with-fine-grained-chinese-understanding)
   - [Abstract](#abstract)
-  - [🎉 Hunyuan-DiT Key Features](#🎉-hunyuan-dit-key-features)
+  - [🎉 Hunyuan-DiT Key Features](#-hunyuan-dit-key-features)
     - [Chinese-English Bilingual DiT Architecture](#chinese-english-bilingual-dit-architecture)
     - [Multi-turn Text2Image Generation](#multi-turn-text2image-generation)
-  - [📈 Comparisons](#comparisons)
-  - [🎥 Visualization](#🎥-visualization)
-  - [📜 Requirements](#📜-requirements)
-  - [🛠 Dependencies and Installation](#🛠%EF%B8%8F-dependencies-and-installation)
-  - [🧱 Download Pretrained Models](#🧱-download-pretrained-models)
-  - [🔑 Inference](#🔑-inference)
+  - [📈 Comparisons](#-comparisons)
+  - [🎥 Visualization](#-visualization)
+  - [📜 Requirements](#-requirements)
+  - [🛠 Dependencies and Installation](#%EF%B8%8F-dependencies-and-installation)
+  - [🧱 Download Pretrained Models](#-download-pretrained-models)
+  - [🔑 Inference](#-inference)
     - [Using Gradio](#using-gradio)
     - [Using Command Line](#using-command-line)
     - [More Configurations](#more-configurations)
-  - [🔗 BibTeX](#🔗-bibtex)
+  - [🔗 BibTeX](#-bibtex)
 
 ## **Abstract**
 
@@ -90,7 +91,7 @@ and output the new text prompt for image generation.
   <img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/mllm.png" height=300>
 </p>
 
-## Comparisons
+## 📈 Comparisons
 In order to comprehensively compare the generation capabilities of HunyuanDiT and other models, we constructed a 4-dimensional test set covering Text-Image Consistency, Excluding AI Artifacts, Subject Clarity, and Aesthetics. More than 50 professional evaluators perform the evaluation.
 
 <p align="center">
@@ -124,7 +125,8 @@ In order to comprehensively compare the generation capabilities of HunyuanDiT an
     <tr style="font-weight: bold; background-color: #f2f2f2;">
         <td>Hunyuan-DiT</td><td>✔</td> <td>74.2</td> <td>74.3</td> <td>95.4</td> <td>86.6</td> <td>59.0</td>
     </tr>
-</table>
+</tbody>
+</table>
 </p>
 
 ## 🎥 Visualization
@@ -138,13 +140,14 @@ In order to comprehensively compare the generation capabilities of HunyuanDiT an
 
 
 <p align="center">
-  <img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/long text understanding.png" height=300>
+  <img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/long text understanding.png" height=310>
 </p>
 
 * **Multi-turn Text2Image Generation**
 
-[demo video](https://youtu.be/4AaHrYnuIcE)
+https://github.com/Tencent/tencent.github.io/assets/27557933/94b4dcc3-104d-44e1-8bb2-dc55108763d1
+
 
 ---
 
@@ -282,12 +285,22 @@ We list some more useful configurations for easy usage:
 
 
 # 🔗 BibTeX
-If you find Hunyuan-DiT useful for your research and applications, please cite using this BibTeX:
+If you find [Hunyuan-DiT](https://arxiv.org/abs/2405.08748) or [DialogGen](https://arxiv.org/abs/2403.08857) useful for your research and applications, please cite using this BibTeX:
 
 ```BibTeX
-@misc{hunyuandit,
-      title={Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding},
-      author={Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, JianChen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin Lu},
+@misc{li2024hunyuandit,
+      title={Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding},
+      author={Zhimin Li and Jianwei Zhang and Qin Lin and Jiangfeng Xiong and Yanxin Long and Xinchi Deng and Yingfang Zhang and Xingchao Liu and Minbin Huang and Zedong Xiao and Dayou Chen and Jiajun He and Jiahao Li and Wenyue Li and Chen Zhang and Rongwei Quan and Jianxiang Lu and Jiabin Huang and Xiaoyan Yuan and Xiaoxiao Zheng and Yixuan Li and Jihong Zhang and Chao Zhang and Meng Chen and Jie Liu and Zheng Fang and Weiyan Wang and Jinbao Xue and Yangyu Tao and Jianchen Zhu and Kai Liu and Sihuan Lin and Yifu Sun and Yun Li and Dongdong Wang and Mingtao Chen and Zhichao Hu and Xiao Xiao and Yan Chen and Yuhong Liu and Wei Liu and Di Wang and Yong Yang and Jie Jiang and Qinglin Lu},
       year={2024},
+      eprint={2405.08748},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+
+@article{huang2024dialoggen,
+  title={DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation},
+  author={Huang, Minbin and Long, Yanxin and Deng, Xinchi and Chu, Ruihang and Xiong, Jiangfeng and Liang, Xiaodan and Cheng, Hong and Lu, Qinglin and Liu, Wei},
+  journal={arXiv preprint arXiv:2403.08857},
+  year={2024}
 }
 ```
 
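For quick reference: the model card above is tagged Diffusers and Safetensors, so the updated weights should be loadable through the Diffusers library. Below is a minimal loading sketch under that assumption; the `HunyuanDiTPipeline` class and the `Tencent-Hunyuan/HunyuanDiT-Diffusers` repo id are illustrative assumptions, not details confirmed by this commit.

```python
# Hypothetical loading sketch (not part of this commit).
# Assumes a Diffusers integration exposing HunyuanDiTPipeline and a
# "Tencent-Hunyuan/HunyuanDiT-Diffusers" checkpoint id on the Hub.
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",  # assumed repo id
    torch_dtype=torch.float16,
).to("cuda")

# Hunyuan-DiT accepts both Chinese and English prompts.
image = pipe(prompt="画一只穿着西装的猪").images[0]  # "draw a pig in a suit"
image.save("pig_in_suit.png")
```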