Tencent-Hunyuan committed
Commit • 68cf368
1 Parent(s): 274f706
Update README.md
README.md
CHANGED
@@ -14,9 +14,10 @@ language:
 # Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding


+
 This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our [project page](https://dit.hunyuan.tencent.com/).

-> [**Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://
+> [**Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://arxiv.org/abs/2405.08748) <br>
 > Zhimin Li*, Jianwei Zhang*, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, JianChen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin Lu‡
 > <br>Tencent Hunyuan<br>
@@ -24,11 +25,11 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
 > Minbin Huang*, Yanxin Long*, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu†, Wei Liu
 > <br>Chinese University of Hong Kong, Tencent Hunyuan, Shenzhen Campus of Sun Yat-sen University<br>

----

-## 🔥🔥🔥 Hunyuan Assistant

-Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experien
+## 🔥🔥🔥 Tencent Hunyuan Bot
+
+Welcome to [Tencent Hunyuan Bot](https://hunyuan.tencent.com/bot/chat), where you can explore our innovative products! Just input the suggested prompts below or any other **imaginative prompts containing drawing-related keywords** to activate the Hunyuan text-to-image generation feature. You can use **simple prompts** as well as **multi-turn language interactions** to create the picture. Unleash your creativity and create any picture you desire, **all for free!**
 > 画一只穿着西装的猪
 >
 > draw a pig in a suit
@@ -45,7 +46,7 @@ Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experien
 - [ ] Distillation Version (Coming soon ⏩️)
 - [ ] TensorRT Version (Coming soon ⏩️)
 - [ ] Training (Coming later ⏩️)
-- DialogGen (Prompt Enhancement Model)
+- [DialogGen](https://github.com/Centaurusalpha/DialogGen) (Prompt Enhancement Model)
 - [x] Inference
 - [X] Web Demo (Gradio)
 - [X] Cli Demo
@@ -53,19 +54,19 @@ Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experien
 ## Contents
 - [Hunyuan-DiT](#hunyuan-dit--a-powerful-multi-resolution-diffusion-transformer-with-fine-grained-chinese-understanding)
 - [Abstract](#abstract)
-- [🎉 Hunyuan-DiT Key Features](
+- [🎉 Hunyuan-DiT Key Features](#-hunyuan-dit-key-features)
 - [Chinese-English Bilingual DiT Architecture](#chinese-english-bilingual-dit-architecture)
 - [Multi-turn Text2Image Generation](#multi-turn-text2image-generation)
-- [📈 Comparisons](
+- [📈 Comparisons](#-comparisons)
-- [🎥 Visualization](
+- [🎥 Visualization](#-visualization)
-- [📜 Requirements](
+- [📜 Requirements](#-requirements)
-- [🛠️ Dependencies and Installation](
+- [🛠️ Dependencies and Installation](#%EF%B8%8F-dependencies-and-installation)
-- [🧱 Download Pretrained Models](
+- [🧱 Download Pretrained Models](#-download-pretrained-models)
-- [🔑 Inference](
+- [🔑 Inference](#-inference)
 - [Using Gradio](#using-gradio)
 - [Using Command Line](#using-command-line)
 - [More Configurations](#more-configurations)
-- [🔗 BibTeX](
+- [🔗 BibTeX](#-bibtex)

 ## **Abstract**
@@ -90,7 +91,7 @@ and output the new text prompt for image generation.
 <img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/mllm.png" height=300>
 </p>

-## Comparisons
+## 📈 Comparisons
 In order to comprehensively compare the generation capabilities of HunyuanDiT and other models, we constructed a 4-dimensional test set, including Text-Image Consistency, Excluding AI Artifacts, Subject Clarity, and Aesthetics. More than 50 professional evaluators perform the evaluation.

 <p align="center">
@@ -124,7 +125,8 @@ In order to comprehensively compare the generation capabilities of HunyuanDiT an
 <tr style="font-weight: bold; background-color: #f2f2f2;">
 <td>Hunyuan-DiT</td><td>✔</td> <td>74.2</td> <td>74.3</td> <td>95.4</td> <td>86.6</td> <td>59.0</td>
 </tr>
-</
+</tbody>
+</table>
 </p>

 ## 🎥 Visualization
@@ -138,13 +140,14 @@ In order to comprehensively compare the generation capabilities of HunyuanDiT an
 <p align="center">
-<img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/long text understanding.png" height=
+<img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/long text understanding.png" height=310>
 </p>

 * **Multi-turn Text2Image Generation**

-[demo video](https://youtu.be/4AaHrYnuIcE)
+https://github.com/Tencent/tencent.github.io/assets/27557933/94b4dcc3-104d-44e1-8bb2-dc55108763d1
+

 ---
@@ -282,12 +285,22 @@ We list some more useful configurations for easy usage:
 # 🔗 BibTeX
-If you find Hunyuan-DiT useful for your research and applications, please cite using this BibTeX:
+If you find [Hunyuan-DiT](https://arxiv.org/abs/2405.08748) or [DialogGen](https://arxiv.org/abs/2403.08857) useful for your research and applications, please cite using this BibTeX:

 ```BibTeX
-@misc{
+@misc{li2024hunyuandit,
-title={Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding},
+title={Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding},
-author={Zhimin Li
+author={Zhimin Li and Jianwei Zhang and Qin Lin and Jiangfeng Xiong and Yanxin Long and Xinchi Deng and Yingfang Zhang and Xingchao Liu and Minbin Huang and Zedong Xiao and Dayou Chen and Jiajun He and Jiahao Li and Wenyue Li and Chen Zhang and Rongwei Quan and Jianxiang Lu and Jiabin Huang and Xiaoyan Yuan and Xiaoxiao Zheng and Yixuan Li and Jihong Zhang and Chao Zhang and Meng Chen and Jie Liu and Zheng Fang and Weiyan Wang and Jinbao Xue and Yangyu Tao and Jianchen Zhu and Kai Liu and Sihuan Lin and Yifu Sun and Yun Li and Dongdong Wang and Mingtao Chen and Zhichao Hu and Xiao Xiao and Yan Chen and Yuhong Liu and Wei Liu and Di Wang and Yong Yang and Jie Jiang and Qinglin Lu},
 year={2024},
+eprint={2405.08748},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+
+@article{huang2024dialoggen,
+title={DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation},
+author={Huang, Minbin and Long, Yanxin and Deng, Xinchi and Chu, Ruihang and Xiong, Jiangfeng and Liang, Xiaodan and Cheng, Hong and Lu, Qinglin and Liu, Wei},
+journal={arXiv preprint arXiv:2403.08857},
+year={2024}
 }
 ```
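For orientation, the inference/sampling code that the README points to can be exercised with a few lines of PyTorch. The sketch below goes through the Diffusers port of Hunyuan-DiT rather than this repository's own command-line scripts; the `HunyuanDiTPipeline` class and the `Tencent-Hunyuan/HunyuanDiT-Diffusers` model id are assumptions taken from that integration, not from this diff.

```python
# Minimal text-to-image sketch via the Diffusers port of Hunyuan-DiT.
# Assumptions: the HunyuanDiTPipeline class and the model id below come from the
# Diffusers integration, not from the scripts or checkpoints described in this README.
import torch
from diffusers import HunyuanDiTPipeline

# Load the diffusers-format weights in half precision and move them to the GPU.
pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate an image from the README's demo prompt and save it to disk.
image = pipe(prompt="画一只穿着西装的猪").images[0]
image.save("pig_in_suit.png")
```

Both the Chinese prompt above and its English equivalent ("draw a pig in a suit") should work, since the model is built around the Chinese-English bilingual architecture described in the README.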