Tencent-Hunyuan committed
Commit • 68cf368
1 Parent(s): 274f706
Update README.md
README.md
CHANGED
@@ -14,9 +14,10 @@ language:
 # Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding


+
 This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our [project page](https://dit.hunyuan.tencent.com/).

-> [**Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://
+> [**Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://arxiv.org/abs/2405.08748) <br>
 > Zhimin Li*, Jianwei Zhang*, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, JianChen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin Lu‡
 > <br>Tencent Hunyuan<br>
@@ -24,11 +25,11 @@ This repo contains PyTorch model definitions, pre-trained weights and inference/
 > Minbin Huang*, Yanxin Long*, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu†, Wei Liu
 > <br>Chinese University of Hong Kong, Tencent Hunyuan, Shenzhen Campus of Sun Yat-sen University<br>

----

-## 🔥🔥🔥 Hunyuan Assistant

-Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experien
+## 🔥🔥🔥 Tencent Hunyuan Bot
+
+Welcome to [Tencent Hunyuan Bot](https://hunyuan.tencent.com/bot/chat), where you can explore our innovative products! Just input the suggested prompts below or any other **imaginative prompts containing drawing-related keywords** to activate the Hunyuan text-to-image generation feature. You can use **simple prompts** as well as **multi-turn language interactions** to create the picture. Unleash your creativity and create any picture you desire, **all for free!**
 > 画一只穿着西装的猪
 >
 > draw a pig in a suit
@@ -45,7 +46,7 @@ Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experien
 - [ ] Distillation Version (Coming soon ⏩️)
 - [ ] TensorRT Version (Coming soon ⏩️)
 - [ ] Training (Coming later ⏩️)
-- DialogGen (Prompt Enhancement Model)
+- [DialogGen](https://github.com/Centaurusalpha/DialogGen) (Prompt Enhancement Model)
 - [x] Inference
 - [X] Web Demo (Gradio)
 - [X] Cli Demo
@@ -53,19 +54,19 @@ Welcome to [Hunyuan Assistant](https://hunyuan.tencent.com/bot/chat) to experien
 ## Contents
 - [Hunyuan-DiT](#hunyuan-dit--a-powerful-multi-resolution-diffusion-transformer-with-fine-grained-chinese-understanding)
 - [Abstract](#abstract)
-- [🎉 Hunyuan-DiT Key Features](
+- [🎉 Hunyuan-DiT Key Features](#-hunyuan-dit-key-features)
 - [Chinese-English Bilingual DiT Architecture](#chinese-english-bilingual-dit-architecture)
 - [Multi-turn Text2Image Generation](#multi-turn-text2image-generation)
-- [📈 Comparisons](
+- [📈 Comparisons](#-comparisons)
-- [🎥 Visualization](
+- [🎥 Visualization](#-visualization)
-- [📜 Requirements](
+- [📜 Requirements](#-requirements)
-- [🛠️ Dependencies and Installation](
+- [🛠️ Dependencies and Installation](#%EF%B8%8F-dependencies-and-installation)
-- [🧱 Download Pretrained Models](
+- [🧱 Download Pretrained Models](#-download-pretrained-models)
-- [🔑 Inference](
+- [🔑 Inference](#-inference)
 - [Using Gradio](#using-gradio)
 - [Using Command Line](#using-command-line)
 - [More Configurations](#more-configurations)
-- [🔗 BibTeX](
+- [🔗 BibTeX](#-bibtex)

 ## **Abstract**
@@ -90,7 +91,7 @@ and output the new text prompt for image generation.
 <img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/mllm.png" height=300>
 </p>

-## Comparisons
+## 📈 Comparisons
 In order to comprehensively compare the generation capabilities of HunyuanDiT and other models, we constructed a 4-dimensional test set, including Text-Image Consistency, Excluding AI Artifacts, Subject Clarity, and Aesthetics. More than 50 professional evaluators perform the evaluation.

 <p align="center">
@@ -124,7 +125,8 @@ In order to comprehensively compare the generation capabilities of HunyuanDiT an
 <tr style="font-weight: bold; background-color: #f2f2f2;">
 <td>Hunyuan-DiT</td><td>✔</td> <td>74.2</td> <td>74.3</td> <td>95.4</td> <td>86.6</td> <td>59.0</td>
 </tr>
-</
+</tbody>
+</table>
 </p>

 ## 🎥 Visualization
@@ -138,13 +140,14 @@ In order to comprehensively compare the generation capabilities of HunyuanDiT an
 <p align="center">
-<img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/long text understanding.png" height=
+<img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/long text understanding.png" height=310>
 </p>

 * **Multi-turn Text2Image Generation**

-[demo video](https://youtu.be/4AaHrYnuIcE)
+https://github.com/Tencent/tencent.github.io/assets/27557933/94b4dcc3-104d-44e1-8bb2-dc55108763d1
+

 ---
@@ -282,12 +285,22 @@ We list some more useful configurations for easy usage:
 # 🔗 BibTeX
-If you find Hunyuan-DiT useful for your research and applications, please cite using this BibTeX:
+If you find [Hunyuan-DiT](https://arxiv.org/abs/2405.08748) or [DialogGen](https://arxiv.org/abs/2403.08857) useful for your research and applications, please cite using this BibTeX:

 ```BibTeX
-@misc{
+@misc{li2024hunyuandit,
-title={Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding},
+title={Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding},
-author={Zhimin Li
+author={Zhimin Li and Jianwei Zhang and Qin Lin and Jiangfeng Xiong and Yanxin Long and Xinchi Deng and Yingfang Zhang and Xingchao Liu and Minbin Huang and Zedong Xiao and Dayou Chen and Jiajun He and Jiahao Li and Wenyue Li and Chen Zhang and Rongwei Quan and Jianxiang Lu and Jiabin Huang and Xiaoyan Yuan and Xiaoxiao Zheng and Yixuan Li and Jihong Zhang and Chao Zhang and Meng Chen and Jie Liu and Zheng Fang and Weiyan Wang and Jinbao Xue and Yangyu Tao and Jianchen Zhu and Kai Liu and Sihuan Lin and Yifu Sun and Yun Li and Dongdong Wang and Mingtao Chen and Zhichao Hu and Xiao Xiao and Yan Chen and Yuhong Liu and Wei Liu and Di Wang and Yong Yang and Jie Jiang and Qinglin Lu},
 year={2024},
+eprint={2405.08748},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+
+@article{huang2024dialoggen,
+title={DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation},
+author={Huang, Minbin and Long, Yanxin and Deng, Xinchi and Chu, Ruihang and Xiong, Jiangfeng and Liang, Xiaodan and Cheng, Hong and Lu, Qinglin and Liu, Wei},
+journal={arXiv preprint arXiv:2403.08857},
+year={2024}
 }
 ```
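For orientation, the inference/sampling code that the README points to can be exercised with a few lines of PyTorch. The sketch below goes through the Diffusers port of Hunyuan-DiT rather than this repository's own command-line scripts; the `HunyuanDiTPipeline` class and the `Tencent-Hunyuan/HunyuanDiT-Diffusers` model id are assumptions taken from that integration, not from this diff.

```python
# Minimal text-to-image sketch via the Diffusers port of Hunyuan-DiT.
# Assumptions: the HunyuanDiTPipeline class and the model id below come from the
# Diffusers integration, not from the scripts or checkpoints described in this README.
import torch
from diffusers import HunyuanDiTPipeline

# Load the diffusers-format weights in half precision and move them to the GPU.
pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate an image from the README's demo prompt and save it to disk.
image = pipe(prompt="画一只穿着西装的猪").images[0]
image.save("pig_in_suit.png")
```

Both the Chinese prompt above and its English equivalent ("draw a pig in a suit") should work, since the model is built around the Chinese-English bilingual architecture described in the README.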