boris committed
Commit
648305a
1 Parent(s): 07a6f9a

doc: reference dalle playground

Files changed (1): README.md (+35 −39)
README.md CHANGED
@@ -19,21 +19,25 @@ _Generate images from a text prompt_
 
 Our logo was generated with DALL·E mini using the prompt "logo of an armchair in the shape of an avocado".
 
-You can create your own pictures with [the demo](https://huggingface.co/spaces/dalle-mini/dalle-mini).
+## How to use it?
 
-## How does it work?
+There are several ways to use DALL·E mini to create your own images:
 
-Refer to [our report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA).
+* use [the official DALL·E Mini demo](https://huggingface.co/spaces/dalle-mini/dalle-mini)
+
+* experiment with the pipeline step by step through our [`inference pipeline notebook`](tools/inference/inference_pipeline.ipynb)
 
-## Inference Pipeline
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb)
 
-To generate sample predictions and understand the inference pipeline step by step, refer to [`tools/inference/inference_pipeline.ipynb`](tools/inference/inference_pipeline.ipynb).
+* spin off your own app with [DALL-E Playground repository](https://github.com/saharmor/dalle-playground) (thanks [Sahar](https://twitter.com/theaievangelist))
 
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb)
+## How does it work?
+
+Refer to [our report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA).
 
 ## Contributing
 
-Join the community on the [DALLE-Pytorch Discord](https://discord.gg/xBPBXfcFHd).
+Join the community on the [LAION Discord](https://discord.gg/xBPBXfcFHd).
 Any contribution is welcome, from reporting issues to proposing fixes/improvements or testing the model with cool prompts!
 
 ## Development
@@ -45,14 +49,6 @@ For inference only, use `pip install git+https://github.com/borisdayma/dalle-mini.git`
 For development, clone the repo and use `pip install -e ".[dev]"`.
 Before making a PR, check style with `make style`.
 
-### Image Encoder
-
-We use a VQGAN from [taming-transformers](https://github.com/CompVis/taming-transformers), which can also be fine-tuned.
-
-Use [patil-suraj/vqgan-jax](https://github.com/patil-suraj/vqgan-jax) if you want to convert a checkpoint to JAX (does not support Gumbel).
-
-Any image encoder that turns an image into a fixed sequence of tokens can be used.
-
 ### Training of DALL·E mini
 
 Use [`tools/train/train.py`](tools/train/train.py).
@@ -65,8 +61,8 @@ You can also adjust the [sweep configuration file](https://docs.wandb.ai/guides/
 
 Trained models are on 🤗 Model Hub:
 
-- [VQGAN-f16-16384](https://huggingface.co/dalle-mini/vqgan_imagenet_f16_16384) for encoding/decoding images
-- [DALL·E mini](https://huggingface.co/flax-community/dalle-mini) for generating images from a text prompt
+* [VQGAN-f16-16384](https://huggingface.co/dalle-mini/vqgan_imagenet_f16_16384) for encoding/decoding images
+* [DALL·E mini](https://huggingface.co/flax-community/dalle-mini) for generating images from a text prompt
 
 ### Where does the logo come from?
 
@@ -74,29 +70,29 @@ The "armchair in the shape of an avocado" was used by OpenAI when releasing DALL·E
 
 ## Acknowledgements
 
-- 🤗 Hugging Face for organizing [the FLAX/JAX community week](https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects)
-- Google [TPU Research Cloud (TRC) program](https://sites.research.google/trc/) for providing computing resources
-- [Weights & Biases](https://wandb.com/) for providing the infrastructure for experiment tracking and model management
+* 🤗 Hugging Face for organizing [the FLAX/JAX community week](https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects)
+* Google [TPU Research Cloud (TRC) program](https://sites.research.google/trc/) for providing computing resources
+* [Weights & Biases](https://wandb.com/) for providing the infrastructure for experiment tracking and model management
 
 ## Authors & Contributors
 
 DALL·E mini was initially developed by:
 
-- [Boris Dayma](https://github.com/borisdayma)
-- [Suraj Patil](https://github.com/patil-suraj)
-- [Pedro Cuenca](https://github.com/pcuenca)
-- [Khalid Saifullah](https://github.com/khalidsaifullaah)
-- [Tanishq Abraham](https://github.com/tmabraham)
-- [Phúc Lê Khắc](https://github.com/lkhphuc)
-- [Luke Melas](https://github.com/lukemelas)
-- [Ritobrata Ghosh](https://github.com/ghosh-r)
+* [Boris Dayma](https://github.com/borisdayma)
+* [Suraj Patil](https://github.com/patil-suraj)
+* [Pedro Cuenca](https://github.com/pcuenca)
+* [Khalid Saifullah](https://github.com/khalidsaifullaah)
+* [Tanishq Abraham](https://github.com/tmabraham)
+* [Phúc Lê Khắc](https://github.com/lkhphuc)
+* [Luke Melas](https://github.com/lukemelas)
+* [Ritobrata Ghosh](https://github.com/ghosh-r)
 
 Many thanks to the people who helped make it better:
 
-- the [DALLE-Pytorch](https://discord.gg/xBPBXfcFHd) and [EleutherAI](https://www.eleuther.ai/) communities for testing and exchanging cool ideas
-- [Rohan Anil](https://github.com/rohan-anil) for adding Distributed Shampoo optimizer
-- [Phil Wang](https://github.com/lucidrains) has provided a lot of cool implementations of transformer variants and gives interesting insights with [x-transformers](https://github.com/lucidrains/x-transformers)
-- [Katherine Crowson](https://github.com/crowsonkb) for [super conditioning](https://twitter.com/RiversHaveWings/status/1478093658716966912)
+* the [DALLE-Pytorch](https://discord.gg/xBPBXfcFHd) and [EleutherAI](https://www.eleuther.ai/) communities for testing and exchanging cool ideas
+* [Rohan Anil](https://github.com/rohan-anil) for adding Distributed Shampoo optimizer
+* [Phil Wang](https://github.com/lucidrains) has provided a lot of cool implementations of transformer variants and gives interesting insights with [x-transformers](https://github.com/lucidrains/x-transformers)
+* [Katherine Crowson](https://github.com/crowsonkb) for [super conditioning](https://twitter.com/RiversHaveWings/status/1478093658716966912)
 
 ## Citing DALL·E mini
 
@@ -121,13 +117,13 @@ Image encoder from "[Taming Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2012.09841)"
 
 Sequence to sequence model based on "[BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461v1)" with implementation of a few variants:
 
-- "[GLU Variants Improve Transformer](https://arxiv.org/abs/2002.05202)"
-- "[Deepnet: Scaling Transformers to 1,000 Layers](https://arxiv.org/abs/2203.00555)"
-- "[NormFormer: Improved Transformer Pretraining with Extra Normalization](https://arxiv.org/abs/2110.09456)"
-- "[Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030)"
-- "[CogView: Mastering Text-to-Image Generation via Transformers](https://arxiv.org/abs/2105.13290v2)"
-- "[Root Mean Square Layer Normalization](https://arxiv.org/abs/1910.07467)"
-- "[Sinkformers: Transformers with Doubly Stochastic Attention](https://arxiv.org/abs/2110.11773)"
+* "[GLU Variants Improve Transformer](https://arxiv.org/abs/2002.05202)"
+* "[Deepnet: Scaling Transformers to 1,000 Layers](https://arxiv.org/abs/2203.00555)"
+* "[NormFormer: Improved Transformer Pretraining with Extra Normalization](https://arxiv.org/abs/2110.09456)"
+* "[Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030)"
+* "[CogView: Mastering Text-to-Image Generation via Transformers](https://arxiv.org/abs/2105.13290v2)"
+* "[Root Mean Square Layer Normalization](https://arxiv.org/abs/1910.07467)"
+* "[Sinkformers: Transformers with Doubly Stochastic Attention](https://arxiv.org/abs/2110.11773)"
 
 Main optimizer (Distributed Shampoo) from "[Scalable Second Order Optimization for Deep Learning](https://arxiv.org/abs/2002.09018)".
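
The "How to use it?" section added in this commit links out to the inference notebook rather than showing code. As a rough orientation, here is a minimal sketch of that pipeline, assuming the `DalleBart`/`DalleBartProcessor` entry points from this repo, the `VQModel` class from patil-suraj/vqgan-jax, and the Model Hub checkpoints listed in the README; exact signatures and generation arguments may differ from the notebook, which remains the authoritative reference.

```python
import jax
import numpy as np
from PIL import Image

# Assumed entry points (see lead-in): text -> image tokens, image tokens -> pixels.
from dalle_mini import DalleBart, DalleBartProcessor
from vqgan_jax.modeling_flax_vqgan import VQModel

# Checkpoints from the 🤗 Model Hub, as listed under "Trained models".
model = DalleBart.from_pretrained("flax-community/dalle-mini")
processor = DalleBartProcessor.from_pretrained("flax-community/dalle-mini")
vqgan = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384")

# Tokenize the prompt and sample one sequence of discrete image tokens.
prompt = "logo of an armchair in the shape of an avocado"
tokens = processor([prompt])
output = model.generate(**tokens, prng_key=jax.random.PRNGKey(0))

# Drop the BOS token, then let the VQGAN decode the token grid back to pixels.
images = vqgan.decode_code(output.sequences[..., 1:])
pixels = np.asarray(images[0].clip(0.0, 1.0) * 255, dtype=np.uint8)
Image.fromarray(pixels).save("avocado_armchair.png")
```

The Colab notebook additionally replicates parameters and `pmap`s generation across TPU devices; none of that is needed to follow the shape of the pipeline above.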
 
 
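For context on the removed "Image Encoder" section: any encoder that turns an image into a fixed sequence of tokens can stand in for the VQGAN, and the `VQGAN-f16-16384` checkpoint above maps 16×16-pixel patches to a 16,384-entry codebook. A hedged round-trip sketch, assuming the `encode`/`decode_code` methods of `VQModel` in patil-suraj/vqgan-jax behave as described:

```python
import numpy as np
from vqgan_jax.modeling_flax_vqgan import VQModel  # assumed entry point

vqgan = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384")

# A dummy 256x256 RGB batch, NHWC layout, values in [0, 1].
pixels = np.random.rand(1, 256, 256, 3).astype(np.float32)

# Encode: with a patch factor of f=16, a 256x256 image becomes a fixed
# 16x16 = 256-token grid, each token an index into the 16,384-entry codebook.
quant_states, indices = vqgan.encode(pixels)
print(indices.shape)  # expected: (1, 256)

# Decode the discrete tokens back to pixels -- the same call that turns the
# text model's sampled image tokens into an image at inference time.
reconstruction = vqgan.decode_code(indices)
```

This fixed image-token interface is what lets the BART-style sequence-to-sequence model treat image generation as ordinary sequence prediction.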