doc: reference dalle playground

README.md

_Generate images from a text prompt_

Our logo was generated with DALL·E mini using the prompt "logo of an armchair in the shape of an avocado".

## How to use it?

There are several ways to use DALL·E mini to create your own images:

* use [the official DALL·E Mini demo](https://huggingface.co/spaces/dalle-mini/dalle-mini)
* experiment with the pipeline step by step through our [`inference pipeline notebook`](tools/inference/inference_pipeline.ipynb) (a Python sketch of the same flow follows this list)

  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb)

* spin off your own app with the [DALL-E Playground repository](https://github.com/saharmor/dalle-playground) (thanks [Sahar](https://twitter.com/theaievangelist))
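
Below is a minimal sketch of that flow in plain Python. It is hedged, not a stable API reference: it assumes the `dalle_mini` and `vqgan_jax` packages are installed (see Development below) and that they expose `DalleBart`, `DalleBartProcessor`, and `VQModel` as the inference pipeline notebook does; names and checkpoints may differ between versions.

```python
# Minimal text-to-image sketch, mirroring the inference pipeline notebook.
# Assumption: `dalle_mini` and `vqgan_jax` are installed and expose the
# classes below; exact entry points may vary across versions.
import jax
from dalle_mini import DalleBart, DalleBartProcessor
from vqgan_jax.modeling_flax_vqgan import VQModel

# Seq2seq model (text tokens -> image tokens) and VQGAN (image tokens -> pixels).
model, params = DalleBart.from_pretrained("flax-community/dalle-mini", _do_init=False)
vqgan, vqgan_params = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384", _do_init=False)
processor = DalleBartProcessor.from_pretrained("flax-community/dalle-mini")

# Tokenize the prompt and sample a sequence of image tokens.
prompt = processor(["logo of an armchair in the shape of an avocado"])
encoded = model.generate(**prompt, prng_key=jax.random.PRNGKey(0), params=params)

# Drop the BOS token, then decode the image tokens back to pixels.
images = vqgan.decode_code(encoded.sequences[..., 1:], params=vqgan_params)
```

The notebook additionally walks through batching across devices and ranking generations with CLIP.
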
## How does it work?

Refer to [our report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA).

## Contributing

Join the community on the [LAION Discord](https://discord.gg/xBPBXfcFHd).
Any contribution is welcome, from reporting issues to proposing fixes/improvements or testing the model with cool prompts!

## Development

For inference only, use `pip install git+https://github.com/borisdayma/dalle-mini.git`.
For development, clone the repo and use `pip install -e ".[dev]"`.
Before making a PR, check style with `make style`.

### Image Encoder

We use a VQGAN from [taming-transformers](https://github.com/CompVis/taming-transformers), which can also be fine-tuned.

Use [patil-suraj/vqgan-jax](https://github.com/patil-suraj/vqgan-jax) if you want to convert a checkpoint to JAX (does not support Gumbel).

Any image encoder that turns an image into a fixed sequence of tokens can be used; a round-trip sketch follows.
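
To make the token-sequence interface concrete, here is a hedged round-trip sketch with the JAX VQGAN. It assumes the `vqgan_jax` package above is installed and that `encode`/`decode_code` behave as in dalle-mini's data-preparation code; shapes follow the f16, 16384-codebook checkpoint.

```python
# Round-trip sketch: pixels -> discrete token sequence -> pixels.
# Assumption: vqgan_jax (from patil-suraj/vqgan-jax) is installed.
import jax.numpy as jnp
from vqgan_jax.modeling_flax_vqgan import VQModel

vqgan = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384")

# A batch of one 256x256 RGB image scaled to [0, 1] (NHWC layout).
pixels = jnp.zeros((1, 256, 256, 3))

# With f16 downsampling, a 256x256 image maps to a 16x16 grid, i.e. a fixed
# sequence of 256 token indices drawn from a 16384-entry codebook.
_, indices = vqgan.encode(pixels)
tokens = indices.reshape((pixels.shape[0], -1))  # (batch, 256)

# Decode the token sequence back to an image.
reconstruction = vqgan.decode_code(tokens)
```

Fine-tuning the VQGAN, or swapping in another encoder, only has to preserve this contract: a fixed-length sequence of discrete tokens per image.
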

### Training of DALL·E mini

Use [`tools/train/train.py`](tools/train/train.py).

You can also adjust the [sweep configuration file](https://docs.wandb.ai/guides/sweeps/configuration) if you need to perform a hyperparameter search.

Trained models are on 🤗 Model Hub:
* [VQGAN-f16-16384](https://huggingface.co/dalle-mini/vqgan_imagenet_f16_16384) for encoding/decoding images
* [DALL·E mini](https://huggingface.co/flax-community/dalle-mini) for generating images from a text prompt

### Where does the logo come from?

The "armchair in the shape of an avocado" was used by OpenAI when releasing DALL·E.

## Acknowledgements

* 🤗 Hugging Face for organizing [the FLAX/JAX community week](https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects)
* Google [TPU Research Cloud (TRC) program](https://sites.research.google/trc/) for providing computing resources
* [Weights & Biases](https://wandb.com/) for providing the infrastructure for experiment tracking and model management

## Authors & Contributors
DALL·E mini was initially developed by:

* [Boris Dayma](https://github.com/borisdayma)
* [Suraj Patil](https://github.com/patil-suraj)
* [Pedro Cuenca](https://github.com/pcuenca)
* [Khalid Saifullah](https://github.com/khalidsaifullaah)
* [Tanishq Abraham](https://github.com/tmabraham)
* [Phúc Lê Khắc](https://github.com/lkhphuc)
* [Luke Melas](https://github.com/lukemelas)
* [Ritobrata Ghosh](https://github.com/ghosh-r)

Many thanks to the people who helped make it better:

* the [DALLE-Pytorch](https://discord.gg/xBPBXfcFHd) and [EleutherAI](https://www.eleuther.ai/) communities for testing and exchanging cool ideas
* [Rohan Anil](https://github.com/rohan-anil) for adding the Distributed Shampoo optimizer
* [Phil Wang](https://github.com/lucidrains) for many cool implementations of transformer variants and for interesting insights shared through [x-transformers](https://github.com/lucidrains/x-transformers)
* [Katherine Crowson](https://github.com/crowsonkb) for [super conditioning](https://twitter.com/RiversHaveWings/status/1478093658716966912), sketched after this list
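
In case the term is new: super conditioning decodes with and without the text prompt and exaggerates the difference between the two predictions, in the spirit of classifier-free guidance. The snippet below is a schematic illustration only, with hypothetical names, not dalle-mini's actual implementation.

```python
import jax.numpy as jnp

def super_condition(logits_uncond: jnp.ndarray, logits_cond: jnp.ndarray, scale: float) -> jnp.ndarray:
    # scale == 1 recovers ordinary conditioning; scale > 1 pushes sampling
    # toward image tokens that agree more strongly with the prompt.
    return logits_uncond + scale * (logits_cond - logits_uncond)
```
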
## Citing DALL·E mini

Image encoder from "[Taming Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2012.09841)".

Sequence to sequence model based on "[BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461v1)" with implementation of a few variants:

* "[GLU Variants Improve Transformer](https://arxiv.org/abs/2002.05202)"
* "[DeepNet: Scaling Transformers to 1,000 Layers](https://arxiv.org/abs/2203.00555)"
* "[NormFormer: Improved Transformer Pretraining with Extra Normalization](https://arxiv.org/abs/2110.09456)"
* "[Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030)"
* "[CogView: Mastering Text-to-Image Generation via Transformers](https://arxiv.org/abs/2105.13290v2)"
* "[Root Mean Square Layer Normalization](https://arxiv.org/abs/1910.07467)"
* "[Sinkformers: Transformers with Doubly Stochastic Attention](https://arxiv.org/abs/2110.11773)"

Main optimizer (Distributed Shampoo) from "[Scalable Second Order Optimization for Deep Learning](https://arxiv.org/abs/2002.09018)".