update README

README.md CHANGED

@@ -8,9 +8,6 @@ tags:
 <img src="https://dl.dropboxusercontent.com/scl/fi/yosvi68jvyarbvymxc4hm/github_logo.png?rlkey=r9ouwcd7cqxjbvio43q9b3djd&dl=1" width="1024px" />
 </div>
 
-> **[KOALA: Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis](http://arxiv.org/abs/2312.04005)**<br>
-> [Youngwan Lee](https://github.com/youngwanLEE)<sup>1,2</sup>, [Kwanyong Park](https://pkyong95.github.io/)<sup>1</sup>, [Yoorhim Cho](https://ofzlo.github.io/)<sup>3</sup>, [Young-Ju Lee](https://scholar.google.com/citations?user=6goOQh8AAAAJ&hl=en)<sup>1</sup>, [Sung Ju Hwang](http://www.sungjuhwang.com/)<sup>2,4</sup> <br>
-> <sup>1</sup>ETRI <sup>2</sup>KAIST, <sup>3</sup>SMWU, <sup>4</sup>DeepAuto.ai
 
 
 <div style="display:flex;justify-content: center">
@@ -57,6 +54,20 @@ There are two types of compressed U-Net, KOALA-1B and KOALA-700M, which are
 <img src="https://dl.dropboxusercontent.com/scl/fi/5ydeywgiyt1d3njw63dpk/arch.png?rlkey=1p6imbjs4lkmfpcxy153i1a2t&dl=1" width="1024px" />
 </div>
 
+### U-Net comparison
+
+| U-Net | SDM-v2.0 | SDXL-Base-1.0 | KOALA-1B | KOALA-700M |
+|-------|----------|---------------|----------|------------|
+| Param. | 865M | 2,567M | 1,161M | 782M |
+| CKPT size | 3.46GB | 10.3GB | 4.4GB | 3.0GB |
+| Tx blocks | [1, 1, 1, 1] | [0, 2, 10] | [0, 2, 6] | [0, 2, 5] |
+| Mid block | ✓ | ✓ | ✓ | ✗ |
+| Latency | 1.131s | 3.133s | 1.604s | 1.257s |
+
+- Tx means transformer block and CKPT means the trained checkpoint file.
+- We measured latency with FP16 precision and 25 denoising steps on an NVIDIA 4090 GPU (24GB).
+- SDM-v2.0 uses 768x768 resolution, while the SDXL and KOALA models use 1024x1024 resolution.
+
 
 ## Latency and memory usage comparison on different GPUs
 
@@ -85,6 +96,8 @@ We measure the inference time of SDM-v2.0 with 768x768 resolution and the other
 - Resources for more information: Check out [KOALA report on arXiv](https://arxiv.org/abs/2312.04005) and [project page](https://youngwanlee.github.io/KOALA/).
 
 
+
+
 ## Usage with 🤗[Diffusers library](https://github.com/huggingface/diffusers)
 The inference code with denoising step 25
 ```python
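# The diff's last context line opens the README's inference example, whose body
# falls outside this hunk. A minimal sketch of what such a Diffusers inference
# snippet typically looks like, completing the open fence. Assumptions: the Hub
# repo id "etri-vilab/koala-700m", the prompt text, and the output filename are
# all hypothetical; only the 25 denoising steps and FP16 setup come from the README.
import torch
from diffusers import StableDiffusionXLPipeline

# KOALA distills the SDXL U-Net, so the model loads through the SDXL pipeline.
# Repo id is an assumption; swap in the KOALA-1B checkpoint for the larger model.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "etri-vilab/koala-700m", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "A portrait painting of a cute koala wearing sunglasses"

# 25 denoising steps, matching the latency measurements above.
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("koala_output.png")
```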