update README

README.md CHANGED

@@ -8,9 +8,6 @@ tags:
 <img src="https://dl.dropboxusercontent.com/scl/fi/yosvi68jvyarbvymxc4hm/github_logo.png?rlkey=r9ouwcd7cqxjbvio43q9b3djd&dl=1" width="1024px" />
 </div>
 
-> **[KOALA: Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis](http://arxiv.org/abs/2312.04005)**<br>
-> [Youngwan Lee](https://github.com/youngwanLEE)<sup>1,2</sup>, [Kwanyong Park](https://pkyong95.github.io/)<sup>1</sup>, [Yoorhim Cho](https://ofzlo.github.io/)<sup>3</sup>, [Young-Ju Lee](https://scholar.google.com/citations?user=6goOQh8AAAAJ&hl=en)<sup>1</sup>, [Sung Ju Hwang](http://www.sungjuhwang.com/)<sup>2,4</sup> <br>
-> <sup>1</sup>ETRI <sup>2</sup>KAIST, <sup>3</sup>SMWU, <sup>4</sup>DeepAuto.ai
 
 
 <div style="display:flex;justify-content: center">
@@ -57,6 +54,20 @@ There are two types of compressed U-Net, KOALA-1B and KOALA-700M, which are
 <img src="https://dl.dropboxusercontent.com/scl/fi/5ydeywgiyt1d3njw63dpk/arch.png?rlkey=1p6imbjs4lkmfpcxy153i1a2t&dl=1" width="1024px" />
 </div>
 
+### U-Net comparison
+
+| U-Net | SDM-v2.0 | SDXL-Base-1.0 | KOALA-1B | KOALA-700M |
+|-------|----------|---------------|----------|------------|
+| Param. | 865M | 2,567M | 1,161M | 782M |
+| CKPT size | 3.46GB | 10.3GB | 4.4GB | 3.0GB |
+| Tx blocks | [1, 1, 1, 1] | [0, 2, 10] | [0, 2, 6] | [0, 2, 5] |
+| Mid block | ✓ | ✓ | ✓ | ✗ |
+| Latency | 1.131s | 3.133s | 1.604s | 1.257s |
+
+- Tx means transformer block and CKPT means the trained checkpoint file.
+- We measured latency with FP16 precision and 25 denoising steps on an NVIDIA 4090 GPU (24GB).
+- SDM-v2.0 uses 768x768 resolution, while the SDXL and KOALA models use 1024x1024 resolution.
+
 
 ## Latency and memory usage comparison on different GPUs
 
@@ -85,6 +96,8 @@ We measure the inference time of SDM-v2.0 with 768x768 resolution and the other
 - Resources for more information: Check out [KOALA report on arXiv](https://arxiv.org/abs/2312.04005) and [project page](https://youngwanlee.github.io/KOALA/).
 
 
+
+
 ## Usage with 🤗[Diffusers library](https://github.com/huggingface/diffusers)
 The inference code with denoising step 25
 ```python
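# The diff's last context line opens the README's inference example, whose body
# falls outside this hunk. A minimal sketch of what such a Diffusers inference
# snippet typically looks like, completing the open fence. Assumptions: the Hub
# repo id "etri-vilab/koala-700m", the prompt text, and the output filename are
# all hypothetical; only the 25 denoising steps and FP16 setup come from the README.
import torch
from diffusers import StableDiffusionXLPipeline

# KOALA distills the SDXL U-Net, so the model loads through the SDXL pipeline.
# Repo id is an assumption; swap in the KOALA-1B checkpoint for the larger model.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "etri-vilab/koala-700m", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "A portrait painting of a cute koala wearing sunglasses"

# 25 denoising steps, matching the latency measurements above.
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("koala_output.png")
```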