Update README.md
README.md CHANGED
@@ -9,6 +9,11 @@ metrics:
  - mauve
---

+# Self-Distillation Through Time (SDTT)
+SDTT is a distillation method for diffusion language models. Recent diffusion language models such as [SEDD](https://huggingface.co/louaaron/sedd-small) or [MDLM](https://huggingface.co/kuleshov-group/mdlm-owt) achieve great results.
+However, because they cannot use KV-caching (non-causal architecture), sampling from them is slow. Therefore, we devise a novel distillation method to reduce the inference latency of discrete diffusion models.
+After distillation, we can sample up to 8x faster than GPT-2 (which uses KV-caching). Find more details below and on [our GitHub repo](https://github.com/jdeschena/sdtt).
+
## Using SDTT
We released 3 groups of models:
1. The **baseline students** distilled with the `kld`, `mse`, and `tvd` objectives from a model trained for 1M steps.