jdeschena committed
Commit 3144d4a
1 Parent(s): 6a0105d

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -9,6 +9,11 @@ metrics:
 - mauve
 ---
 
+# Self-Distillation Through Time (SDTT)
+SDTT is a distillation method for diffusion language models. Recent diffusion language models such as [SEDD](https://huggingface.co/louaaron/sedd-small) or [MDLM](https://huggingface.co/kuleshov-group/mdlm-owt) achieve strong results.
+However, because they cannot use KV-caching (their architecture is non-causal), sampling from them is slow. Therefore, we devised a novel distillation method to reduce the inference latency of discrete diffusion models.
+After distillation, we can sample up to 8x faster than GPT-2 (which uses KV-caching). Find more details below and in [our GitHub repo](https://github.com/jdeschena/sdtt).
+
 ## Using SDTT
 - We released 3 groups of models:
 1. The **baseline students**, distilled with the `kld`, `mse` and `tvd` objectives from a model trained for 1M steps.
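
The added README text points readers to the released student checkpoints. As a rough illustration, here is a minimal sketch of pulling one of them from the Hugging Face Hub with `transformers`; the repository id `jdeschena/sdtt-kld-1M` and the use of `AutoModelForMaskedLM` with `trust_remote_code=True` are assumptions (modeled on how MDLM-style checkpoints are typically loaded), not the repo's documented API — see [the GitHub repo](https://github.com/jdeschena/sdtt) for the officially supported loaders.

```python
# Minimal sketch: load an SDTT student checkpoint from the Hugging Face Hub.
# NOTE: the repo id below is hypothetical; check https://github.com/jdeschena/sdtt
# for the actual checkpoint names and the supported loading API.
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo_id = "jdeschena/sdtt-kld-1M"  # hypothetical id for the `kld` baseline student

# Diffusion LMs like MDLM ship custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()  # inference only; sampling uses the repo's own diffusion sampler
```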