hubertsiuzdak committed on
Commit
46a8baa
·
verified ·
1 Parent(s): f44fd85

Update README.md

Files changed (1)
  1. README.md +60 -0
README.md CHANGED
@@ -1,3 +1,63 @@
1
  ---
2
  license: mit
 
 
3
  ---
 
1
  ---
2
  license: mit
3
+ tags:
4
+ - audio
5
  ---
6
+
7
+ # SNAC 🍿
8
+
9
+ Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses audio into discrete codes at a low bitrate.
10
+
11
+ 👉 This model was primarily trained on speech data, and its recommended use case is speech synthesis. See below for other pretrained models.
12
+
13
+ 🔗 GitHub repository: https://github.com/hubertsiuzdak/snac/
14
+
15
+ ## Overview
16
+
17
+ SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change where coarse tokens are sampled less frequently,
18
+ covering a broader time span.
19
+
20
+ This model compresses 24 kHz audio into discrete codes at a 0.98 kbps bitrate. It uses 3 RVQ levels with token rates of 12, 23, and
21
+ 47 Hz.
22
+
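As a rough check on the quoted bitrate, the arithmetic can be sketched as follows. The token rates are taken from the text above; the codebook size of 4096 entries per level (12 bits per token) is an assumption that this README does not state.

```python
import math

# Token rates quoted above for the three RVQ levels, in Hz (rounded values).
token_rates_hz = [12, 23, 47]

# ASSUMPTION: every level uses a codebook of 4096 entries.
# This figure is not given in the README.
codebook_size = 4096
bits_per_token = math.log2(codebook_size)  # 12 bits per token

# Bitrate is the sum over levels of (tokens per second) * (bits per token).
bitrate_bps = sum(rate * bits_per_token for rate in token_rates_hz)
bitrate_kbps = bitrate_bps / 1000

print(f"{bitrate_kbps:.2f} kbps")  # 0.98 kbps
```

Under that assumption the three levels contribute 144, 276, and 564 bits per second, which agrees with the 0.98 kbps figure.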
23
+ ## Pretrained models
24
+
25
+ Currently, all models support only a single audio channel (mono).
26
+
27
+ | Model | Bitrate | Sample Rate | Params | Recommended use case |
28
+ |-----------------------------------------------------------------------------|-----------|-------------|--------|--------------------------|
29
+ | hubertsiuzdak/snac_24khz (this model) | 0.98 kbps | 24 kHz | 19.8 M | 🗣️ Speech |
30
+ | [hubertsiuzdak/snac_32khz](https://huggingface.co/hubertsiuzdak/snac_32khz) | 1.9 kbps | 32 kHz | 54.5 M | 🎸 Music / Sound Effects |
31
+ | [hubertsiuzdak/snac_44khz](https://huggingface.co/hubertsiuzdak/snac_44khz) | 2.6 kbps | 44 kHz | 54.5 M | 🎸 Music / Sound Effects |
32
+
33
+ ## Usage
34
+
35
+ Install it using:
36
+
37
+ ```bash
38
+ pip install snac
39
+ ```
40
+ To encode (and reconstruct) audio with SNAC in Python, use the following code:
41
+
42
+ ```python
43
+ import torch
44
+ from snac import SNAC
45
+
46
+ model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().cuda()
47
+ audio = torch.randn(1, 1, 24000).cuda() # B, 1, T
48
+
49
+ with torch.inference_mode():
50
+     audio_hat, _, codes, _, _ = model(audio)
51
+ ```
52
+
53
+ ⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal
54
+ resolution.
55
+
56
+ ```
57
+ >>> [code.shape[1] for code in codes]
58
+ [12, 24, 48]
59
+ ```
60
+
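Because the three sequences have lengths in the ratio 1:2:4, one possible way to feed them to a sequence model is to interleave them frame by frame. The sketch below uses plain Python lists as stand-ins for the code tensors; `flatten_codes` is a hypothetical helper for illustration and is not part of the `snac` package, and the ordering it uses is only one of several reasonable choices.

```python
def flatten_codes(coarse, mid, fine):
    """Interleave three code sequences whose lengths are in the ratio 1:2:4.

    Hypothetical helper, not part of the snac package. Each coarse token is
    followed by the 2 mid tokens and 4 fine tokens covering the same time span.
    """
    if len(mid) != 2 * len(coarse) or len(fine) != 4 * len(coarse):
        raise ValueError("expected sequence lengths in the ratio 1:2:4")
    flat = []
    for i in range(len(coarse)):
        flat.append(coarse[i])
        flat.extend(mid[2 * i : 2 * i + 2])
        flat.extend(fine[4 * i : 4 * i + 4])
    return flat


# Stand-in integer codes with the lengths shown above (12, 24, 48).
coarse = list(range(12))
mid = list(range(100, 124))
fine = list(range(200, 248))

flat = flatten_codes(coarse, mid, fine)
print(len(flat))  # 84
print(flat[:7])   # [0, 100, 101, 200, 201, 202, 203]
```

Each group of 7 tokens then covers one coarse frame, so the flattened sequence for one second of audio has 12 + 24 + 48 = 84 tokens.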
61
+ ## Acknowledgements
62
+
63
+ Module definitions are adapted from the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec).