davidmezzetti committed on
Commit
bc79a8f
·
1 Parent(s): 9ff87ca

Initial version

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +81 -3
  3. model.onnx +3 -0
  4. voices.json +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ voices.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,81 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ tags:
3
+ - audio
4
+ - text-to-speech
5
+ - onnx
6
+ base_model:
7
+ - hexgrad/Kokoro-82M
8
+ inference: false
9
+ language: en
10
+ license: apache-2.0
11
+ library_name: txtai
12
+ ---
13
+
14
+ # Kokoro fp16 Model for ONNX
15
+
16
+ [Kokoro 82M](https://huggingface.co/hexgrad/Kokoro-82M) exported to ONNX as fp16. The model file is from [this GitHub repository](https://github.com/taylorchu/kokoro-onnx/releases/). The voices file is from [this repository](https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files).
17
+
18
+ ## Usage with txtai
19
+
20
+ [txtai](https://github.com/neuml/txtai) has a built-in Text to Speech (TTS) pipeline that makes using this model easy.
21
+
22
+ _Note: This requires txtai >= 8.3.0. Install from GitHub until that release._
23
+
24
+ ```python
25
+ import soundfile as sf
26
+
27
+ from txtai.pipeline import TextToSpeech
28
+
29
+ # Build pipeline
30
+ tts = TextToSpeech("NeuML/kokoro-fp16-onnx")
31
+
32
+ # Generate speech
33
+ speech, rate = tts("Say something here")
34
+
35
+ # Write to file
36
+ sf.write("out.wav", speech, rate)
37
+ ```
38
+
39
+ ## Usage with ONNX
40
+
41
+ This model can also be run directly with ONNX provided the input text is tokenized. Tokenization can be done with [ttstokenizer](https://github.com/neuml/ttstokenizer). `ttstokenizer` is a permissively licensed library with no external dependencies (such as espeak).
42
+
43
+ Note that the txtai pipeline has additional functionality such as batching large inputs together that would need to be duplicated with this method.
44
+
45
+ ```python
46
+ import json
47
+ import numpy as np
48
+ import onnxruntime
49
+ import soundfile as sf
50
+
51
+ from ttstokenizer import IPATokenizer
52
+
53
+ # This example assumes the files have been downloaded locally
54
+ with open("kokoro-fp16-onnx/voices.json", "r", encoding="utf-8") as f:
55
+     voices = json.load(f)
56
+
57
+ # Create model
58
+ model = onnxruntime.InferenceSession(
59
+     "kokoro-fp16-onnx/model.onnx",
60
+     providers=["CPUExecutionProvider"]
61
+ )
62
+
63
+ # Create tokenizer
64
+ tokenizer = IPATokenizer()
65
+
66
+ # Tokenize inputs
67
+ inputs = tokenizer("Say something here")
68
+
69
+ # Get speaker array
70
+ speaker = np.array(voices["af"], dtype=np.float32)
71
+
72
+ # Generate speech
73
+ outputs = model.run(None, {
74
+     "tokens": [[0, *inputs, 0]],
75
+     "style": speaker[len(inputs)],
76
+     "speed": np.ones(1, dtype=np.float32) * 1.0
77
+ })
78
+
79
+ # Write to file
80
+ sf.write("out.wav", outputs[0], 24000)
81
+ ```
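The README notes that the txtai pipeline batches large inputs and that this would need to be duplicated when calling ONNX directly. One way that splitting step might look is sketched below. This is not txtai's implementation: `chunk_tokens` is a hypothetical helper, and the default limit of 510 tokens is an assumption about the model's input size, not a value stated in the README.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], size: int = 510) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most `size` tokens.

    The default of 510 is an assumed limit, not taken from the README.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


# Example with 1200 dummy token ids
chunks = chunk_tokens(list(range(1200)))
print([len(chunk) for chunk in chunks])
```

Each chunk could then be passed to `model.run` as in the README example, with the resulting audio arrays joined using `np.concatenate` before writing the file.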
model.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:65330db8adaedb57562c5e1cb7fc2e5afae2b27d67003564ff2170f1c48273ee
3
+ size 177870108
voices.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dc24670e8333cb30990726c5d99e991afc14645139d1a9d2d1858d4fba08df05
3
+ size 54060439
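Both `model.onnx` and `voices.json` are committed as Git LFS pointers, which record the SHA-256 digest and byte size of the real file. A downloaded copy can be checked against those values. The following is a minimal sketch using only the standard library; `parse_pointer` and `verify` are hypothetical helper names, and the pointer text is copied from the `voices.json` entry in this commit.

```python
import hashlib
import os


def parse_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        # The oid line has the form "sha256:<hex digest>"
        "oid": fields["oid"].split(":", 1)[1],
        "size": int(fields["size"]),
    }


def verify(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB blocks so large files are not loaded into memory at once
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest() == pointer["oid"]


# Pointer contents for voices.json, copied from this commit
pointer = parse_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:dc24670e8333cb30990726c5d99e991afc14645139d1a9d2d1858d4fba08df05\n"
    "size 54060439\n"
)
```

Assuming the files were downloaded to the same local directory used in the README example, `verify("kokoro-fp16-onnx/voices.json", pointer)` would indicate whether the download is intact.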