---
tags:
- audio
- text-to-speech
- onnx
base_model:
- hexgrad/Kokoro-82M
inference: false
language: en
license: apache-2.0
library_name: txtai
---

# Kokoro int8 Model for ONNX

[Kokoro 82M](https://huggingface.co./hexgrad/Kokoro-82M) exported to ONNX as int8. This model is from [this GitHub repo](https://github.com/taylorchu/kokoro-onnx/releases/). The voices file is from [this repository](https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files).

## Usage with txtai

[txtai](https://github.com/neuml/txtai) has a built-in Text to Speech (TTS) pipeline that makes using this model easy.

_Note: This requires txtai >= 8.3.0. Install from GitHub until that release._

```python
import soundfile as sf

from txtai.pipeline import TextToSpeech

# Build pipeline
tts = TextToSpeech("NeuML/kokoro-int8-onnx")

# Generate speech
speech, rate = tts("Say something here")

# Write to file
sf.write("out.wav", speech, rate)
```

## Usage with ONNX

This model can also be run directly with ONNX, provided the input text is tokenized first. Tokenization can be done with [ttstokenizer](https://github.com/neuml/ttstokenizer), a permissively licensed library with no external dependencies (such as espeak).

Note that the txtai pipeline has additional functionality, such as batching large inputs, that would need to be reimplemented when using this method.
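One way to approximate that batching step is sketched below. It assumes a budget of 510 tokens per model call (512 positions minus the two `0` padding tokens). Treat that limit as an assumption, not a documented property of this export. The hypothetical `split_for_model` helper is not part of txtai or ttstokenizer. It greedily packs whole sentences into chunks and accepts any tokenizer callable, such as the `IPATokenizer` instance from the direct ONNX example. A single sentence longer than the budget is returned as its own oversized chunk and would still need further splitting.

```python
import re
from typing import Callable, List, Sequence

# Assumed per-call budget: 512 positions minus the two 0 padding tokens
MAX_TOKENS = 510


def split_for_model(
    text: str,
    tokenize: Callable[[str], Sequence[int]],
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Hypothetical helper, not part of txtai or ttstokenizer. A single sentence
    that exceeds max_tokens on its own is returned as its own oversized chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(tokenize(candidate)) > max_tokens:
            # Adding this sentence would overflow the budget: close the chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate

    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be tokenized and run through the model separately, and the resulting waveforms joined with `np.concatenate`.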
```python
import json

import numpy as np
import onnxruntime
import soundfile as sf

from ttstokenizer import IPATokenizer

# This example assumes the files have been downloaded locally
with open("kokoro-int8-onnx/voices.json", "r", encoding="utf-8") as f:
    voices = json.load(f)

# Create model
model = onnxruntime.InferenceSession(
    "kokoro-int8-onnx/model.onnx",
    providers=["CPUExecutionProvider"]
)

# Create tokenizer
tokenizer = IPATokenizer()

# Tokenize inputs
inputs = tokenizer("Say something here")

# Get speaker array
speaker = np.array(voices["af"], dtype=np.float32)

# Generate speech
outputs = model.run(None, {
    # Pad the token ids with 0 on both sides
    "tokens": [[0, *inputs, 0]],
    # Select the style vector indexed by the input length
    "style": speaker[len(inputs)],
    # Speed multiplier, 1.0 is normal speed
    "speed": np.ones(1, dtype=np.float32) * 1.0
})

# Write to file at a 24 kHz sample rate
sf.write("out.wav", outputs[0], 24000)
```

## Speaker reference

The Kokoro model has a number of built-in speakers. When using this model, set a `speaker` id from the reference table below.

| SPEAKER     | GENDER   | NATIONALITY    | EXAMPLE          |
|:------------|:---------|:---------------|:-----------------|
| af          | F        | American       | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/af.mp3) |
| af_bella    | F        | American       | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/af_bella.mp3) |
| af_nicole   | F        | American       | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/af_nicole.mp3) |
| af_sarah    | F        | American       | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/af_sarah.mp3) |
| af_sky      | F        | American       | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/af_sky.mp3) |
| am_adam     | M        | American       | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/am_adam.mp3) |
| am_michael  | M        | American       | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/am_michael.mp3) |
| bf_emma     | F        | British        | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/bf_emma.mp3) |
| bf_isabella | F        | British        | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/bf_isabella.mp3) |
| bm_george   | M        | British        | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/bm_george.mp3) |
| bm_lewis    | M        | British        | [Link](https://huggingface.co./NeuML/kokoro-base-onnx/resolve/main/examples/bm_lewis.mp3) |

The following example shows how to set a speaker id when using txtai.

```python
speech, rate = tts("Say something here", speaker="af_sky")
```
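When the model is called directly with ONNX instead of through txtai, the speaker id serves as the key into `voices.json`. The sketch below assumes each entry holds one style row per input length, as the `speaker[len(inputs)]` indexing in the direct ONNX example implies. The hypothetical `style_for` helper is not part of txtai. It raises an error for an unknown id, or for an input too long for the style table, rather than silently selecting a mismatched style.

```python
from typing import Dict, List

import numpy as np


def style_for(voices: Dict[str, List], speaker: str, num_tokens: int) -> np.ndarray:
    """Return the style vector for `speaker` at the given (unpadded) token count.

    Hypothetical helper, not part of txtai. Assumes voices[speaker] holds one
    style row per input length, indexed the same way as speaker[len(inputs)].
    """
    if speaker not in voices:
        raise KeyError(f"Unknown speaker {speaker!r}; available: {sorted(voices)}")

    table = np.array(voices[speaker], dtype=np.float32)
    if not 0 <= num_tokens < len(table):
        # Too many tokens for this style table: split the text into smaller chunks
        raise ValueError(f"num_tokens={num_tokens} is outside 0..{len(table) - 1}")
    return table[num_tokens]
```

In the direct ONNX example, `style_for(voices, "bm_george", len(inputs))` could be passed as the `"style"` input in place of `speaker[len(inputs)]`.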