metadata

license: mit
title: KKR2
sdk: gradio
colorFrom: blue
colorTo: green

Text-to-Speech App with Kokoro-82M-ONNX

This is a Gradio-based text-to-speech (TTS) app built on the Kokoro-82M-ONNX model from Hugging Face. It generates speech from text, offers several speaker voices, and lets you download the resulting audio file.

Features

Text-to-Speech Conversion: Convert any input text into speech.
Multiple Speakers: Choose from different speaker voices.
Download Audio: Download the generated speech as a .wav file.

How to Use

Enter Text: Type or paste your text into the input box.
Select Speaker: Choose a speaker from the dropdown menu.
Generate Speech: Click the "Submit" button to generate the speech.
Download Audio: Once the speech is generated, you can listen to it or download the .wav file.
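
For readers curious about what the download step involves, here is a minimal sketch of writing generated samples to a .wav file. It is not the app's actual code: it uses only the Python standard library to stay self-contained (the app lists scipy, which may be what it uses instead), the names `float_to_pcm16` and `save_wav` are hypothetical, and the 24 kHz mono sample rate is an assumption to check against the model's documentation. A sine tone stands in for real model output.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # assumed output rate in Hz; verify against the model's documentation


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip first so out-of-range values cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit audio to `path` and return the number of frames written."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return len(pcm) // 2


# Stand-in for model output: half a second of a 440 Hz tone.
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
```

Calling `save_wav("speech.wav", tone)` would write a file that any standard audio player can open. In the real app, the list of samples would come from the model rather than from a generated tone.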

Example Inputs

Text: "Hello, welcome to the text-to-speech app!"
Speaker: "Speaker 1"

Requirements

The app requires the following Python packages:

onnxruntime
torch
gradio
scipy
numpy
huggingface_hub

These dependencies are automatically installed when the Space is built.
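
If you run the app outside of a Space, a quick check like the following can confirm that the packages listed above are available. This is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and it assumes each package's import name matches the name in the list.

```python
from importlib.util import find_spec

# Import names taken from the requirements list above.
REQUIRED = ["onnxruntime", "torch", "gradio", "scipy", "numpy", "huggingface_hub"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package reported as missing can be installed with pip before launching the app.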

Model Details

The app uses Kokoro-82M-ONNX, a lightweight text-to-speech model (about 82 million parameters, as the name indicates) distributed in ONNX format. The model supports multiple speaker voices.

Limitations

The model may not handle very long texts efficiently; splitting long input into shorter passages may help.
Speaker options are limited to the embeddings supported by the model.
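
One possible workaround for the long-text limitation is to split the input into shorter chunks and synthesize each chunk separately. The sketch below is an illustration, not part of the app: the function name `split_text` and the 300-character default are assumptions chosen for the example, and the right limit depends on the model.

```python
import re


def split_text(text, max_chars=300):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    # Split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at fixed positions.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the model in turn and the resulting audio segments concatenated. Breaking at sentence boundaries is meant to avoid cutting words or phrases in the middle, which would likely sound unnatural.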

Feedback and Contributions

If you encounter any issues or have suggestions for improvement, please open an issue on the GitHub repository or contact me directly.

Enjoy using the app! 🎉