NeuralFalcon committed
Commit 6d89762 · verified · 1 Parent(s): 0af6533

Upload 8 files

Files changed (8)
  1. .gitignore +6 -0
  2. Kokoro_82M_Colab.ipynb +51 -0
  3. README.md +124 -11
  4. api.py +76 -0
  5. app.py +262 -0
  6. download_model.py +174 -0
  7. requirements.txt +14 -0
  8. srt_dubbing.py +557 -0
.gitignore ADDED
@@ -0,0 +1,6 @@
+ kokoro_audio/
+ KOKORO/voices/
+ cache/
+ __pycache__/
+ run_app.bat
+ *.pth
Kokoro_82M_Colab.ipynb ADDED
@@ -0,0 +1,51 @@
+ {
+   "nbformat": 4,
+   "nbformat_minor": 0,
+   "metadata": {
+     "colab": {
+       "provenance": [],
+       "gpuType": "T4"
+     },
+     "kernelspec": {
+       "name": "python3",
+       "display_name": "Python 3"
+     },
+     "language_info": {
+       "name": "python"
+     },
+     "accelerator": "GPU"
+   },
+   "cells": [
+     {
+       "cell_type": "code",
+       "source": [
+         "%cd /content/\n",
+         "!git clone https://github.com/NeuralFalconYT/Kokoro-82M-WebUI.git\n",
+         "!apt-get -qq -y install espeak-ng > /dev/null 2>&1\n",
+         "%cd /content/Kokoro-82M-WebUI\n",
+         "!python download_model.py\n",
+         "!pip install -r requirements.txt\n",
+         "from IPython.display import clear_output\n",
+         "clear_output()"
+       ],
+       "metadata": {
+         "id": "stDJD3G4KJwP"
+       },
+       "execution_count": null,
+       "outputs": []
+     },
+     {
+       "cell_type": "code",
+       "source": [
+         "%cd /content/Kokoro-82M-WebUI\n",
+         "!python app.py --share\n",
+         "# !python srt_dubbing.py --share"
+       ],
+       "metadata": {
+         "id": "XSQ2ShKtC1u9"
+       },
+       "execution_count": null,
+       "outputs": []
+     }
+   ]
+ }
README.md CHANGED
@@ -1,14 +1,127 @@
  ---
- title: Kokoro TTS
- emoji: 👀
- colorFrom: pink
- colorTo: purple
- sdk: gradio
- sdk_version: 5.12.0
- app_file: app.py
- pinned: false
- license: mit
- short_description: Kokoro TTS WebUI
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Kokoro-TTS
+
+ **Note:** This is not the official repository. Alternatives: [kokoro-onnx](https://github.com/thewh1teagle/kokoro-onnx), [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI), [kokoro](https://github.com/hexgrad/kokoro), [kokoro-web](https://huggingface.co/spaces/webml-community/kokoro-web), [Kokoro-Custom-Voice](https://huggingface.co/spaces/ysharma/Make_Custom_Voices_With_KokoroTTS)
+
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NeuralFalconYT/Kokoro-82M-WebUI/blob/main/Kokoro_82M_Colab.ipynb) <br>
+ [![HuggingFace Space Demo](https://img.shields.io/badge/🤗-Space%20demo-yellow)](https://huggingface.co/spaces/hexgrad/Kokoro-TTS)
+
  ---
+
+ ### Installation Tutorial
+
+ My Python version is 3.10.9.
+
+ #### 1. Clone the GitHub Repository:
+ ```bash
+ git clone https://github.com/NeuralFalconYT/Kokoro-82M-WebUI.git
+ cd Kokoro-82M-WebUI
+ ```
+
+ #### 2. Create a Python Virtual Environment:
+ ```bash
+ python -m venv myenv
+ ```
+ This command creates a new Python virtual environment named `myenv` for isolating dependencies.
+
+ #### 3. Activate the Virtual Environment:
+ - **For Windows:**
+   ```bash
+   myenv\Scripts\activate
+   ```
+ - **For Linux:**
+   ```bash
+   source myenv/bin/activate
+   ```
+ This activates the virtual environment, enabling you to install and run dependencies in isolation.
+
+ #### 4. Install PyTorch:
+
+ - **For GPU (CUDA-enabled installation):**
+   - Check your CUDA version (for a GPU setup):
+     ```bash
+     nvcc --version
+     ```
+     Note your CUDA version, for example `11.8`.
+
+   - Visit [PyTorch Get Started](https://pytorch.org/get-started/locally/) and install the version compatible with your CUDA setup:<br>
+     - For CUDA 11.8:
+       ```
+       pip install torch --index-url https://download.pytorch.org/whl/cu118
+       ```
+     - For CUDA 12.1:
+       ```
+       pip install torch --index-url https://download.pytorch.org/whl/cu121
+       ```
+     - For CUDA 12.4:
+       ```
+       pip install torch --index-url https://download.pytorch.org/whl/cu124
+       ```
+ - **For CPU (if not using a GPU):**
+   ```bash
+   pip install torch
+   ```
+   This installs the CPU-only version of PyTorch.
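+
+   Optionally, you can sanity-check the install afterwards (a minimal sketch, not part of this repository):
+   ```python
+   import torch
+   print(torch.__version__)          # installed PyTorch version
+   print(torch.cuda.is_available())  # True if the CUDA build can see your GPU
+   ```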
+
+ #### 5. Install Required Dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ This installs all the required Python libraries listed in the `requirements.txt` file.
+
+ #### 6. Download Model and Get Latest VoicePack:
+ ```bash
+ python download_model.py
+ ```
+ This downloads the model weights (including an fp16 variant) and the latest voice packs, and generates mixed voice packs.
+
+ ---
+
+ #### 7. Install eSpeak NG
+
+ - **For Windows:**
+   1. Download the latest eSpeak NG release from the [eSpeak NG GitHub Releases](https://github.com/espeak-ng/espeak-ng/releases/tag/1.51).
+   2. Locate and download the file named **`espeak-ng-X64.msi`**.
+   3. Run the installer and follow the installation steps. Ensure that you install eSpeak NG in the default directory:
+      ```
+      C:\Program Files\eSpeak NG
+      ```
+      > **Note:** This default path is required for the application to locate eSpeak NG properly.
+
+ - **For Linux:**
+   1. Open your terminal.
+   2. Install eSpeak NG using the following command:
+      ```bash
+      sudo apt-get -qq -y install espeak-ng > /dev/null 2>&1
+      ```
+      > **Note:** This command suppresses unnecessary output for a cleaner installation.
+
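+ After installing, you can verify eSpeak NG from a terminal (an optional check, not part of this repository):
+ ```bash
+ espeak-ng --version        # print the installed version
+ espeak-ng "Hello, world"   # speak a short test phrase
+ ```
+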
  ---

+ #### 8. Run Gradio App
+
+ To run the Gradio app, follow these steps:
+
+ 1. **Activate the Virtual Environment:**
+    ```bash
+    myenv\Scripts\activate
+    ```
+    (On Linux, use `source myenv/bin/activate` instead.)
+
+ 2. **Run the Application:**
+    ```bash
+    python app.py
+    ```
+
+ Alternatively, on Windows, double-click `run_app.bat` to start the application.
+
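+ The running app also exposes a programmatic API (see `api.py`, and `{your app URL}/?view=api` for the full reference). A minimal client sketch, assuming the app is running locally on the default port:
+
+ ```python
+ from gradio_client import Client
+
+ client = Client("http://127.0.0.1:7860/")
+ result = client.predict(
+     text="Hello!!",
+     model_name="kokoro-v0_19.pth",
+     voice_name="af_bella",
+     speed=1,
+     trim=0,
+     pad_between_segments=0,
+     remove_silence=False,
+     minimum_silence=0.05,
+     api_name="/text_to_speech",
+ )
+ print(result)  # path to the generated audio file
+ ```
+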
+ ---
+
+ ![app](https://github.com/user-attachments/assets/ef3e7c0f-8e72-471d-9639-5327b4f06b29)
+ ![Podcast](https://github.com/user-attachments/assets/03ddd9ee-5b41-4acb-b0c3-53ef5b1a7fbf)
+ ![voices](https://github.com/user-attachments/assets/d47f803c-b3fb-489b-bc7b-f08020401ce5)
+
+ ### Credits
+ [Kokoro HuggingFace](https://huggingface.co/hexgrad/Kokoro-82M)
api.py ADDED
@@ -0,0 +1,76 @@
+ # Helpful if you want to use the TTS in a voice assistant project.
+ # Learn more at {your gradio app url}/?view=api, e.g. http://127.0.0.1:7860/?view=api
+ import shutil
+ import os
+ from gradio_client import Client
+
+ # Ensure the output directory exists
+ output_dir = "temp_audio"
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Initialize the Gradio client
+ api_url = "http://127.0.0.1:7860/"
+ client = Client(api_url)
+
+ def text_to_speech(
+     text="Hello!!",
+     model_name="kokoro-v0_19.pth",
+     voice_name="af_bella",
+     speed=1,
+     trim=0,
+     pad_between_segments=0,
+     remove_silence=False,
+     minimum_silence=0.05,
+ ):
+     """
+     Generates speech from text using a specified model and saves the audio file.
+
+     Parameters:
+         text (str): The text to convert to speech.
+         model_name (str): The name of the model to use for synthesis.
+         voice_name (str): The name of the voice to use.
+         speed (float): The speed of speech.
+         trim (float): How much silence to trim from both ends of each segment.
+         pad_between_segments (float): Padding between audio segments.
+         remove_silence (bool): Whether to remove silence from the audio.
+         minimum_silence (float): Minimum silence duration to keep, in seconds.
+     Returns:
+         str: Path to the saved audio file.
+     """
+     # Call the API with the provided parameters
+     result = client.predict(
+         text=text,
+         model_name=model_name,
+         voice_name=voice_name,
+         speed=speed,
+         trim=trim,
+         pad_between_segments=pad_between_segments,
+         remove_silence=remove_silence,
+         minimum_silence=minimum_silence,
+         api_name="/text_to_speech"
+     )
+
+     # Save the audio file in the specified directory
+     save_at = f"{output_dir}/{os.path.basename(result)}"
+     shutil.move(result, save_at)
+     print(f"Saved at {save_at}")
+
+     return save_at
+
+ # Example usage
+ if __name__ == "__main__":
+     text = "This is Kokoro TTS. I am a text-to-speech model and Super Fast."
+     model_name = "kokoro-v0_19.pth"  # or kokoro-v0_19-half.pth
+     voice_name = "af_bella"  # see the Available Voice Names tab for options
+     speed = 1
+     only_trim_both_ends_silence = 0
+     add_silence_between_segments = 0  # used for large text
+     remove_silence = False
+     keep_silence_upto = 0.05  # in seconds
+     audio_path = text_to_speech(text=text, model_name=model_name,
+                                 voice_name=voice_name, speed=speed,
+                                 trim=only_trim_both_ends_silence,
+                                 pad_between_segments=add_silence_between_segments,
+                                 remove_silence=remove_silence,
+                                 minimum_silence=keep_silence_upto)
+     print(f"Audio file saved at: {audio_path}")
app.py ADDED
@@ -0,0 +1,262 @@
+ from KOKORO.models import build_model
+ from KOKORO.utils import tts, tts_file_name, podcast
+ import sys
+ sys.path.append('.')
+ import torch
+ import gc
+ print("Loading model...")
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ print(f'Using device: {device}')
+ MODEL = build_model('./KOKORO/kokoro-v0_19.pth', device)
+ print("Model loaded successfully.")
+
+ def tts_maker(text, voice_name="af_bella", speed=0.8, trim=0, pad_between=0, save_path="temp.wav", remove_silence=False, minimum_silence=50):
+     # Sanitize the save_path to remove any newline characters
+     save_path = save_path.replace('\n', '').replace('\r', '')
+     global MODEL
+     audio_path = tts(MODEL, device, text, voice_name, speed=speed, trim=trim, pad_between_segments=pad_between, output_file=save_path, remove_silence=remove_silence, minimum_silence=minimum_silence)
+     return audio_path
+
+
+ model_list = ["kokoro-v0_19.pth", "kokoro-v0_19-half.pth"]
+ current_model = model_list[0]
+
+ def update_model(model_name):
+     """
+     Updates the TTS model only if the specified model is not already loaded.
+     """
+     global MODEL, current_model
+     if current_model == model_name:
+         return f"Model already set to {model_name}"  # No need to reload
+     model_path = f"./KOKORO/{model_name}"  # Default model path
+     if model_name == "kokoro-v0_19-half.pth":
+         model_path = f"./KOKORO/fp16/{model_name}"  # Update path for the fp16 model
+     # print(f"Loading new model: {model_name}")
+     del MODEL  # Clean up the existing model
+     gc.collect()
+     torch.cuda.empty_cache()  # Ensure GPU memory is cleared
+     MODEL = build_model(model_path, device)
+     current_model = model_name
+     return f"Model updated to {model_name}"
+
+
+ def text_to_speech(text, model_name, voice_name, speed, trim, pad_between_segments, remove_silence, minimum_silence):
+     """
+     Converts text to speech using the specified parameters and ensures the model is updated only if necessary.
+     """
+     update_status = update_model(model_name)  # Load the model only if required
+     # print(update_status)  # Log model loading status
+     if not minimum_silence:
+         minimum_silence = 0.05
+     keep_silence = int(minimum_silence * 1000)
+     save_at = tts_file_name(text)
+     audio_path = tts_maker(
+         text,
+         voice_name,
+         speed,
+         trim,
+         pad_between_segments,
+         save_at,
+         remove_silence,
+         keep_silence
+     )
+     return audio_path
+
+
+ import gradio as gr
+
+ # voice_list = [
+ #     'af',  # Default voice is a 50-50 mix of af_bella & af_sarah
+ #     'af_bella', 'af_sarah', 'am_adam', 'am_michael',
+ #     'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
+ # ]
+
+ import os
+
+ # Get the list of voice names without file extensions
+ voice_list = [
+     os.path.splitext(filename)[0]
+     for filename in os.listdir("./KOKORO/voices")
+     if filename.endswith('.pt')
+ ]
+
+ # Sort the list based on the length of each name
+ voice_list = sorted(voice_list, key=len)
+
+ def toggle_autoplay(autoplay):
+     return gr.Audio(interactive=False, label='Output Audio', autoplay=autoplay)
+
+ with gr.Blocks() as demo1:
+     gr.Markdown("# Batched TTS")
+     with gr.Row():
+         with gr.Column():
+             text = gr.Textbox(
+                 label='Enter Text',
+                 lines=3,
+                 placeholder="Type your text here..."
+             )
+             with gr.Row():
+                 voice = gr.Dropdown(
+                     voice_list,
+                     value='af',
+                     allow_custom_value=False,
+                     label='Voice',
+                     info='Starred voices are more stable'
+                 )
+             with gr.Row():
+                 generate_btn = gr.Button('Generate', variant='primary')
+             with gr.Accordion('Audio Settings', open=False):
+                 model_name = gr.Dropdown(model_list, label="Model", value=model_list[0])
+                 remove_silence = gr.Checkbox(value=False, label='✂️ Remove Silence From TTS')
+                 minimum_silence = gr.Number(
+                     label="Keep Silence Up To (In seconds)",
+                     value=0.05
+                 )
+                 speed = gr.Slider(
+                     minimum=0.25, maximum=2, value=1, step=0.1,
+                     label='⚡️ Speed', info='Adjust the speaking speed'
+                 )
+                 trim = gr.Slider(
+                     minimum=0, maximum=1, value=0, step=0.1,
+                     label='🔪 Trim', info='How much to cut from both ends of each segment'
+                 )
+                 pad_between = gr.Slider(
+                     minimum=0, maximum=2, value=0, step=0.1,
+                     label='🔇 Pad Between', info='Silent duration between segments [for large text]'
+                 )
+
+         with gr.Column():
+             audio = gr.Audio(interactive=False, label='Output Audio', autoplay=True)
+             with gr.Accordion('Enable Autoplay', open=False):
+                 autoplay = gr.Checkbox(value=True, label='Autoplay')
+                 autoplay.change(toggle_autoplay, inputs=[autoplay], outputs=[audio])
+
+     text.submit(
+         text_to_speech,
+         inputs=[text, model_name, voice, speed, trim, pad_between, remove_silence, minimum_silence],
+         outputs=[audio]
+     )
+     generate_btn.click(
+         text_to_speech,
+         inputs=[text, model_name, voice, speed, trim, pad_between, remove_silence, minimum_silence],
+         outputs=[audio]
+     )
+
+ def podcast_maker(text, remove_silence=False, minimum_silence=50, model_name="kokoro-v0_19.pth"):
+     global MODEL, device
+     update_model(model_name)
+     if not minimum_silence:
+         minimum_silence = 0.05
+     keep_silence = int(minimum_silence * 1000)
+     podcast_save_at = podcast(MODEL, device, text, remove_silence=remove_silence, minimum_silence=keep_silence)
+     return podcast_save_at
+
+
+ dummy_example = """{af} Hello, I'd like to order a sandwich please.
+ {af_sky} What do you mean you're out of bread?
+ {af_bella} I really wanted a sandwich though...
+ {af_nicole} You know what, darn you and your little shop!
+ {bm_george} I'll just go back home and cry now.
+ {am_adam} Why me?"""
+ with gr.Blocks() as demo2:
+     gr.Markdown(
+         """
+         # Multiple Speech-Type Generation
+         This section allows you to generate multiple speech types or multiple people's voices. Enter your text in the format shown below, and the system will generate speech using the appropriate type. If unspecified, the model will use the "af" voice.
+         Format:
+         {voice_name} your text here
+         """
+     )
+     with gr.Row():
+         gr.Markdown(
+             """
+             **Example Input:**
+             {af} Hello, I'd like to order a sandwich please.
+             {af_sky} What do you mean you're out of bread?
+             {af_bella} I really wanted a sandwich though...
+             {af_nicole} You know what, darn you and your little shop!
+             {bm_george} I'll just go back home and cry now.
+             {am_adam} Why me?!
+             """
+         )
+     with gr.Row():
+         with gr.Column():
+             text = gr.Textbox(
+                 label='Enter Text',
+                 lines=7,
+                 placeholder=dummy_example
+             )
+             with gr.Row():
+                 generate_btn = gr.Button('Generate', variant='primary')
+             with gr.Accordion('Audio Settings', open=False):
+                 remove_silence = gr.Checkbox(value=False, label='✂️ Remove Silence From TTS')
+                 minimum_silence = gr.Number(
+                     label="Keep Silence Up To (In seconds)",
+                     value=0.20
+                 )
+         with gr.Column():
+             audio = gr.Audio(interactive=False, label='Output Audio', autoplay=True)
+             with gr.Accordion('Enable Autoplay', open=False):
+                 autoplay = gr.Checkbox(value=True, label='Autoplay')
+                 autoplay.change(toggle_autoplay, inputs=[autoplay], outputs=[audio])
+
+     text.submit(
+         podcast_maker,
+         inputs=[text, remove_silence, minimum_silence],
+         outputs=[audio]
+     )
+     generate_btn.click(
+         podcast_maker,
+         inputs=[text, remove_silence, minimum_silence],
+         outputs=[audio]
+     )
+
+ display_text = " \n".join(voice_list)
+
+ with gr.Blocks() as demo3:
+     gr.Markdown(f"# Voice Names \n{display_text}")
+
+ import click
+ @click.command()
+ @click.option("--debug", is_flag=True, default=False, help="Enable debug mode.")
+ @click.option("--share", is_flag=True, default=False, help="Enable sharing of the interface.")
+ def main(debug, share):
+     demo = gr.TabbedInterface([demo1, demo2, demo3], ["Batched TTS", "Multiple Speech-Type Generation", "Available Voice Names"], title="Kokoro TTS")
+
+     demo.queue().launch(debug=debug, share=share)
+     # Run on a local network instead:
+     # laptop_ip = "192.168.0.30"
+     # port = 8080
+     # demo.queue().launch(debug=debug, share=share, server_name=laptop_ip, server_port=port)
+
+ if __name__ == "__main__":
+     main()
+
+
+ ## For the client side
+ # from gradio_client import Client
+ # import shutil
+ # import os
+ # os.makedirs("temp_audio", exist_ok=True)
+ # client = Client("http://127.0.0.1:7860/")
+ # result = client.predict(
+ #     text="Hello!!",
+ #     model_name="kokoro-v0_19.pth",
+ #     voice_name="af_bella",
+ #     speed=1,
+ #     trim=0,
+ #     pad_between_segments=0,
+ #     remove_silence=False,
+ #     minimum_silence=0.05,
+ #     api_name="/text_to_speech"
+ # )
+ # save_at = f"./temp_audio/{os.path.basename(result)}"
+ # shutil.move(result, save_at)
+ # print(f"Saved at {save_at}")
download_model.py ADDED
@@ -0,0 +1,174 @@
+ from huggingface_hub import list_repo_files, hf_hub_download
+ import os
+ import shutil
+
+ # Repository ID
+ repo_id = "hexgrad/Kokoro-82M"
+
+ # Set up the cache directory
+ cache_dir = "./cache"  # Customize this path if needed
+ os.makedirs(cache_dir, exist_ok=True)
+
+ def get_voice_models():
+     # Recreate the 'voices' directory from scratch
+     voices_dir = './KOKORO/voices'
+     if os.path.exists(voices_dir):
+         shutil.rmtree(voices_dir)
+     os.makedirs(voices_dir, exist_ok=True)
+
+     # Get the list of all files in the repository
+     files = list_repo_files(repo_id)
+
+     # Filter files for the 'voices/' folder
+     voice_files = [file.replace("voices/", "") for file in files if file.startswith("voices/")]
+
+     # Get current files in the 'voices' folder
+     current_voice = os.listdir(voices_dir)
+
+     # Identify files that need to be downloaded
+     download_voice = [file for file in voice_files if file not in current_voice]
+     if download_voice:
+         print(f"Files to download: {download_voice}")
+
+     # Download each missing file
+     for file in download_voice:
+         file_path = hf_hub_download(repo_id=repo_id, filename=f"voices/{file}", cache_dir=cache_dir)
+         target_path = os.path.join(voices_dir, file)
+         shutil.copy(file_path, target_path)
+         print(f"Downloaded: {file} to {target_path}")
+
+ # Call the function to fetch the voice packs
+ get_voice_models()
+
+ # Check and download additional required files with caching
+ kokoro_file = "kokoro-v0_19.pth"
+ fp16_file = "fp16/kokoro-v0_19-half.pth"
+
+ if kokoro_file not in os.listdir("./KOKORO/"):
+     file_path = hf_hub_download(repo_id=repo_id, filename=kokoro_file, cache_dir=cache_dir)
+     shutil.copy(file_path, os.path.join("./KOKORO/", kokoro_file))
+     print(f"Downloaded: {kokoro_file} to ./KOKORO/")
+
+ if "fp16" not in os.listdir("./KOKORO/"):
+     os.makedirs("./KOKORO/fp16", exist_ok=True)
+
+ if os.path.basename(fp16_file) not in os.listdir("./KOKORO/fp16/"):
+     file_path = hf_hub_download(repo_id=repo_id, filename=fp16_file, cache_dir=cache_dir)
+     shutil.copy(file_path, os.path.join("./KOKORO/fp16/", os.path.basename(fp16_file)))
+     print(f"Downloaded: {os.path.basename(fp16_file)} to ./KOKORO/fp16/")
+
+
+ # For Windows one-click run
+ import os
+ import platform
+
+ def setup_batch_file():
+     # Check if the system is Windows
+     if platform.system() == "Windows":
+         # Check if 'run_app.bat' exists in the current folder
+         if os.path.exists("run_app.bat"):
+             print("'run_app.bat' already exists in the current folder.")
+         else:
+             # Content for run_app.bat
+             bat_content_app = '''@echo off
+ call myenv\\Scripts\\activate
+ @python.exe app.py %*
+ @pause
+ '''
+             # Save the content to run_app.bat
+             with open('run_app.bat', 'w') as bat_file:
+                 bat_file.write(bat_content_app)
+             print("The 'run_app.bat' file has been created.")
+     else:
+         print("This system is not Windows. Batch file creation skipped.")
+
+ # Run the setup function
+ setup_batch_file()
+
+
+ import torch
+ import os
+ from itertools import combinations
+
+ def mix_all_voices(folder_path="./KOKORO/voices"):
+     """Mix all pairs of voice models and save the new models."""
+     # Get the list of available voice packs
+     available_voice_pack = [
+         os.path.splitext(filename)[0]
+         for filename in os.listdir(folder_path)
+         if filename.endswith('.pt')
+     ]
+
+     # Generate all unique pairs of voices
+     voice_combinations = combinations(available_voice_pack, 2)
+
+     # def mix_model(voice_1, voice_2, weight_1=0.6, weight_2=0.4):
+     #     """Mix two voice models with a weighted average and save the new model."""
+     #     new_name = f"{voice_1}_mix_{voice_2}"
+     #     voice_id_1 = torch.load(f'{folder_path}/{voice_1}.pt', weights_only=True)
+     #     voice_id_2 = torch.load(f'{folder_path}/{voice_2}.pt', weights_only=True)
+     #
+     #     # Create the mixed model using a weighted average
+     #     mixed_voice = (weight_1 * voice_id_1) + (weight_2 * voice_id_2)
+     #
+     #     # Save the mixed model
+     #     torch.save(mixed_voice, f'{folder_path}/{new_name}.pt')
+     #     print(f"Created new voice model: {new_name}")
+
+     # Function to mix two voices
+     def mix_model(voice_1, voice_2):
+         """Mix two voice models and save the new model."""
+         new_name = f"{voice_1}_mix_{voice_2}"
+         voice_id_1 = torch.load(f'{folder_path}/{voice_1}.pt', weights_only=True)
+         voice_id_2 = torch.load(f'{folder_path}/{voice_2}.pt', weights_only=True)
+
+         # Create the mixed model by averaging the weights
+         mixed_voice = torch.mean(torch.stack([voice_id_1, voice_id_2]), dim=0)
+
+         # Save the mixed model
+         torch.save(mixed_voice, f'{folder_path}/{new_name}.pt')
+         print(f"Created new voice model: {new_name}")
+
+     # Create mixed voices for each pair
+     for voice_1, voice_2 in voice_combinations:
+         print(f"Mixing {voice_1} ❤️ {voice_2}")
+         mix_model(voice_1, voice_2)
+
+ # Call the function to mix all voices
+ mix_all_voices("./KOKORO/voices")
+
+
+ def save_voice_names(directory="./KOKORO/voices", output_file="./voice_names.txt"):
+     """
+     Retrieves voice names from a directory, sorts them by length, and saves them to a file.
+
+     Parameters:
+         directory (str): Directory containing the voice files.
+         output_file (str): File to save the sorted voice names.
+
+     Returns:
+         None
+     """
+     # Get the list of voice names without file extensions
+     voice_list = [
+         os.path.splitext(filename)[0]
+         for filename in os.listdir(directory)
+         if filename.endswith('.pt')
+     ]
+
+     # Sort the list based on the length of each name
+     voice_list = sorted(voice_list, key=len)
+
+     # Save the sorted list to the specified file
+     with open(output_file, "w") as f:
+         for voice_name in voice_list:
+             f.write(f"{voice_name}\n")
+
+     print(f"Voice names saved to {output_file}")
+
+ save_voice_names()
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ phonemizer>=3.3.0
+ scipy>=1.14.1
+ munch>=4.0.0
+ transformers>=4.47.1
+ click>=8.1.8
+ librosa>=0.10.2
+ simpleaudio>=1.0.4
+ gradio>=5.9.1
+ huggingface-hub>=0.27.0
+ pydub>=0.25.1
+ pysrt>=1.1.2
+ # fastapi>=0.115.6
+ # uvicorn>=0.34.0
+ # torch
srt_dubbing.py ADDED
@@ -0,0 +1,557 @@
+ from KOKORO.models import build_model
+ from KOKORO.utils import tts, tts_file_name, podcast
+ import sys
+ sys.path.append('.')
+ import torch
+ import gc
+ print("Loading model...")
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ print(f'Using device: {device}')
+ MODEL = build_model('./KOKORO/kokoro-v0_19.pth', device)
+ print("Model loaded successfully.")
+
+ def tts_maker(text, voice_name="af_bella", speed=0.8, trim=0, pad_between=0, save_path="temp.wav", remove_silence=False, minimum_silence=50):
+     # Sanitize the save_path to remove any newline characters
+     save_path = save_path.replace('\n', '').replace('\r', '')
+     global MODEL
+     audio_path = tts(MODEL, device, text, voice_name, speed=speed, trim=trim, pad_between_segments=pad_between, output_file=save_path, remove_silence=remove_silence, minimum_silence=minimum_silence)
+     return audio_path
+
+
+ model_list = ["kokoro-v0_19.pth", "kokoro-v0_19-half.pth"]
+ current_model = model_list[0]
+
+ def update_model(model_name):
+     """
+     Updates the TTS model only if the specified model is not already loaded.
+     """
+     global MODEL, current_model
+     if current_model == model_name:
+         return f"Model already set to {model_name}"  # No need to reload
+     model_path = f"./KOKORO/{model_name}"  # Default model path
+     if model_name == "kokoro-v0_19-half.pth":
+         model_path = f"./KOKORO/fp16/{model_name}"  # Update path for the fp16 model
+     # print(f"Loading new model: {model_name}")
+     del MODEL  # Clean up the existing model
+     gc.collect()
+     torch.cuda.empty_cache()  # Ensure GPU memory is cleared
+     MODEL = build_model(model_path, device)
+     current_model = model_name
+     return f"Model updated to {model_name}"
+
+
+ def text_to_speech(text, model_name="kokoro-v0_19.pth", voice_name="af", speed=1.0, trim=1.0, pad_between_segments=0, remove_silence=True, minimum_silence=0.20):
+     """
+     Converts text to speech using the specified parameters and ensures the model is updated only if necessary.
+     """
+     update_status = update_model(model_name)  # Load the model only if required
+     # print(update_status)  # Log model loading status
+     if not minimum_silence:
+         minimum_silence = 0.05
+     keep_silence = int(minimum_silence * 1000)
+     save_at = tts_file_name(text)
+     audio_path = tts_maker(
+         text,
+         voice_name,
+         speed,
+         trim,
+         pad_between_segments,
+         save_at,
+         remove_silence,
+         keep_silence
+     )
+     return audio_path
+
+
+ import gradio as gr
+
+ # voice_list = [
+ #     'af',  # Default voice is a 50-50 mix of af_bella & af_sarah
+ #     'af_bella', 'af_sarah', 'am_adam', 'am_michael',
+ #     'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
+ # ]
+
+ import os
+
+ # Get the list of voice names without file extensions
+ voice_list = [
+     os.path.splitext(filename)[0]
+     for filename in os.listdir("./KOKORO/voices")
+     if filename.endswith('.pt')
+ ]
+
+ # Sort the list based on the length of each name
+ voice_list = sorted(voice_list, key=len)
+
+ def toggle_autoplay(autoplay):
+     return gr.Audio(interactive=False, label='Output Audio', autoplay=autoplay)
+
+ with gr.Blocks() as demo1:
+     gr.Markdown("# Batched TTS")
+     with gr.Row():
+         with gr.Column():
+             text = gr.Textbox(
+                 label='Enter Text',
+                 lines=3,
+                 placeholder="Type your text here..."
+             )
+             with gr.Row():
+                 voice = gr.Dropdown(
+                     voice_list,
+                     value='af',
+                     allow_custom_value=False,
+                     label='Voice',
+                     info='Starred voices are more stable'
+                 )
+             with gr.Row():
+                 generate_btn = gr.Button('Generate', variant='primary')
+             with gr.Accordion('Audio Settings', open=False):
+                 model_name = gr.Dropdown(model_list, label="Model", value=model_list[0])
+                 remove_silence = gr.Checkbox(value=False, label='✂️ Remove Silence From TTS')
+                 minimum_silence = gr.Number(
+                     label="Keep Silence Up To (In seconds)",
+                     value=0.05
+                 )
+                 speed = gr.Slider(
+                     minimum=0.25, maximum=2, value=1, step=0.1,
+                     label='⚡️ Speed', info='Adjust the speaking speed'
+                 )
+                 trim = gr.Slider(
+                     minimum=0, maximum=1, value=0, step=0.1,
+                     label='🔪 Trim', info='How much to cut from both ends of each segment'
+                 )
+                 pad_between = gr.Slider(
+                     minimum=0, maximum=2, value=0, step=0.1,
+                     label='🔇 Pad Between', info='Silent duration between segments [for large text]'
+                 )
+
+         with gr.Column():
+             audio = gr.Audio(interactive=False, label='Output Audio', autoplay=True)
+             with gr.Accordion('Enable Autoplay', open=False):
+                 autoplay = gr.Checkbox(value=True, label='Autoplay')
+                 autoplay.change(toggle_autoplay, inputs=[autoplay], outputs=[audio])
+
+     text.submit(
+         text_to_speech,
+         inputs=[text, model_name, voice, speed, trim, pad_between, remove_silence, minimum_silence],
+         outputs=[audio]
+     )
+     generate_btn.click(
+         text_to_speech,
+         inputs=[text, model_name, voice, speed, trim, pad_between, remove_silence, minimum_silence],
+         outputs=[audio]
+     )
+
+ def podcast_maker(text, remove_silence=False, minimum_silence=50, model_name="kokoro-v0_19.pth"):
+     global MODEL, device
+     update_model(model_name)
+     if not minimum_silence:
+         minimum_silence = 0.05
+     keep_silence = int(minimum_silence * 1000)
+     podcast_save_at = podcast(MODEL, device, text, remove_silence=remove_silence, minimum_silence=keep_silence)
+     return podcast_save_at
+
+
+ dummy_example = """{af} Hello, I'd like to order a sandwich please.
+ {af_sky} What do you mean you're out of bread?
+ {af_bella} I really wanted a sandwich though...
+ {af_nicole} You know what, darn you and your little shop!
+ {bm_george} I'll just go back home and cry now.
+ {am_adam} Why me?"""
+ with gr.Blocks() as demo2:
+     gr.Markdown(
+         """
+         # Multiple Speech-Type Generation
+         This section allows you to generate multiple speech types or multiple people's voices. Enter your text in the format shown below, and the system will generate speech using the appropriate type. If unspecified, the model will use the "af" voice.
+         Format:
+         {voice_name} your text here
+         """
+     )
+     with gr.Row():
+         gr.Markdown(
+             """
+             **Example Input:**
+             {af} Hello, I'd like to order a sandwich please.
+             {af_sky} What do you mean you're out of bread?
+             {af_bella} I really wanted a sandwich though...
+             {af_nicole} You know what, darn you and your little shop!
+             {bm_george} I'll just go back home and cry now.
+             {am_adam} Why me?!
+             """
+         )
+     with gr.Row():
+         with gr.Column():
+             text = gr.Textbox(
+                 label='Enter Text',
+                 lines=7,
+                 placeholder=dummy_example
+             )
+             with gr.Row():
+                 generate_btn = gr.Button('Generate', variant='primary')
+             with gr.Accordion('Audio Settings', open=False):
+                 remove_silence = gr.Checkbox(value=False, label='✂️ Remove Silence From TTS')
+                 minimum_silence = gr.Number(
+                     label="Keep Silence Up To (In seconds)",
+                     value=0.20
+                 )
+         with gr.Column():
+             audio = gr.Audio(interactive=False, label='Output Audio', autoplay=True)
+             with gr.Accordion('Enable Autoplay', open=False):
+                 autoplay = gr.Checkbox(value=True, label='Autoplay')
+                 autoplay.change(toggle_autoplay, inputs=[autoplay], outputs=[audio])
+
+     text.submit(
+         podcast_maker,
+         inputs=[text, remove_silence, minimum_silence],
+         outputs=[audio]
+     )
+     generate_btn.click(
+         podcast_maker,
+         inputs=[text, remove_silence, minimum_silence],
+         outputs=[audio]
+     )
+
+
+ import shutil
+ import os
+
+ # Ensure the output directory exists
+ output_dir = "./temp_audio"
+ os.makedirs(output_dir, exist_ok=True)
+
+
+ #@title Generate Audio File From Subtitle
+ # from tqdm.notebook import tqdm
+ from tqdm import tqdm
+ import subprocess
+ import json
+ import pysrt
+ import os
+ from pydub import AudioSegment
+ import shutil
+ import uuid
+ import re
+ import time
+
+ # os.chdir(install_path)
+
+ def your_tts(text, audio_path, actual_duration, speed=1.0):
+     global srt_voice_name
+     model_name = "kokoro-v0_19.pth"
+     tts_path = text_to_speech(text, model_name, voice_name=srt_voice_name, speed=speed)
+     print(tts_path)
+     tts_audio = AudioSegment.from_file(tts_path)
+     tts_duration = len(tts_audio)
+     if tts_duration > actual_duration:
+         # Regenerate at a higher speed so the clip fits the subtitle's time slot
+         speedup_factor = tts_duration / actual_duration
+         tts_path = text_to_speech(text, model_name, voice_name=srt_voice_name, speed=speedup_factor)
+         print(tts_path)
+     shutil.copy(tts_path, audio_path)
+
+
+ base_path = "."
+ import datetime
+ def get_current_time():
+     # Return the current time as a string in the format HH_MM_AM/PM
+     return datetime.datetime.now().strftime("%I_%M_%p")
+
+ def get_subtitle_Dub_path(srt_file_path, Language="en"):
+     file_name = os.path.splitext(os.path.basename(srt_file_path))[0]
+     if not os.path.exists(f"{base_path}/TTS_DUB"):
+         os.mkdir(f"{base_path}/TTS_DUB")
+     random_string = str(uuid.uuid4())[:6]
+     new_path = f"{base_path}/TTS_DUB/{file_name}_{Language}_{get_current_time()}_{random_string}.wav"
+     return new_path
+
+
+ def clean_srt(input_path):
+     file_name = os.path.basename(input_path)
+     output_folder = f"{base_path}/save_srt"
+     if not os.path.exists(output_folder):
+         os.mkdir(output_folder)
+     output_path = f"{output_folder}/{file_name}"
+
+     def clean_srt_line(text):
+         bad_list = ["[", "]", "♫", "\n"]
+         for i in bad_list:
+             text = text.replace(i, "")
+         return text.strip()
+
+     # Load the subtitle file
+     subs = pysrt.open(input_path)
+
+     # Write each subtitle back out with its text cleaned
+     with open(output_path, "w", encoding='utf-8') as file:
+         for sub in subs:
+             file.write(f"{sub.index}\n")
+             file.write(f"{sub.start} --> {sub.end}\n")
+             file.write(f"{clean_srt_line(sub.text)}\n")
+             file.write("\n")
+     # print(f"Clean SRT saved at: {output_path}")
+     return output_path
+
+
+ class SRTDubbing:
+     def __init__(self):
+         pass
+
+     @staticmethod
+     def text_to_speech_srt(text, audio_path, language, actual_duration):
+         tts_filename = "./cache/temp.wav"
+         your_tts(text, tts_filename, actual_duration, speed=1.0)
+         # Check the duration of the generated TTS audio
+         tts_audio = AudioSegment.from_file(tts_filename)
+         tts_duration = len(tts_audio)
+
+         if actual_duration == 0:
+             # If the actual duration is zero, use the original TTS audio without modifications
+             shutil.move(tts_filename, audio_path)
+             return
+         # If the TTS audio is longer than the actual duration, speed it up
+         if tts_duration > actual_duration:
+             speedup_factor = tts_duration / actual_duration
+             speedup_filename = "./cache/speedup_temp.wav"
+             # Use ffmpeg to change the audio speed
+             subprocess.run([
+                 "ffmpeg",
+                 "-i", tts_filename,
+                 "-filter:a", f"atempo={speedup_factor}",
+                 speedup_filename,
+                 "-y"
+             ], check=True)
+
+             # Replace the original TTS audio with the sped-up version
+             shutil.move(speedup_filename, audio_path)
+         elif tts_duration < actual_duration:
+             # If the TTS audio is shorter than the actual duration, pad it with silence
+             silence_gap = actual_duration - tts_duration
+             silence = AudioSegment.silent(duration=int(silence_gap))
+             new_audio = tts_audio + silence
+
+             # Save the new audio with the added silence
+             new_audio.export(audio_path, format="wav")
+         else:
+             # If the durations match exactly, use the original TTS audio
+             shutil.move(tts_filename, audio_path)
+
+     @staticmethod
+     def make_silence(pause_time, pause_save_path):
+         silence = AudioSegment.silent(duration=pause_time)
+         silence.export(pause_save_path, format="wav")
+         return pause_save_path
+
+     @staticmethod
+     def create_folder_for_srt(srt_file_path):
+         srt_base_name = os.path.splitext(os.path.basename(srt_file_path))[0]
+         random_uuid = str(uuid.uuid4())[:4]
+         dummy_folder_path = f"{base_path}/dummy"
+         if not os.path.exists(dummy_folder_path):
+             os.makedirs(dummy_folder_path)
+         folder_path = os.path.join(dummy_folder_path, f"{srt_base_name}_{random_uuid}")
+         os.makedirs(folder_path, exist_ok=True)
+         return folder_path
+
+     @staticmethod
+     def concatenate_audio_files(audio_paths, output_path):
+         concatenated_audio = AudioSegment.silent(duration=0)
+         for audio_path in audio_paths:
+             audio_segment = AudioSegment.from_file(audio_path)
+             concatenated_audio += audio_segment
+         concatenated_audio.export(output_path, format="wav")
+
+     def srt_to_dub(self, srt_file_path, dub_save_path, language='en'):
+         result = self.read_srt_file(srt_file_path)
+         new_folder_path = self.create_folder_for_srt(srt_file_path)
+         join_path = []
+         for i in tqdm(result):
+             text = i['text']
+             actual_duration = i['end_time'] - i['start_time']
+             pause_time = i['pause_time']
+             silent_path = f"{new_folder_path}/{i['previous_pause']}"
+             self.make_silence(pause_time, silent_path)
+             join_path.append(silent_path)
+             tts_path = f"{new_folder_path}/{i['audio_name']}"
+             self.text_to_speech_srt(text, tts_path, language, actual_duration)
+             join_path.append(tts_path)
+         self.concatenate_audio_files(join_path, dub_save_path)
+
+     @staticmethod
+     def convert_to_millisecond(time_str):
+         if isinstance(time_str, str):
+             hours, minutes, second_millisecond = time_str.split(':')
+             seconds, milliseconds = second_millisecond.split(",")
+
+             total_milliseconds = (
+                 int(hours) * 3600000 +
+                 int(minutes) * 60000 +
+                 int(seconds) * 1000 +
+                 int(milliseconds)
+             )
+
+             return total_milliseconds
+
+     @staticmethod
+     def read_srt_file(file_path):
+         entries = []
+         default_start = 0
+         previous_end_time = default_start
+         entry_number = 1
+         audio_name_template = "{}.wav"
+         previous_pause_template = "{}_before_pause.wav"
+
+         with open(file_path, 'r', encoding='utf-8') as file:
+             lines = file.readlines()
+             # print(lines)
+             for i in range(0, len(lines), 4):
+                 time_info = re.findall(r'(\d+:\d+:\d+,\d+) --> (\d+:\d+:\d+,\d+)', lines[i + 1])
+                 start_time = SRTDubbing.convert_to_millisecond(time_info[0][0])
+                 end_time = SRTDubbing.convert_to_millisecond(time_info[0][1])
+
+                 current_entry = {
+                     'entry_number': entry_number,
+                     'start_time': start_time,
+                     'end_time': end_time,
+                     'text': lines[i + 2].strip(),
+                     'pause_time': start_time - previous_end_time if entry_number != 1 else start_time - default_start,
+                     'audio_name': audio_name_template.format(entry_number),
+                     'previous_pause': previous_pause_template.format(entry_number),
+                 }
+
+                 entries.append(current_entry)
+                 previous_end_time = end_time
+                 entry_number += 1
+
+         with open("entries.json", "w") as file:
+             json.dump(entries, file, indent=4)
+         return entries
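+
+ # Note: read_srt_file assumes strict four-line SRT blocks (index, timing,
+ # one line of text, then a blank line), stepping through the file in
+ # strides of 4. For example:
+ #
+ #   1
+ #   00:00:01,000 --> 00:00:03,500
+ #   Hello, world.
+ #
+ # Multi-line cues would break this stride; clean_srt() above strips newlines
+ # inside each cue and writes cues back out in exactly this shape.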
+ srt_voice_name = "am_adam"
+ def srt_process(srt_file_path, voice_name, dest_language="en"):
+     global srt_voice_name
+     srt_voice_name = voice_name
+     srt_dubbing = SRTDubbing()
+     dub_save_path = get_subtitle_Dub_path(srt_file_path, dest_language)
+     srt_dubbing.srt_to_dub(srt_file_path, dub_save_path, dest_language)
+     return dub_save_path
+
+ # Example usage:
+ # srt_file_path = "./long.srt"
+ # dub_audio_path = srt_process(srt_file_path)
+ # print(f"Audio file saved at: {dub_audio_path}")
+
+
+ with gr.Blocks() as demo3:
+
+     gr.Markdown(
+         """
+         # Generate Audio File From Subtitle [Single Speaker Only]
+
+         To generate subtitles, you can use the [Whisper Turbo Subtitle](https://github.com/NeuralFalconYT/Whisper-Turbo-Subtitle) tool.
+
+         [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NeuralFalconYT/Whisper-Turbo-Subtitle/blob/main/Whisper_Turbo_Subtitle.ipynb)
+         """
+     )
+     with gr.Row():
+         with gr.Column():
+             srt_file = gr.File(label='Upload .srt Subtitle File Only')
+             with gr.Row():
+                 voice = gr.Dropdown(
+                     voice_list,
+                     value='af',
+                     allow_custom_value=False,
+                     label='Voice',
+                 )
+             with gr.Row():
+                 generate_btn_ = gr.Button('Generate', variant='primary')
+
+         with gr.Column():
+             audio = gr.Audio(interactive=False, label='Output Audio', autoplay=True)
+             with gr.Accordion('Enable Autoplay', open=False):
+                 autoplay = gr.Checkbox(value=True, label='Autoplay')
+                 autoplay.change(toggle_autoplay, inputs=[autoplay], outputs=[audio])
+
+     # srt_file.submit(
+     #     srt_process,
+     #     inputs=[srt_file, voice],
+     #     outputs=[audio]
+     # )
+     generate_btn_.click(
+         srt_process,
+         inputs=[srt_file, voice],
+         outputs=[audio]
+     )
+
+
+ display_text = " \n".join(voice_list)
+
+ with gr.Blocks() as demo4:
+     gr.Markdown(f"# Voice Names \n{display_text}")
+
+
+ import click
+ @click.command()
+ @click.option("--debug", is_flag=True, default=False, help="Enable debug mode.")
+ @click.option("--share", is_flag=True, default=False, help="Enable sharing of the interface.")
+ def main(debug, share):
+     demo = gr.TabbedInterface([demo1, demo2, demo3, demo4], ["Batched TTS", "Multiple Speech-Type Generation", "SRT Dubbing", "Available Voice Names"], title="Kokoro TTS")
+
+     demo.queue().launch(debug=debug, share=share)
+     # Run on a local network instead:
+     # laptop_ip = "192.168.0.30"
+     # port = 8080
+     # demo.queue().launch(debug=debug, share=share, server_name=laptop_ip, server_port=port)
+
+ if __name__ == "__main__":
+     main()
+
+
+ ## For the client side
+ # from gradio_client import Client
+ # import shutil
+ # import os
+ # os.makedirs("temp_audio", exist_ok=True)
+ # client = Client("http://127.0.0.1:7860/")
+ # result = client.predict(
+ #     text="Hello!!",
+ #     model_name="kokoro-v0_19.pth",
+ #     voice_name="af_bella",
+ #     speed=1,
+ #     trim=0,
+ #     pad_between_segments=0,
+ #     remove_silence=False,
+ #     minimum_silence=0.05,
+ #     api_name="/text_to_speech"
+ # )
+ # save_at = f"./temp_audio/{os.path.basename(result)}"
+ # shutil.move(result, save_at)
+ # print(f"Saved at {save_at}")