hexgrad/Kokoro-82M · How to add Pauses & Emotions?

aagawade1999

11 days ago

Chapter 1: Batman Rises.

Bale said:
I'm Batman.

I was expecting a pause after "Rises.", but there wasn't one!

hexgrad

Owner 11 days ago

Emotions are limited by the model's training data. Simply put, this model saw little or no emotional data during training, so it cannot do emotions very well. It can produce brief pauses using rich punctuation such as periods, commas, semicolons, colons and em dashes, but speech delivered as really angry or sad is beyond the range of v0.19.

Emotions should be possible given emotional training data. For example, there is a Japanese StyleTTS 2 model that can do emotions at https://hf.co/spaces/Respair/Tsukasa_Speech

In your case, the simplest way to add a pause is to split the text after "Rises." and generate each part separately. Here is bm_george with 1 second of silence inserted at the split:

You can reproduce this in Colab, after loading the model and voicepack, like this:

from kokoro import generate
import numpy as np
from IPython.display import display, Audio

# MODEL, VOICEPACK and VOICE_NAME come from the Colab cell that loads the model & voicepack
texts = ["Chapter 1: Batman Rises.", "Bale said: I'm Batman."]
silence = np.zeros(24000)  # 24000 samples of zeros = 1 second of silence at 24 kHz
wavs = []
for text in texts:
    audio, out_ps = generate(MODEL, text, VOICEPACK, lang=VOICE_NAME[0])
    if wavs:
        wavs.append(silence)  # pause goes between segments, not before the first one
    wavs.append(audio)

audio = np.concatenate(wavs)
display(Audio(data=audio, rate=24000, autoplay=True))
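The loop above uses a single fixed one-second gap. Below is a minimal sketch of a reusable helper for any number of segments and any pause length. `join_with_pauses` is a hypothetical name, not part of Kokoro, and the sketch assumes each segment is a 1-D NumPy float array at 24 kHz (the rate passed to `Audio` above). The gap length in samples is simply seconds × sample rate, so 0.5 s at 24000 Hz is 12000 samples.

```python
import numpy as np

SAMPLE_RATE = 24000  # assumption: matches rate=24000 used when playing the audio above

def join_with_pauses(segments, pause_seconds=1.0, sample_rate=SAMPLE_RATE):
    """Concatenate audio segments, inserting silence only *between* them."""
    if not segments:
        return np.zeros(0, dtype=np.float32)
    # seconds * samples-per-second = number of zero samples in each gap
    gap = np.zeros(int(round(pause_seconds * sample_rate)), dtype=np.float32)
    pieces = []
    for i, seg in enumerate(segments):
        if i > 0:
            pieces.append(gap)  # no leading or trailing silence
        pieces.append(np.asarray(seg, dtype=np.float32))
    return np.concatenate(pieces)

# Usage with dummy segments standing in for generate() output:
a = np.ones(100, dtype=np.float32)
b = np.ones(50, dtype=np.float32)
out = join_with_pauses([a, b], pause_seconds=0.5)
print(len(out))  # 100 + 12000 + 50 = 12150
```

In the Colab example, you would collect each `audio` from `generate` into a list and call `join_with_pauses(wavs, pause_seconds=1.0)` instead of concatenating by hand.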

You decide how to split your text; right now, the code here does not do it for you.
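One simple approach is to split after sentence-ending punctuation with a regular expression, as sketched below. `split_sentences` is a hypothetical helper, not part of Kokoro. It treats a ".", "!" or "?" followed by whitespace as a boundary and collapses internal line breaks, so it will also split naively on abbreviations such as "Mr.". Each resulting chunk can then be passed to `generate` in the loop above.

```python
import re

# Boundary = whitespace preceded by ., ! or ? (the lookbehind keeps the
# punctuation attached to its sentence). Naive: "Mr. Wayne" is split too.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text):
    """Return non-empty sentence chunks with internal whitespace collapsed."""
    return [" ".join(s.split()) for s in _SENTENCE_END.split(text) if s.strip()]

text = "Chapter 1: Batman Rises.\n\nBale said:\nI'm Batman."
print(split_sentences(text))
# ['Chapter 1: Batman Rises.', "Bale said: I'm Batman."]
```

For the question's text, this yields the same two chunks as the hand-written `texts` list in the example.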

Some third-party projects that integrate Kokoro and might have better splitting methods for long documents:

hexgrad changed discussion status to closed 10 days ago