Only bad generations - what settings to use?

#21
by andykaufseo - opened

Out of 10 generations, 9 are bad, and 1 is not so bad (meaning that out of 30 seconds of audio, about 20s are ok).

My audio has random pauses, gibberish words, and noise.

Not sure how to even use this thing, not sure what the settings do:

  • fmax
  • vq score
  • cfg (apparently can't set it to 1)
  • max p

I've only tried the Gradio version. With and without voice cloning, the result is the same: I get 3-5 second gaps between words.

Any help is appreciated.

Don't adjust any settings. They should be good right where they are. Also, what is your setup? OS, hardware? What text are you inputting? I'd say first just click generate without adjusting anything. It should work out of the box, depending on your setup. Recommended: Ubuntu on bare metal, WSL, or Windows, plus an Nvidia graphics card and all the dependencies.
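If it helps with answering the setup question, here is a minimal sketch for collecting the OS and GPU details to paste into a reply. It uses only the Python standard library plus the `nvidia-smi` command-line tool when it is on the PATH. The function name `describe_setup` is made up for this example and is not part of Zonos.

```python
import platform
import shutil
import subprocess


def describe_setup():
    """Collect basic OS, Python, and GPU details worth including in a report."""
    info = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "gpu": "unknown (nvidia-smi not found)",
    }
    # nvidia-smi ships with the Nvidia driver; skip the query if it is missing.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            result = subprocess.run(
                [smi, "--query-gpu=name,driver_version,memory.total",
                 "--format=csv,noheader"],
                capture_output=True, text=True, timeout=10, check=True,
            )
            info["gpu"] = result.stdout.strip() or "nvidia-smi returned no GPUs"
        except (subprocess.SubprocessError, OSError) as exc:
            info["gpu"] = f"nvidia-smi failed: {exc}"
    return info


if __name__ == "__main__":
    for key, value in describe_setup().items():
        print(f"{key}: {value}")
```

Run it inside the same environment (or WSL session) you launch the Gradio app from, so the report reflects what the model actually sees.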

I have the exact same problem, and I already had it with Coqui XTTS. I had assumed open-source TTS was simply crap at the moment, but then I tried the official Zonos playground and realized the problem was only on my side.

I changed no settings and only supplied a speaker voice, but even without one, the output is mostly gibberish.
I used an L4 GPU with Ubuntu. I also wonder why this model can partially fail: the script reports no error and still outputs audio, even though that output is mostly gibberish.

I'm baffled by how sensitive these systems are.
