TTS-AGI/TTS-Arena

How does TTS Arena work across different models?

#80

by zhihaodu - opened 4 days ago

As far as I know, the models in the Arena are not all of the same kind; some are fine-tuned (SFT) for a single speaker, like ElevenLabs, Play.HT 2.0, while others are zero-shot, like GPT-SoVITS, CosyVoice 2.0. How does the Arena ensure that these models are compared fairly?

Pendrokar

4 days ago

edited 4 days ago

This information is hidden within the private TTS router Space. The dev can reveal some information about a model, but you have to ask directly.

TTS model authors can ask for changes, and they may have access to the parameters as members of TTS-AGI. How does one become a member? That has been a year-long mystery.

In the forked TTS Arena, I do not change the default parameters of the TTS Spaces. I don't think it is my task to find out the best way for a TTS to perform; that is the duty of the HF Space authors. The same goes for the F5 TTS Space, which has no default voice.

Pendrokar

4 days ago

> some are fine-tuned (SFT) for a single speaker, like ElevenLabs, Play.HT 2.0, while others are zero-shot, like GPT-SoVITS, CosyVoice 2.0.

Also, you can use this table to check which models are zero-shot (see the "Insta-clone" column):
https://huggingface.co/datasets/Pendrokar/open_tts_tracker

zhihaodu

3 days ago

> This information is hidden within the private TTS router Space. The dev can reveal some information about a model, but you have to ask directly.

> TTS model authors can ask for changes, and they may have access to the parameters as members of TTS-AGI. How does one become a member? That has been a year-long mystery.

> In the forked TTS Arena, I do not change the default parameters of the TTS Spaces. I don't think it is my task to find out the best way for a TTS to perform; that is the duty of the HF Space authors. The same goes for the F5 TTS Space, which has no default voice.

I see. Thanks for your comment. It's really helpful.

mrfakename

TTS AGI org 3 days ago

Hey @zhihaodu,
Sorry about the delay. What @Pendrokar said about the router Space is correct: we use a private endpoint to proxy requests to models, which protects our API keys and lets us route requests to privately hosted instances of the models. We aim to use each model's default settings when possible. For zero-shot models, if the authors do not recommend a voice, we try to find a well-performing reference voice sample for the model and use that.
Please let me know if you have any other questions!
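
For readers who want a concrete picture of the policy described above, here is a rough sketch of how a router might assemble a request. It is only an illustration, not the actual router code, which is private: `ModelEntry`, `resolve_request`, and every field name are hypothetical. It assumes the router keeps each model's default parameters untouched and, for zero-shot models, prefers an author-recommended reference voice before falling back to a maintainer-picked one. API keys are left out on purpose, since in such a setup they would stay on the server side.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelEntry:
    """Hypothetical registry entry; no field here comes from the real router."""
    name: str
    zero_shot: bool
    default_params: dict = field(default_factory=dict)  # the model's own defaults
    recommended_voice: Optional[str] = None  # reference sample suggested by the authors
    curated_voice: Optional[str] = None      # fallback sample picked by the maintainers


def resolve_request(entry: ModelEntry, text: str) -> dict:
    """Build the payload forwarded to a model without tuning its defaults."""
    # Defaults go first so they can never overwrite the model name or the text.
    payload = {**entry.default_params, "model": entry.name, "text": text}
    if entry.zero_shot:
        # Prefer the authors' recommendation, then the curated fallback.
        voice = entry.recommended_voice or entry.curated_voice
        if voice is None:
            raise ValueError(f"no reference voice configured for {entry.name}")
        payload["reference_voice"] = voice
    return payload
```

With this sketch, a single-speaker model would receive only its defaults plus the text, while a zero-shot model would additionally receive whichever reference voice is configured for it.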
