Zonos models give equal output

#28
by Pendrokar - opened

I suspect both setups end up using the same transformer model, since they give the same output. I would appreciate some help. I could not reproduce the issue when using the HF Space manually.
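One way to confirm that the two outputs really are equal, rather than just similar by ear, is to hash the returned audio files. This is only a minimal sketch: `file_sha256` and `outputs_identical` are hypothetical helper names, and the file names in the comment are placeholders, not paths from the actual test run.

```python
import hashlib


def file_sha256(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large audio files are not loaded at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(path_a, path_b):
    """True if both files contain exactly the same bytes."""
    return file_sha256(path_a) == file_sha256(path_b)


# Hypothetical usage with the two files returned by the client:
# outputs_identical("transformer.wav", "hybrid.wav")
```

Matching digests would mean the two files are byte-for-byte the same. Differing digests would not rule out the suspicion on their own, since file metadata or encoding could differ.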

@Steveeeeeeen
@multimodalart

Could someone take a second look at the Gradio client test script, where the issue also occurs, in case I am missing something obvious?
https://huggingface.co./spaces/Pendrokar/TTS-Spaces-Arena/blob/main/test_tts_zonos.py

print output:

$ python .\test_tts_zonos.py
Running in HF Space, syncing DB to HF dataset
low vote top_five: []
Loaded as API: https://steveeeeeeen-zonos.hf.space ✔
print line 20: {'model_choice': 'Zyphra/Zonos-v0.1-transformer', 'text': 'Zonos uses eSpeak for text to phoneme conversion!', 'language': 'en-us', 'speaker_audio': 'None', 'prefix_audio': "{'path': '/tmp/gradio/332aca853976eb02246e7d036faef5425586de5c56872e45da7ab2077cfd2ab9/silence_100ms.wav', 'url': 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav', 'size': None, 'orig_name': 'silence_100ms.wav', 'mime_type': None, 'is_stream': False, 'meta': {'_type': 'gradio.FileData'}}", 'e1': 1.0, 'e2': 0.05, 'e3': 0.05, 'e4': 0.05, 'e5': 0.05, 'e6': 0.05, 'e7': 0.1, 'e8': 0.2, 'vq_single': 0.78, 'fmax': 24000.0, 'pitch_std': 45.0, 'speaking_rate': 15.0, 'dnsmos_ovrl': 4.0, 'speaker_noised': False, 'cfg_scale': 2.0, 'min_p': 0.15, 'seed': 420.0, 'randomize_seed': True, 'unconditional_keys': None}
Steveeeeeeen/Zonos: Default inputs overridden by Arena
print line 43:  {'model_choice': 'Zyphra/Zonos-v0.1-hybrid', 'text': 'This is what my voice sounds like.', 'language': 'en-us', 'speaker_audio': None, 'prefix_audio': {'path': 'https://huggingface.co./spaces/Steveeeeeeen/Zonos/resolve/main/assets/silence_100ms.wav', 'meta': {'_type': 'gradio.FileData'}, 'orig_name': 'silence_100ms.wav', 'url': 'https://huggingface.co./spaces/Steveeeeeeen/Zonos/resolve/main/assets/silence_100ms.wav'}, 'e1': 1.0, 'e2': 0.05, 'e3': 0.05, 'e4': 0.05, 'e5': 0.05, 'e6': 0.05, 'e7': 0.1, 'e8': 0.2, 'vq_single': 0.78, 'fmax': 24000, 'pitch_std': 45, 'speaking_rate': 15, 'dnsmos_ovrl': 4, 'speaker_noised': False, 'cfg_scale': 2, 'min_p': 0.15, 'seed': 420, 'randomize_seed': False, 'unconditional_keys': ['emotion']}
('C:\\Users\...
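To see at a glance which inputs actually differ between the two printed dictionaries, a small diff helper might be useful. This is a sketch only: `diff_inputs` is a hypothetical helper, and the two dictionaries below contain just a subset of the keys, with values copied from the print output above.

```python
def diff_inputs(defaults, overridden):
    """Map each key whose value differs to a (default, overridden) pair."""
    changed = {}
    for key in sorted(set(defaults) | set(overridden)):
        before = defaults.get(key, "<missing>")
        after = overridden.get(key, "<missing>")
        if before != after:
            changed[key] = (before, after)
    return changed


# Subset of the values from "print line 20" (the Space defaults).
defaults = {
    "model_choice": "Zyphra/Zonos-v0.1-transformer",
    "text": "Zonos uses eSpeak for text to phoneme conversion!",
    "speaker_audio": "None",
    "seed": 420.0,
    "randomize_seed": True,
    "unconditional_keys": None,
}

# Subset of the values from "print line 43" (overridden by the Arena).
overridden = {
    "model_choice": "Zyphra/Zonos-v0.1-hybrid",
    "text": "This is what my voice sounds like.",
    "speaker_audio": None,
    "seed": 420,
    "randomize_seed": False,
    "unconditional_keys": ["emotion"],
}

changed = diff_inputs(defaults, overridden)
for key, (before, after) in changed.items():
    print(f"{key}: {before!r} -> {after!r}")
```

Python treats `420.0` and `420` as equal, so `seed` is not reported as changed. `speaker_audio` is reported, because the defaults hold the string `'None'` while the overridden inputs hold an actual `None`.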
