
Impossible to use the Gradio API with the basic Python example

#29 opened by Sebogoss11

When running the most basic Python code example:

CODE:

from gradio_client import Client

client = Client("https://coqui-xtts.hf.space/--replicas/29c56/")
result = client.predict(
"Howdy!", # str in 'Text Prompt' Textbox component
"en,en", # str (Option from: [('en', 'en'), ('es', 'es'), ('fr', 'fr'), ('de', 'de'), ('it', 'it'), ('pt', 'pt'), ('pl', 'pl'), ('tr', 'tr'), ('ru', 'ru'), ('nl', 'nl'), ('cs', 'cs'), ('ar', 'ar'), ('zh-cn', 'zh-cn'), ('ja', 'ja'), ('ko', 'ko'), ('hu', 'hu'), ('hi', 'hi')]) in 'Language' Dropdown component
"https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav", # str (filepath on your computer (or URL) of file) in 'Reference Audio' Audio component
"https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav", # str (filepath on your computer (or URL) of file) in 'Use Microphone for Reference' Audio component
True, # bool in 'Use Microphone' Checkbox component
True, # bool in 'Cleanup Reference Voice' Checkbox component
True, # bool in 'Do not use language auto-detect' Checkbox component
True, # bool in 'Agree' Checkbox component
fn_index=1
)
print(result)

OUTPUT:
Loaded as API: https://coqui-xtts.hf.space/--replicas/29c56/ ✔
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Redirect response '302 Found' for url 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
Redirect location: 'https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/audio_sample.wav'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/302
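
One workaround for the redirect, before touching the links: download the sample once, following the redirect explicitly, and pass a local path instead of a URL. This is an untested sketch; it relies on the generated comments above saying the Audio inputs accept a "filepath on your computer (or URL) of file".

import httpx

# Follow GitHub's 302 redirect ourselves and save the sample locally;
# the local path can then replace the URL in the client.predict() call.
url = "https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav"
with httpx.Client(follow_redirects=True) as http:
    response = http.get(url)
    response.raise_for_status()
    with open("audio_sample.wav", "wb") as f:
        f.write(response.content)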

Instead, I changed the links to the redirect target given in the error: "https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/audio_sample.wav"

CODE:

from gradio_client import Client

client = Client("https://coqui-xtts.hf.space/--replicas/29c56/")
result = client.predict(
"Howdy!", # str in 'Text Prompt' Textbox component
"en,en", # str (Option from: [('en', 'en'), ('es', 'es'), ('fr', 'fr'), ('de', 'de'), ('it', 'it'), ('pt', 'pt'), ('pl', 'pl'), ('tr', 'tr'), ('ru', 'ru'), ('nl', 'nl'), ('cs', 'cs'), ('ar', 'ar'), ('zh-cn', 'zh-cn'), ('ja', 'ja'), ('ko', 'ko'), ('hu', 'hu'), ('hi', 'hi')]) in 'Language' Dropdown component
"https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/audio_sample.wav", # str (filepath on your computer (or URL) of file) in 'Reference Audio' Audio component
"https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/audio_sample.wav", # str (filepath on your computer (or URL) of file) in 'Use Microphone for Reference' Audio component
True, # bool in 'Use Microphone' Checkbox component
True, # bool in 'Cleanup Reference Voice' Checkbox component
True, # bool in 'Do not use language auto-detect' Checkbox component
True, # bool in 'Agree' Checkbox component
fn_index=1
)
print(result)

OUTPUT:
Loaded as API: https://coqui-xtts.hf.space/--replicas/29c56/ ✔
raise ValueError(f"Expected tuple of length 2. Received: {x}")
ValueError: Expected tuple of length 2. Received: None
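
To see where it fails, the same call can be routed through gradio_client's submit/Job API, which exposes the job's status before the client tries to deserialize the output (a sketch; client is the same object as above):

job = client.submit(
    "Howdy!",
    "en,en",
    "https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/audio_sample.wav",
    "https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/audio_sample.wav",
    True,
    True,
    True,
    True,
    fn_index=1,
)
print(job.status())  # queue/processing state reported by the Space
print(job.result())  # blocks until the job finishes; raises if it failed server-side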

Then I tried some code found in the docs... It seems that the output of client.predict isn't the tuple it is supposed to be? client.predict crashes on the response all by itself, as if the server-side job had failed and returned nothing to deserialize...

CODE:

from gradio_client import Client

client = Client("https://coqui-xtts.hf.space/--replicas/29c56/")
client.view_api()

result = client.predict(
"Hello!", # str in 'Text Prompt' Textbox component
"en,en", # str (Option from: [('en', 'en'), ('es', 'es'), ('fr', 'fr'), ('de', 'de'), ('it', 'it'), ('pt', 'pt'), ('pl', 'pl'), ('tr', 'tr'), ('ru', 'ru'), ('nl', 'nl'), ('cs', 'cs'), ('ar', 'ar'), ('zh-cn', 'zh-cn'), ('ja', 'ja'), ('ko', 'ko'), ('hu', 'hu'), ('hi', 'hi')]) in 'Language' Dropdown component
"https://huggingface.co./spaces/coqui/xtts/blob/main/examples/male.wav",# str (filepath on your computer (or URL) of file) in 'Reference Audio' Audio component
"https://huggingface.co./spaces/coqui/xtts/blob/main/examples/male.wav",# str (filepath on your computer (or URL) of file) in 'Use Microphone for Reference' Audio component
False, # bool in 'Use Microphone' Checkbox component
True, # bool in 'Cleanup Reference Voice' Checkbox component
True, # bool in 'Do not use language auto-detect' Checkbox component
True, # bool in 'Agree' Checkbox component
fn_index=1,
)

print(result)

OUTPUT:
Loaded as API: https://coqui-xtts.hf.space/--replicas/29c56/ ✔
Client.predict() Usage Info

Named API endpoints: 0

Unnamed API endpoints: 1

  • predict(text_prompt, language, reference_audio, use_microphone_for_reference, use_microphone, cleanup_reference_voice, do_not_use_language_autodetect, agree, fn_index=1) -> (waveform_visual, synthesised_audio, metrics, reference_audio_used)
    Parameters:
    • [Textbox] text_prompt: str
    • [Dropdown] language: str (Option from: [('en', 'en'), ('es', 'es'), ('fr', 'fr'), ('de', 'de'), ('it', 'it'), ('pt', 'pt'), ('pl', 'pl'), ('tr', 'tr'), ('ru', 'ru'), ('nl', 'nl'), ('cs', 'cs'), ('ar', 'ar'), ('zh-cn', 'zh-cn'), ('ja', 'ja'), ('ko', 'ko'), ('hu', 'hu'), ('hi', 'hi')])
    • [Audio] reference_audio: str | Dict(name: str (name of file), data: str (base64 representation of file), size: int (size of image in bytes), is_file: bool (true if the file has been uploaded to the server), orig_name: str (original name of the file))
    • [Audio] use_microphone_for_reference: str | Dict(name: str (name of file), data: str (base64 representation of file), size: int (size of image in bytes), is_file: bool (true if the file has been uploaded to the server), orig_name: str (original name of the file))
    • [Checkbox] use_microphone: bool
    • [Checkbox] cleanup_reference_voice: bool
    • [Checkbox] do_not_use_language_autodetect: bool
    • [Checkbox] agree: bool
      Returns:
    • [Video] waveform_visual: str | Dict(name: str (name of file), data: str (base64 representation of file), size: int (size of image in bytes), is_file: bool (true if the file has been uploaded to the server), orig_name: str (original name of the file)) | List[str | Dict(name: str (name of file), data: str (base64 representation of file), size: int (size of image in bytes), is_file: bool (true if the file has been uploaded to the server), orig_name: str (original name of the file))]
    • [Audio] synthesised_audio: str | Dict(name: str (name of file), data: str (base64 representation of file), size: int (size of image in bytes), is_file: bool (true if the file has been uploaded to the server), orig_name: str (original name of the file))
    • [Textbox] metrics: str
    • [Audio] reference_audio_used: str | Dict(name: str (name of file), data: str (base64 representation of file), size: int (size of image in bytes), is_file: bool (true if the file has been uploaded to the server), orig_name: str (original name of the file))

Traceback (most recent call last):
_predict
raise ValueError(result["error"])
ValueError: None
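
One detail I notice in this last attempt: the huggingface.co links use /blob/, which serves the web page for the file, whereas the raw wav is served from /resolve/. Here is a sketch with that single substitution; note that the plain "en" value (instead of "en,en") and passing None for the unused microphone input are my own guesses, not something I have confirmed:

result = client.predict(
    "Hello!",  # str in 'Text Prompt' Textbox component
    "en",  # guessed: a single option value for the 'Language' Dropdown
    "https://huggingface.co./spaces/coqui/xtts/resolve/main/examples/male.wav",  # raw file via /resolve/
    None,  # guessed: no value for 'Use Microphone for Reference', since the microphone is off
    False,  # bool in 'Use Microphone' Checkbox component
    True,  # bool in 'Cleanup Reference Voice' Checkbox component
    True,  # bool in 'Do not use language auto-detect' Checkbox component
    True,  # bool in 'Agree' Checkbox component
    fn_index=1,
)
print(result)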

I find it very disappointing not to be able to run the model through the API. On top of that, there is absolutely no help anywhere on the internet about this Gradio API error. For the record, I also tried with my own duplicate of the XTTS-v2 Space (with my token specified), but it doesn't work either (same errors).
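
For the duplicated Space, specifying the token means going through gradio_client's hf_token parameter (the Space id and token below are placeholders):

from gradio_client import Client

client = Client("your-username/xtts", hf_token="hf_...")  # placeholder Space id and token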
What's wrong? Can you update the Hugging Face Space code so that the basic Python example works, please?
