Does it support emotional voices in multi-speech mode for Hindi?

by bumkailashkumar - opened Dec 16, 2024

Dec 16, 2024

Does it support emotional voices in multi-speech mode for Hindi? For example, input like this:

अर्जुन (उत्साहित): वाह! ये क्या है? एक नक्शा? और कितना पुराना है!
अर्जुन (उत्सुक): ये तो खज़ाने का नक्शा है! एक असली, सच्चा खज़ाने का नक्शा!
अर्जुन (सोचते हुए): पुराना ओक का पेड़! वो जगह तो मुझे याद है, जहाँ मैं अक्सर अपने दादाजी के साथ जाता था।
अर्जुन (सोचते हुए): गुप्त रास्ता कहाँ है?
अर्जुन (उत्साहित): एक गुप्त रास्ता!
अर्जुन (चौंककर): वाह! ये तो खज़ाने का संदूक है!
अर्जुन (मुस्कुराते हुए): ये सोने और रत्नों से भी बेहतर है। ये मेरे दादाजी का तोहफा है।

{Excited} वाह! ये क्या है? एक नक्शा? और कितना पुराना है!
{Curious} ये तो खज़ाने का नक्शा है! एक असली, सच्चा खज़ाने का नक्शा!
{Thinking} पुराना ओक का पेड़! वो जगह तो मुझे याद है, जहाँ मैं अक्सर अपने दादाजी के साथ जाता था।
{Thinking} गुप्त रास्ता कहाँ है?
{Excited} एक गुप्त रास्ता!
{Surprised} वाह! ये तो खज़ाने का संदूक है!
{Happy} ये सोने और रत्नों से भी बेहतर है। ये मेरे दादाजी का तोहफा है।
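
For anyone who wants to experiment with the second format, here is a minimal sketch of splitting such a script into (style, text) segments before synthesis. It only handles the `{Style} text` layout shown above. The function name `split_styled_script` and the fallback style name `Regular` are assumptions made for illustration, not part of any model's API, and each style would still need its own reference clip, which is not shown here.

```python
import re
from typing import List, Tuple

# Matches a leading "{Style}" tag followed by the text to speak.
STYLE_LINE = re.compile(r"^\{(?P<style>[^{}]+)\}\s*(?P<text>.+)$")


def split_styled_script(script: str, default_style: str = "Regular") -> List[Tuple[str, str]]:
    """Split a script into (style, text) segments, one per non-empty line."""
    segments = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between utterances
        match = STYLE_LINE.match(line)
        if match:
            segments.append((match.group("style").strip(), match.group("text").strip()))
        else:
            # Untagged lines fall back to the default style (an assumption of this sketch).
            segments.append((default_style, line))
    return segments


script = """{Excited} एक गुप्त रास्ता!
{Thinking} गुप्त रास्ता कहाँ है?"""

for style, text in split_styled_script(script):
    print(style, "->", text)
```

Each segment could then be synthesized separately with a reference clip recorded in the matching style, and the resulting audio concatenated.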

rumourscape

SPRINGLab org Dec 17, 2024

My model is not that good with emotions, due to the limited amount and the nature of the training data. But you can fine-tune it for that purpose if you have the right data.

bumkailashkumar

Dec 17, 2024

Do you know of any model that already does this for Hindi?

rumourscape

SPRINGLab org Dec 17, 2024

Unfortunately, there is no such model available for Hindi as far as I know.

bk9985

Dec 20, 2024

How do I use it in ComfyUI? Is there an example sample (.wav and .txt)?

rumourscape

SPRINGLab org Dec 20, 2024

There is a community ComfyUI integration that supports F5 and my model too.
Check here: https://github.com/niknah/ComfyUI-F5-TTS
Related issue: https://github.com/niknah/ComfyUI-F5-TTS/issues/15

AbhishekTiwariAKT

16 days ago

edited 16 days ago

Hey,
I am getting noisy output using the Hindi model.
What might I be doing wrong?
The command I am using:
"""
python /workspace/F5-TTS/src/f5_tts/infer/infer_cli.py \
  --model "F5-TTS-small" \
  --ckpt_file "/workspace/F5-Hindi-24KHz/model_2500000.safetensors" \
  --vocab_file "/workspace/F5-Hindi-24KHz/vocab.txt" \
  --ref_audio "/workspace/F5-Hindi-24KHz/samples/dear_friends_cleaned_1001.wav" \
  --ref_text "अपने पीछे खड़े एक आदमी को इशारा किया तो वो आदमी खींचते हुए नवीन जोशी को वहां से बाहर ले गया । तब तक थप्पड़ की आवाज सुनकर कालिंदी और" \
  --gen_text "अपने पीछे खड़े एक आदमी को इशारा किया तो वो आदमी खींचते हुए नवीन जोशी को वहां से बाहर ले गया । तब तक थप्पड़ की आवाज सुनकर कालिंदी और"
"""

generated output:

rumourscape

SPRINGLab org 16 days ago

Make sure the reference audio has no stutters or very long pauses. Also, I assume you are not using the convert_char_to_pinyin function.
You can check whether your reference audio is the problem using my hosted demo here: https://asr.iitm.ac.in/demo-tts/
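
As a rough local check for the "very long pauses" part, here is a minimal sketch that measures the longest low-energy stretch in a reference clip. It assumes a 16-bit PCM WAV file; the function name `longest_silence_seconds`, the 0.01 RMS threshold and the 20 ms window are illustrative choices, not values taken from the model or its training setup. It does not detect stutters.

```python
import array
import math
import sys
import wave


def longest_silence_seconds(path: str, threshold: float = 0.01, frame_ms: int = 20) -> float:
    """Return the longest run of low-energy audio in a 16-bit PCM WAV file, in seconds."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    frames_per_window = max(1, rate * frame_ms // 1000)
    window = frames_per_window * channels  # interleaved samples per window
    longest = current = 0
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        # Normalised RMS energy of this window, in the range 0.0 to 1.0.
        rms = math.sqrt(sum(s * s for s in chunk) / window) / 32768.0
        if rms < threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest * frames_per_window / rate


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"longest pause: {longest_silence_seconds(sys.argv[1]):.2f} s")
```

If the reported pause is long compared with the clip itself, trimming the clip or picking a different reference sample is worth trying before looking elsewhere.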

rumourscape changed discussion status to closed 16 days ago

AbhishekTiwariAKT

15 days ago

Yes, there is no issue with the reference audio, as it gives correct output in the demo.
Can you guide me on where to change the convert_char_to_pinyin function in the repo?

rumourscape

SPRINGLab org 15 days ago

> Can you guide where to change convert_char_to_pinyin function in the repo?

If you are using my fork of the F5 code then check if indic=True is set.

AbhishekTiwariAKT

15 days ago

Thanks for helping!
