BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights
BreezyVoice is a voice-cloning text-to-speech system adapted specifically for Taiwanese Mandarin. It offers phonetic control through auxiliary 注音 (bopomofo) inputs. BreezyVoice is partially derived from CosyVoice.
Following the instructions in the GitHub repository downloads the model for you automatically.
You can also run the model from a local path by cloning the model repository:
git lfs install
git clone https://huggingface.co/MediaTek-Research/BreezyVoice-300M
Then use the model as specified in the run_inference.py script, passing the local model path through the model_path parameter.
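The local-path workflow above can be sketched in Python as follows. This is a minimal illustration rather than the project's own tooling: the helper names clone_model and build_inference_command are hypothetical, and the command-line spelling --model_path is an assumption based on the model_path parameter mentioned above. Any other arguments that run_inference.py requires are not shown here and should be taken from the script itself.

```python
import subprocess
import sys
from pathlib import Path

# Model repository on Hugging Face (same repository as in the git clone command).
REPO_URL = "https://huggingface.co/MediaTek-Research/BreezyVoice-300M"


def clone_model(target_dir: Path) -> Path:
    """Clone the model repository with git-lfs unless target_dir already exists."""
    if not target_dir.exists():
        subprocess.run(["git", "lfs", "install"], check=True)
        subprocess.run(["git", "clone", REPO_URL, str(target_dir)], check=True)
    return target_dir.resolve()


def build_inference_command(model_dir: Path, extra_args=()):
    """Build the argument list for run_inference.py.

    Assumption: the script accepts the local path as `--model_path`.
    `extra_args` is a placeholder for whatever other arguments the
    script needs; none are guessed here.
    """
    return [
        sys.executable,
        "run_inference.py",
        "--model_path",
        str(model_dir),
        *extra_args,
    ]
```

To use it, call clone_model(Path("BreezyVoice-300M")) once, then pass the returned path to build_inference_command and hand the resulting list to subprocess.run from inside the BreezyVoice code checkout. Building the command as a list avoids shell quoting problems when the model path contains spaces.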
If you like our work, please cite:
@article{hsu2025breezyvoice,
title={BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation--Challenges and Insights},
author={Hsu, Chan-Jan and Lin, Yi-Cheng and Lin, Chia-Chun and Chen, Wei-Chih and Chung, Ho Lam and Li, Chen-An and Chen, Yi-Chang and Yu, Chien-Yu and Lee, Ming-Ji and Chen, Chien-Cheng and others},
journal={arXiv preprint arXiv:2501.17790},
year={2025}
}