---
license: apache-2.0
---
# BreezyVoice

[Playground](https://www.kaggle.com/code/a24998667/breezyvoice-playground); [GitHub](https://github.com/Splend1d/BreezyVoice); [Paper](https://arxiv.org/abs/2501.17790)

**BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights**

BreezyVoice is a voice-cloning text-to-speech system adapted specifically for Taiwanese Mandarin. It offers phonetic control through auxiliary 注音 (bopomofo) inputs. BreezyVoice is partially derived from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice).


## How to Run

**Following the instructions in the GitHub repository downloads the model automatically.**

You can also run the model from a local path by cloning the model repository:
```
git lfs install
git clone https://huggingface.co/MediaTek-Research/BreezyVoice-300M
```
Then run the model as described in the `run_inference.py` script, passing the local model path through the `model_path` parameter.
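
As a minimal sketch of that last step, the helper below checks that the cloned directory exists and assembles a command line pointing the script at it. It assumes that `model_path` is exposed as a `--model_path` command-line flag, which is not confirmed here. The function name `build_inference_command` is hypothetical, and any other arguments the script requires are left out, so check the GitHub instructions for the full invocation.

```python
import sys
from pathlib import Path


def build_inference_command(model_dir, script="run_inference.py"):
    """Return an argv list that points the inference script at a local model clone."""
    model_path = Path(model_dir).expanduser()
    if not model_path.is_dir():
        raise FileNotFoundError(f"No model directory at {model_path}")
    # Assumption: model_path is passed as a --model_path command-line flag.
    # Any other arguments the script requires are intentionally omitted.
    return [sys.executable, script, "--model_path", str(model_path)]


if __name__ == "__main__":
    # "BreezyVoice-300M" is the directory created by the git clone step.
    try:
        print(" ".join(build_inference_command("BreezyVoice-300M")))
    except FileNotFoundError as err:
        print(err)
```

The returned list can be passed to `subprocess.run` once the remaining arguments have been appended.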

If you like our work, please cite:

```
@article{hsu2025breezyvoice,
  title={BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation--Challenges and Insights},
  author={Hsu, Chan-Jan and Lin, Yi-Cheng and Lin, Chia-Chun and Chen, Wei-Chih and Chung, Ho Lam and Li, Chen-An and Chen, Yi-Chang and Yu, Chien-Yu and Lee, Ming-Ji and Chen, Chien-Cheng and others},
  journal={arXiv preprint arXiv:2501.17790},
  year={2025}
}
```