|
--- |
|
license: cc-by-nc-4.0 |
|
language: |
|
- en |
|
--- |
|
Models trained from [VITS-fast-fine-tuning](https://github.com/Plachtaa/VITS-fast-fine-tuning) |
|
- Three speakers: laoliang (老梁), specialweek, zhongli.
|
- The model is based on the C+J base model and was trained on a single NVIDIA RTX 3090 for 300 epochs, which took about 4.5 hours in total.
|
- The training data consists of a single long audio clip of laoliang (~5 minutes) plus auxiliary data.
|
|
|
How to run the model? |
|
- Follow [the official instructions](https://github.com/Plachtaa/VITS-fast-fine-tuning/blob/main/LOCAL.md) to install the required libraries.
|
- Download the model files and move _finetune_speaker.json_ and _G_latest.pth_ to _/path/to/VITS-fast-fine-tuning_.
|
- Run `python VC_inference.py --model_dir ./G_latest.pth --share True` to start a local Gradio inference demo.
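Before launching the demo, it can help to sanity-check that both files ended up in the repo root. A minimal sketch (the filenames come from the steps above; the helper name and repo path are illustrative):

```python
from pathlib import Path

def check_model_files(repo_dir):
    """Return a list of required model files missing from the repo root."""
    required = ["finetune_speaker.json", "G_latest.pth"]
    repo = Path(repo_dir)
    return [name for name in required if not (repo / name).is_file()]

# Example: point this at your local clone of VITS-fast-fine-tuning.
missing = check_model_files("/path/to/VITS-fast-fine-tuning")
if missing:
    print(f"Missing files: {missing}")
else:
    print("All model files in place; ready to run VC_inference.py")
```

If the check reports missing files, re-download them and confirm they sit next to `VC_inference.py`, not in a subfolder.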
|
|
|
File structure |
|
```bash
VITS-fast-fine-tuning
├── VC_inference.py
├── ...
├── finetune_speaker.json
└── G_latest.pth
```