---
datasets:
- simon3000/genshin-voice
- CSTR-Edinburgh/vctk
language:
- en
---
# So-Vits-Svc Base Model V1
The base model for generating new voices with so-vits-svc voice lab.

The training data comprises 278 English-speaking speakers.

Four datasets were used:
- Genshin Voice: only speakers with more than 30 minutes of audio
- VCTK
- Vocalset
- A privately scraped dataset

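
The 30-minute cutoff applied to Genshin Voice can be sketched as follows. This is only an illustration, not the script used to build the dataset: the function name `speakers_over_threshold`, the `(speaker, seconds)` clip format, and the sample durations are all hypothetical.

```python
from collections import defaultdict

MIN_SECONDS = 30 * 60  # 30-minute cutoff from the list above


def speakers_over_threshold(clips, min_seconds=MIN_SECONDS):
    """Return the sorted speakers whose total audio exceeds min_seconds.

    clips: iterable of (speaker, duration_in_seconds) pairs (hypothetical format).
    """
    totals = defaultdict(float)
    for speaker, seconds in clips:
        totals[speaker] += seconds
    # "more than 30min" is read here as a strict inequality
    return sorted(s for s, total in totals.items() if total > min_seconds)


# Made-up clip durations, for illustration only
example_clips = [
    ("speaker_a", 1200.0),
    ("speaker_a", 900.0),   # speaker_a total: 2100 s (35 min)
    ("speaker_b", 1500.0),  # speaker_b total: 1500 s (25 min)
]
kept = speakers_over_threshold(example_clips)
print(kept)
```

With these made-up durations only `speaker_a` passes the cutoff. Whether the original selection counted a speaker with exactly 30 minutes is not stated, so the strict comparison is an assumption.
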
The model was trained for about 4 days and 16 hours on a single RTX 3090 (61 epochs / 430k steps).
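
For a rough sense of scale, the training figures quoted above can be turned into steps per epoch and average throughput. The inputs are the approximate numbers from this card, so the results are estimates rather than measured values.

```python
# Approximate figures quoted in this card
TRAIN_HOURS = 4 * 24 + 16  # 4 days and 16 hours
TOTAL_STEPS = 430_000      # "430k steps"
EPOCHS = 61

# Average number of optimizer steps in one pass over the data
steps_per_epoch = TOTAL_STEPS / EPOCHS

# Average training speed over the whole run
steps_per_second = TOTAL_STEPS / (TRAIN_HOURS * 3600)

print(f"about {steps_per_epoch:.0f} steps per epoch")
print(f"about {steps_per_second:.2f} steps per second")
```

This works out to roughly 7,000 steps per epoch and a little over one step per second on average.
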