Amphion Singing Voice Conversion Pretrained Models

Quick Start

We provide a pretrained DiffWaveNetSVC checkpoint for you to try. Specifically, it was trained on real-world vocalist data (total duration: 6.16 hours) from the following 15 professional singers:

Singer                Language  Training Duration (mins)
David Tao 陶喆        Chinese   45.51
Eason Chan 陈奕迅     Chinese   43.36
Feng Wang 汪峰        Chinese   41.08
Jian Li 李健          Chinese   38.90
John Mayer            English   30.83
Adele                 English   27.23
Ying Na 那英          Chinese   27.02
Yijie Shi 石倚洁      Chinese   24.93
Jacky Cheung 张学友   Chinese   18.31
Taylor Swift          English   18.31
Faye Wong 王菲        English   16.78
Michael Jackson       English   15.13
Tsai Chin 蔡琴        Chinese   10.12
Bruno Mars            English   6.29
Beyonce               English   6.06
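
The quoted total of 6.16 hours can be checked against the table. The following is a minimal sketch, assuming the per-singer minutes are copied verbatim from the table; the helper name `total_hours` is introduced here only for illustration.

```shell
# Per-singer training durations in minutes, copied from the table above.
durations="45.51 43.36 41.08 38.90 30.83 27.23 27.02 24.93 18.31 18.31 16.78 15.13 10.12 6.29 6.06"

total_hours() {
    # Sum the space-separated minute values and convert the sum to hours.
    echo "$1" | tr ' ' '\n' | awk '{ sum += $1 } END { printf "%.2f\n", sum / 60 }'
}

total_hours "$durations"   # prints 6.16
```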

To make these singers sing the songs you want to listen to, just run the following commands:

Step1: Download the acoustic model checkpoint

git lfs install
git clone https://huggingface.co/amphion/singing_voice_conversion

Step2: Download the vocoder checkpoint

git clone https://huggingface.co/amphion/BigVGAN_singing_bigdata
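
If Git LFS was not active during a clone, large files may be left behind as small text pointer stubs instead of real checkpoints. The sketch below looks for such stubs in the two cloned folders, assuming they sit in the current directory. The helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and not part of Amphion. It relies only on the fact that an LFS pointer file starts with a `version https://git-lfs.github.com/spec/v1` line.

```shell
# Return success if the given file is a Git LFS pointer stub rather than real data.
is_lfs_pointer() {
    # Pointer files are tiny text files whose first line names the LFS spec.
    head -c 200 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print every file under directory $1 that is still a pointer stub.
find_lfs_pointers() {
    find "$1" -type f ! -path '*/.git/*' | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            echo "$f"
        fi
    done
}

# Check both cloned repositories, skipping any that are not present.
for repo in singing_voice_conversion BigVGAN_singing_bigdata; do
    if [ -d "$repo" ]; then
        find_lfs_pointers "$repo"
    fi
done
```

No output means no stubs were found. If any file is listed, running `git lfs pull` inside that repository should fetch the real content.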

Step3: Clone Amphion's source code from GitHub

git clone https://github.com/open-mmlab/Amphion.git

Step4: Download ContentVec Checkpoint

You can download the ContentVec checkpoint from this repo. For this pretrained model, we used checkpoint_best_legacy_500.pt, which is the legacy ContentVec with 500 classes.

Step5: Specify the checkpoints' path

Use symbolic links to point Amphion at the downloaded checkpoints:

cd Amphion
mkdir -p ckpts/svc
ln -s "$(realpath ../singing_voice_conversion/vocalist_l1_contentvec+whisper)" ckpts/svc/vocalist_l1_contentvec+whisper
ln -s "$(realpath ../BigVGAN_singing_bigdata/bigvgan_singing)" pretrained/bigvgan_singing

Also, move the checkpoint_best_legacy_500.pt file you downloaded in Step4 into Amphion/pretrained/contentvec.
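
The move, together with a quick check of the resulting layout, can be sketched as follows. The download location `CONTENTVEC_SRC` and the helper names `place_contentvec` and `check_layout` are assumptions made for this example. Only the three target paths come from the steps above.

```shell
# Run from the Amphion repository root.
# CONTENTVEC_SRC is a hypothetical download location; adjust it to your machine.
CONTENTVEC_SRC="${CONTENTVEC_SRC:-$HOME/Downloads/checkpoint_best_legacy_500.pt}"

place_contentvec() {
    # $1: downloaded checkpoint file, $2: Amphion root directory.
    mkdir -p "$2/pretrained/contentvec"
    mv "$1" "$2/pretrained/contentvec/"
}

check_layout() {
    # $1: Amphion root. Prints each missing path; returns non-zero if any is missing.
    missing=0
    for p in \
        "ckpts/svc/vocalist_l1_contentvec+whisper" \
        "pretrained/bigvgan_singing" \
        "pretrained/contentvec/checkpoint_best_legacy_500.pt"; do
        # -e follows symbolic links, so a dangling link counts as missing.
        if [ ! -e "$1/$p" ]; then
            echo "missing: $p"
            missing=1
        fi
    done
    return "$missing"
}

# Move the checkpoint only if it is found at the assumed location.
if [ -f "$CONTENTVEC_SRC" ]; then
    place_contentvec "$CONTENTVEC_SRC" .
fi
check_layout . || echo "Fix the paths listed above before running Step6."
```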

Step6: Conversion

You can follow this recipe to run the conversion. For example, to make Taylor Swift sing the songs in [Your Audios Folder], run:

sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "0" \
    --config "ckpts/svc/vocalist_l1_contentvec+whisper/args.json" \
    --infer_expt_dir "ckpts/svc/vocalist_l1_contentvec+whisper" \
    --infer_output_dir "ckpts/svc/vocalist_l1_contentvec+whisper/result" \
    --infer_source_audio_dir [Your Audios Folder] \
    --infer_vocoder_dir "pretrained/bigvgan_singing" \
    --infer_target_speaker "vocalist_l1_TaylorSwift" \
    --infer_key_shift "autoshift"

Note: The supported infer_target_speaker values can be seen here.
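
To convert the same audio folder for several target singers, the command above can be wrapped in a loop. This is a sketch under stated assumptions. The helpers `run_cmd` and `convert_for_speaker`, the `DRY_RUN` switch, and the per-speaker output subfolder are choices made for this example and are not part of Amphion. Only `vocalist_l1_TaylorSwift` is taken from the text, so any further speaker IDs must come from the supported list.

```shell
# Run from the Amphion repository root.
EXPT_DIR="ckpts/svc/vocalist_l1_contentvec+whisper"
# Placeholder from the command above; replace it with your own folder.
SOURCE_DIR="${SOURCE_DIR:-[Your Audios Folder]}"
# Assumption of this sketch: print the commands by default; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

convert_for_speaker() {
    # $1: target speaker ID, $2: folder containing the source audio.
    # Results go to a per-speaker subfolder (a layout chosen for this sketch).
    run_cmd sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "0" \
        --config "$EXPT_DIR/args.json" \
        --infer_expt_dir "$EXPT_DIR" \
        --infer_output_dir "$EXPT_DIR/result/$1" \
        --infer_source_audio_dir "$2" \
        --infer_vocoder_dir "pretrained/bigvgan_singing" \
        --infer_target_speaker "$1" \
        --infer_key_shift "autoshift"
}

# Add further speaker IDs from the supported list as needed.
for speaker in "vocalist_l1_TaylorSwift"; do
    convert_for_speaker "$speaker" "$SOURCE_DIR"
done
```

Writing each singer's results to a separate subfolder keeps later runs from overwriting earlier ones.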

Citations

@article{zhang2023leveraging,
  title={Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion},
  author={Zhang, Xueyao and Gu, Yicheng and Chen, Haopeng and Fang, Zihao and Zou, Lexiao and Xue, Liumeng and Wu, Zhizheng},
  journal={Machine Learning for Audio Workshop, NeurIPS 2023},
  year={2023}
}