Pretrained Models Dependency

Amphion depends on the following pretrained models: Amphion Singing BigVGAN, Amphion Speech HiFi-GAN, Amphion DiffWave, ContentVec, WeNet, Whisper, and RawNet3.

Download instructions for each model are given below.

Amphion Singing BigVGAN

We fine-tuned the official BigVGAN pretrained model on over 120 hours of singing voice data. The fine-tuned checkpoint can be downloaded here. Download the 400000.pt and args.json files into Amphion/pretrained/bigvgan:

Amphion
 ┣ pretrained
 ┃ ┣ bigvgan
 ┃ ┃ ┣ 400000.pt
 ┃ ┃ ┣ args.json
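
Before running a model, it can help to confirm that the files landed where the tree above shows. The following is a minimal sketch: `check_pretrained_files` is a hypothetical helper name, not something Amphion provides, and it only tests that each file exists, not that it is intact.

```shell
# Hypothetical helper (not part of Amphion): list expected files that are missing.
# Usage: check_pretrained_files <directory> <file> [<file> ...]
check_pretrained_files() {
    cpf_dir="$1"
    shift
    cpf_missing=0
    for cpf_name in "$@"; do
        if [ ! -e "$cpf_dir/$cpf_name" ]; then
            echo "missing: $cpf_dir/$cpf_name" >&2
            cpf_missing=1
        fi
    done
    # Returns 0 when every file is present, 1 otherwise.
    return "$cpf_missing"
}

# Example, run from the directory that contains Amphion/:
# check_pretrained_files Amphion/pretrained/bigvgan 400000.pt args.json
```

The same call works for the other directories in this document by swapping the directory and file names.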

Amphion Speech HiFi-GAN

We trained our HiFi-GAN model on 685 hours of speech data. The checkpoint can be downloaded here. Download the whole hifigan_speech folder into Amphion/pretrained/hifigan:

Amphion
 ┣ pretrained
 ┃ ┣ hifigan
 ┃ ┃ ┣ hifigan_speech
 ┃ ┃ ┃ ┣ log
 ┃ ┃ ┃ ┣ result
 ┃ ┃ ┃ ┣ checkpoint
 ┃ ┃ ┃ ┣ args.json

Amphion DiffWave

We trained our DiffWave model on 125 hours of speech data and around 80 hours of singing voice data. The checkpoint can be downloaded here. Download the whole diffwave folder into Amphion/pretrained/diffwave:

Amphion
 ┣ pretrained
 ┃ ┣ diffwave
 ┃ ┃ ┣ diffwave_speech
 ┃ ┃ ┃ ┣ samples
 ┃ ┃ ┃ ┣ checkpoint
 ┃ ┃ ┃ ┣ args.json

ContentVec

You can download the pretrained ContentVec model here. Note that we use the ContentVec_legacy-500 classes checkpoint. Download checkpoint_best_legacy_500.pt into Amphion/pretrained/contentvec:

Amphion
 ┣ pretrained
 ┃ ┣ contentvec
 ┃ ┃ ┣ checkpoint_best_legacy_500.pt

WeNet

You can download the pretrained WeNet model here. Taking the wenetspeech pretrained checkpoint as an example, assume you have downloaded wenetspeech_u2pp_conformer_exp.tar.gz into Amphion/pretrained/wenet. Extract it and modify its configuration file as follows:

cd Amphion/pretrained/wenet

# Extract the experiment directory
tar -xvf wenetspeech_u2pp_conformer_exp.tar.gz

# Specify the updated path in train.yaml
cd 20220506_u2pp_conformer_exp
vim train.yaml
# TODO: Change the value of "cmvn_file" (Line 2) to the absolute path of the `global_cmvn` file (e.g., [YourPath]/Amphion/pretrained/wenet/20220506_u2pp_conformer_exp/global_cmvn)

The final file structure tree is like:

Amphion
 ┣ pretrained
 ┃ ┣ wenet
 ┃ ┃ ┣ 20220506_u2pp_conformer_exp
 ┃ ┃ ┃ ┣ final.pt
 ┃ ┃ ┃ ┣ global_cmvn
 ┃ ┃ ┃ ┣ train.yaml
 ┃ ┃ ┃ ┣ units.txt
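
The manual `train.yaml` edit can also be scripted. The sketch below is one possible way to do it, not the procedure Amphion prescribes: `set_cmvn_path` is a hypothetical helper, and it assumes `train.yaml` contains exactly one top-level line beginning with `cmvn_file:`. It keeps a `train.yaml.bak` copy of the original.

```shell
# Hypothetical helper: rewrite "cmvn_file" in train.yaml to the absolute
# path of the global_cmvn file that sits next to it.
# Assumption: train.yaml has a single top-level line starting with "cmvn_file:".
# Caveat: paths containing "|" or "&" would need escaping for sed.
set_cmvn_path() {
    # Resolve the experiment directory to an absolute path.
    scp_dir="$(cd "$1" && pwd)" || return 1
    # -i.bak edits in place and keeps a backup; this form works with GNU and BSD sed.
    sed -i.bak "s|^cmvn_file:.*|cmvn_file: ${scp_dir}/global_cmvn|" "${scp_dir}/train.yaml"
}

# Example, run from the directory that contains Amphion/:
# set_cmvn_path Amphion/pretrained/wenet/20220506_u2pp_conformer_exp
```

Afterwards, `grep cmvn_file train.yaml` inside the experiment directory shows whether the path was updated.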

Whisper

The official pretrained Whisper checkpoints are available here. Amphion uses the medium Whisper model by default. You can download it as follows:

cd Amphion/pretrained
mkdir whisper
cd whisper

wget https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt

The final file structure tree is like:

Amphion
 ┣ pretrained
 ┃ ┣ whisper
 ┃ ┃ ┣ medium.pt
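
The long hexadecimal segment in the download URL is the SHA-256 digest that the official `whisper` package compares against when it downloads a model itself, so it can also be used to check a manual download. The sketch below assumes either `sha256sum` or `shasum` is installed; `verify_sha256` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the file's SHA-256 digest equals the expected value.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
    vs_file="$1"
    vs_expected="$2"
    if command -v sha256sum >/dev/null 2>&1; then
        vs_actual="$(sha256sum "$vs_file" | cut -d' ' -f1)"
    else
        # Fallback for systems that ship shasum instead (e.g. macOS).
        vs_actual="$(shasum -a 256 "$vs_file" | cut -d' ' -f1)"
    fi
    [ "$vs_actual" = "$vs_expected" ]
}

# Example, run inside Amphion/pretrained/whisper:
# verify_sha256 medium.pt 345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1 \
#     && echo "medium.pt OK" || echo "medium.pt is corrupted or incomplete"
```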

RawNet3

The official pretrained RawNet3 checkpoints are available here. Download the model.pt file and put it in the Amphion/pretrained/rawnet3 folder.

The final file structure tree is like:

Amphion
 ┣ pretrained
 ┃ ┣ rawnet3
 ┃ ┃ ┣ model.pt

(Optional) Model Dependencies for Evaluation

When running Amphion's Evaluation Pipelines on machines without access to huggingface.co, you may encounter error messages such as "OSError: Can't load tokenizer for ...". To work around this, download the dependent evaluation models in advance, store them here under Amphion/pretrained, and follow this README to configure your environment to load local models.

Amphion's evaluation pipeline depends on the following models (sorted alphabetically): bert-base-uncased, facebook/bart-base, roberta-base, and wavlm.

Download instructions for each model are given below.

bert-base-uncased

To load bert-base-uncased locally, follow this link to download all files for the bert-base-uncased model, and store them under Amphion/pretrained/bert-base-uncased, conforming to the following file structure tree:

Amphion
 ┣ pretrained
 ┃ ┣ bert-base-uncased
 ┃ ┃ ┣ config.json
 ┃ ┃ ┣ coreml 
 ┃ ┃ ┃ ┣ fill-mask
 ┃ ┃ ┃   ┣ float32_model.mlpackage
 ┃ ┃ ┃      ┣ Data
 ┃ ┃ ┃         ┣ com.apple.CoreML
 ┃ ┃ ┃            ┣ model.mlmodel 
 ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┣ LICENSE
 ┃ ┃ ┣ model.onnx
 ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┣ README.md
 ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┣ tokenizer_config.json
 ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┣ vocab.txt

facebook/bart-base

To load facebook/bart-base locally, follow this link to download all files for the facebook/bart-base model, and store them under Amphion/pretrained/facebook/bart-base, conforming to the following file structure tree:

Amphion
 ┣ pretrained
 ┃ ┣ facebook
 ┃ ┃ ┣ bart-base
 ┃ ┃ ┃ ┣ config.json
 ┃ ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┃ ┣ gitattributes.txt
 ┃ ┃ ┃ ┣ merges.txt
 ┃ ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┃ ┣ README.txt
 ┃ ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┃ ┣ vocab.json

roberta-base

To load roberta-base locally, follow this link to download all files for the roberta-base model, and store them under Amphion/pretrained/roberta-base, conforming to the following file structure tree:

Amphion
 ┣ pretrained
 ┃ ┣ roberta-base
 ┃ ┃ ┣ config.json
 ┃ ┃ ┣ dict.txt
 ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┣ gitattributes.txt
 ┃ ┃ ┣ merges.txt
 ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┣ README.txt
 ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┣ vocab.json
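
As an alternative to downloading each file by hand, the three models above can be fetched on a machine that does have access to huggingface.co and then copied over. The sketch below assumes the `huggingface-cli` tool from the `huggingface_hub` package is installed; `fetch_eval_model` is a hypothetical wrapper, and the files it produces may differ slightly in name from the trees shown above (for example `README.md` rather than `README.txt`). Configuring Amphion to load from these directories is still covered by the README referenced earlier.

```shell
# Hypothetical wrapper: download a Hugging Face model repository into the
# matching directory under the pretrained root.
# Usage: fetch_eval_model <repo_id> [<pretrained_root>]
fetch_eval_model() {
    fem_repo="$1"
    fem_root="${2:-Amphion/pretrained}"   # default mirrors the layout in this document
    huggingface-cli download "$fem_repo" --local-dir "$fem_root/$fem_repo"
}

# Example, run from the directory that contains Amphion/:
# fetch_eval_model bert-base-uncased
# fetch_eval_model facebook/bart-base
# fetch_eval_model roberta-base
```

Because the repository ID is reused as the sub-path, `facebook/bart-base` lands in `Amphion/pretrained/facebook/bart-base`, matching the nested layout shown above.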

wavlm

The official pretrained wavlm checkpoints are available here. The file structure tree is as follows:

Amphion
 ┣ wavlm
 ┃ ┣ config.json
 ┃ ┣ preprocessor_config.json
 ┃ ┣ pytorch_model.bin