{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import torch\n", "from openvoice import se_extractor\n", "from openvoice.api import ToneColorConverter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialization\n", "\n", "In this example, we will use the checkpoints from OpenVoiceV2. OpenVoiceV2 is trained with more aggressive augmentations and thus demonstrates better robustness in some cases." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ckpt_converter = 'checkpoints_v2/converter'\n", "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n", "output_dir = 'outputs_v2'\n", "\n", "tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n", "tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n", "\n", "os.makedirs(output_dir, exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Obtain Tone Color Embedding\n", "We only extract the tone color embedding for the target speaker. The source tone color embeddings can be loaded directly from the `checkpoints_v2/base_speakers/ses` folder." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone\n", "target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Use MeloTTS as Base Speakers\n", "\n", "MeloTTS is a high-quality multi-lingual text-to-speech library by @MyShell.ai, supporting languages including English (American, British, Indian, Australian, Default), Spanish, French, Chinese, Japanese, and Korean.
In the following example, we will use the models in MeloTTS as the base speakers. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from melo.api import TTS\n", "\n", "texts = {\n", "    'EN_NEWEST': \"Did you ever hear a folk tale about a giant turtle?\",  # The newest English base speaker model\n", "    'EN': \"Did you ever hear a folk tale about a giant turtle?\",\n", "    'ES': \"El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.\",\n", "    'FR': \"La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.\",\n", "    'ZH': \"在这次vacation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。\",\n", "    'JP': \"彼は毎朝ジョギングをして体を健康に保っています。\",\n", "    'KR': \"안녕하세요! 오늘은 날씨가 정말 좋네요.\",\n", "}\n", "\n", "\n", "src_path = f'{output_dir}/tmp.wav'\n", "\n", "# Speed is adjustable\n", "speed = 1.0\n", "\n", "for language, text in texts.items():\n", "    model = TTS(language=language, device=device)\n", "    speaker_ids = model.hps.data.spk2id\n", "\n", "    for speaker_key in speaker_ids.keys():\n", "        speaker_id = speaker_ids[speaker_key]\n", "        speaker_key = speaker_key.lower().replace('_', '-')\n", "\n", "        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)\n", "        model.tts_to_file(text, speaker_id, src_path, speed=speed)\n", "        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'\n", "\n", "        # Run the tone color converter\n", "        encode_message = \"@MyShell\"\n", "        tone_color_converter.convert(\n", "            audio_src_path=src_path,\n", "            src_se=source_se,\n", "            tgt_se=target_se,\n", "            output_path=save_path,\n", "            message=encode_message)" ] } ], "metadata": { "kernelspec": { "display_name": "melo", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" }
}, "nbformat": 4, "nbformat_minor": 2 }