openvoice2

Running

File size: 4,726 Bytes

b72ab63

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import torch\n",
    "from openvoice import se_extractor\n",
    "from openvoice.api import ToneColorConverter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialization\n",
    "\n",
    "In this example, we will use the checkpoints from OpenVoiceV2. OpenVoiceV2 is trained with more aggressive augmentations and thus demonstrate better robustness in some cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ckpt_converter = 'checkpoints_v2/converter'\n",
    "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
    "output_dir = 'outputs_v2'\n",
    "\n",
    "tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n",
    "tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n",
    "\n",
    "os.makedirs(output_dir, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Obtain Tone Color Embedding\n",
    "We only extract the tone color embedding for the target speaker. The source tone color embeddings can be directly loaded from `checkpoints_v2/ses` folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone\n",
    "target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use MeloTTS as Base Speakers\n",
    "\n",
    "MeloTTS is a high-quality multi-lingual text-to-speech library by @MyShell.ai, supporting languages including English (American, British, Indian, Australian, Default), Spanish, French, Chinese, Japanese, Korean. In the following example, we will use the models in MeloTTS as the base speakers. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from melo.api import TTS\n",
    "\n",
    "texts = {\n",
    "    'EN_NEWEST': \"Did you ever hear a folk tale about a giant turtle?\",  # The newest English base speaker model\n",
    "    'EN': \"Did you ever hear a folk tale about a giant turtle?\",\n",
    "    'ES': \"El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.\",\n",
    "    'FR': \"La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.\",\n",
    "    'ZH': \"在这次vacation中，我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。\",\n",
    "    'JP': \"彼は毎朝ジョギングをして体を健康に保っています。\",\n",
    "    'KR': \"안녕하세요! 오늘은 날씨가 정말 좋네요.\",\n",
    "}\n",
    "\n",
    "\n",
    "src_path = f'{output_dir}/tmp.wav'\n",
    "\n",
    "# Speed is adjustable\n",
    "speed = 1.0\n",
    "\n",
    "for language, text in texts.items():\n",
    "    model = TTS(language=language, device=device)\n",
    "    speaker_ids = model.hps.data.spk2id\n",
    "    \n",
    "    for speaker_key in speaker_ids.keys():\n",
    "        speaker_id = speaker_ids[speaker_key]\n",
    "        speaker_key = speaker_key.lower().replace('_', '-')\n",
    "        \n",
    "        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)\n",
    "        model.tts_to_file(text, speaker_id, src_path, speed=speed)\n",
    "        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'\n",
    "\n",
    "        # Run the tone color converter\n",
    "        encode_message = \"@MyShell\"\n",
    "        tone_color_converter.convert(\n",
    "            audio_src_path=src_path, \n",
    "            src_se=source_se, \n",
    "            tgt_se=target_se, \n",
    "            output_path=save_path,\n",
    "            message=encode_message)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "melo",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}