File size: 4,726 Bytes
b72ab63
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import torch\n",
    "from openvoice import se_extractor\n",
    "from openvoice.api import ToneColorConverter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialization\n",
    "\n",
    "In this example, we will use the checkpoints from OpenVoiceV2. OpenVoiceV2 is trained with more aggressive augmentations and thus demonstrate better robustness in some cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ckpt_converter = 'checkpoints_v2/converter'\n",
    "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
    "output_dir = 'outputs_v2'\n",
    "\n",
    "tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n",
    "tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n",
    "\n",
    "os.makedirs(output_dir, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Obtain Tone Color Embedding\n",
    "We only extract the tone color embedding for the target speaker. The source tone color embeddings can be directly loaded from `checkpoints_v2/ses` folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone\n",
    "target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use MeloTTS as Base Speakers\n",
    "\n",
    "MeloTTS is a high-quality multi-lingual text-to-speech library by @MyShell.ai, supporting languages including English (American, British, Indian, Australian, Default), Spanish, French, Chinese, Japanese, Korean. In the following example, we will use the models in MeloTTS as the base speakers. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from melo.api import TTS\n",
    "\n",
    "texts = {\n",
    "    'EN_NEWEST': \"Did you ever hear a folk tale about a giant turtle?\",  # The newest English base speaker model\n",
    "    'EN': \"Did you ever hear a folk tale about a giant turtle?\",\n",
    "    'ES': \"El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.\",\n",
    "    'FR': \"La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.\",\n",
    "    'ZH': \"在这次vacation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。\",\n",
    "    'JP': \"彼は毎朝ジョギングをして体を健康に保っています。\",\n",
    "    'KR': \"안녕하세요! 오늘은 날씨가 정말 좋네요.\",\n",
    "}\n",
    "\n",
    "\n",
    "src_path = f'{output_dir}/tmp.wav'\n",
    "\n",
    "# Speed is adjustable\n",
    "speed = 1.0\n",
    "\n",
    "for language, text in texts.items():\n",
    "    model = TTS(language=language, device=device)\n",
    "    speaker_ids = model.hps.data.spk2id\n",
    "    \n",
    "    for speaker_key in speaker_ids.keys():\n",
    "        speaker_id = speaker_ids[speaker_key]\n",
    "        speaker_key = speaker_key.lower().replace('_', '-')\n",
    "        \n",
    "        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)\n",
    "        model.tts_to_file(text, speaker_id, src_path, speed=speed)\n",
    "        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'\n",
    "\n",
    "        # Run the tone color converter\n",
    "        encode_message = \"@MyShell\"\n",
    "        tone_color_converter.convert(\n",
    "            audio_src_path=src_path, \n",
    "            src_se=source_se, \n",
    "            tgt_se=target_se, \n",
    "            output_path=save_path,\n",
    "            message=encode_message)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "melo",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}