File size: 1,853 Bytes
ae3709c
460e46a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
fca422e
 
460e46a
 
ae3709c
460e46a
20c3e6b
 
bb7f6e4
34dbd80
 
460e46a
 
 
 
 
 
 
 
1f81817
fca422e
1f81817
 
 
20c3e6b
 
 
 
 
460e46a
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
---
language:
  - en
  - de
  - es
  - it
  - nl
  - pt
  - pl
  - ro
  - sv
  - da
  - fi
  - hu
  - el
  - fr
  - ru
  - uk
  - tr
  - ar
  - hi
  - jp
  - ko
  - zh
  - vi
  - la
  - ha
  - sw
  - yo
  - wo
library: xvasynth
tags:
  - emotion
  - audio
  - text-to-speech
  - speech-to-speech
  - voice conversion 
  - tts
pipeline_tag: text-to-speech
---

GitHub project: https://github.com/DanRuta/xVA-Synth

The base model for training other xVASynth's "xVAPitch" type models (v3). Model itself is used by the xVATrainer TTS model training app and not for inference. All created by Dan "@dr00392" Ruta.

`The v3 model now uses a slightly custom tweaked VITS/YourTTS model. Tweaks including larger capacity, bigger lang embedding, custom symbol set (a custom spec of ARPAbet with some more phonemes to cover other languages), and I guess a different training script.` - Dan Ruta

When used in xVASynth editor, it is an American Adult Male voice. Default pacing is too fast and has to be adjusted.

xVAPitch_5820651 model sample: <audio controls>
  <source src="https://huggingface.co./Pendrokar/xvapitch/resolve/main/xVAPitch_5820651.wav?download=true" type="audio/wav">
  Your browser does not support the audio element.
</audio>

Papers:
- VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech - https://arxiv.org/abs/2106.06103
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone - https://arxiv.org/abs/2112.02418

Referenced papers within code:
- Multi-head attention with Relative Positional embedding - https://arxiv.org/pdf/1809.04281.pdf
- Transformer with Relative Potional Encoding- https://arxiv.org/abs/1803.02155
- SDP - https://arxiv.org/pdf/2106.06103.pdf
- Spline Flow - https://arxiv.org/abs/1906.04032

Used datasets: Unknown/Non-permissiable data