Text-to-Speech
PyTorch
ONNX
Catalan
matcha-tts
acoustic modelling
speech
multispeaker
AlexK-PL committed on
Commit
d79d134
1 Parent(s): 0a1f685

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -32,21 +32,21 @@ datasets:
 
  ## Model description
 
- **Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder predicts phoneme durations and its average acoustic features.
- And the decoder is essentially a U-Net inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf), that is based on Transformers architecture but combined
- with 1D instead of 2D CNNs, making a high reduction on memory consumption and speedy synthesis.
+ **Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder predicts phoneme durations and their averaged acoustic features.
+ The decoder backbone is essentially a U-Net inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf) and based on the Transformer architecture. By replacing 2D CNNs with 1D CNNs,
+ a large reduction in memory consumption and fast synthesis are achieved.
 
- **Matcha-TTS** is non-autorregressive model and is trained using optimal-transport conditional flow matching (OT-CFM).
- This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching.
+ **Matcha-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
+ This yields an ODE-based decoder capable of generating high-quality output in fewer synthesis steps than models trained using score matching.
 
  ## Intended uses and limitations
 
  This model is intended to serve as an acoustic feature generator for multispeaker text-to-speech systems for the Catalan language.
- It has been finetuned using a Catalan phonemizer, therefore if the model is used in other languages it may will not produce intelligible samples after converting its output
- into a speech waveform.
+ It has been finetuned using a Catalan phonemizer; therefore, if the model is used for other languages, it may not produce intelligible samples after mapping
+ its output into a speech waveform.
 
  The quality of the samples can vary depending on the speaker.
- This may be due to the sensitivity of the model in learning specific frequencies and also due to the samples used for each speaker.
+ This may be due to the sensitivity of the model in learning specific frequencies and also due to the quality of samples for each speaker.
 
  ## How to use
 