Text-to-Speech
PyTorch
ONNX
Catalan
matcha-tts
acoustic modelling
speech
multispeaker
AlexK-PL committed on
Commit e69747c
1 Parent(s): 6a5a7ed

Update README.md

Files changed (1)
  1. README.md +10 -8
README.md CHANGED
@@ -4,11 +4,10 @@ language:
 licence:
 - apache-2.0
 tags:
-- matcha tts
+- matcha-tts
+- acoustic modelling
 - speech
-- text-to-speech
 - multispeaker
-- catalan
 pipeline_tag: text-to-speech
 datasets:
 - projecte-aina/festcat_trimmed_denoised
@@ -33,11 +32,11 @@ datasets:
 
 ## Model description
 
-Matcha-TTS is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder predicts phoneme durations and its mean feature vectors.
+**Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder predicts phoneme durations and their mean feature vectors.
 The decoder is essentially a U-Net inspired by Grad-TTS, based on a Transformer architecture combined
 with 1D instead of 2D CNNs, which greatly reduces memory consumption and speeds up synthesis.
 
-Matcha-TTS is non-autorregressive and is trained using optimal-transport conditional flow matching (OT-CFM).
+**Matcha-TTS** is non-autoregressive and is trained using optimal-transport conditional flow matching (OT-CFM).
 This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching.
 
 ## Intended uses and limitations
@@ -70,10 +69,13 @@ print(f"Result: {generation[0]['generated_text']}")
 ```
 
 ## Limitations and bias
-At the time of submission, no measures have been taken to estimate the bias and toxicity embedded in the model.
-However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques
-on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated.
 
+This model is intended to serve as an acoustic feature generator for multispeaker text-to-speech systems for the Catalan language.
+It has been fine-tuned using a Catalan phonemizer, so if the model is used for other languages it may not produce intelligible samples after its output
+is converted into a speech waveform.
+
+The quality of the samples may vary depending on the speaker. This is due to the model's sensitivity to the specific frequencies it learns and to the samples
+used for each speaker.
 
 ## Training
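
The model description above attributes the few-step synthesis to OT-CFM training, which yields an ODE-based decoder. As a rough, hypothetical illustration of what sampling from such a decoder looks like, here is a minimal PyTorch sketch of fixed-step Euler integration; `sample_mel`, `decoder`, `encoder_mean`, and the chosen step count and temperature are illustrative assumptions, not the actual Matcha-TTS interface.

```python
# Hypothetical sketch (not the Matcha-TTS API): few-step Euler sampling from a
# decoder trained with optimal-transport conditional flow matching (OT-CFM).
import torch

@torch.no_grad()
def sample_mel(decoder, encoder_mean, n_steps=10, temperature=0.667):
    """Integrate dx/dt = v(x, t | mu) from t=0 to t=1 with Euler steps.

    decoder:      assumed callable predicting the vector field v(x_t, t, mu),
                  returning a tensor shaped like x_t.
    encoder_mean: conditioning tensor mu (batch, n_mels, n_frames), i.e. the
                  per-phoneme mean features expanded by the predicted durations.
    """
    # Start from scaled Gaussian noise with the target mel shape.
    x = torch.randn_like(encoder_mean) * temperature
    t = torch.zeros(x.shape[0], device=x.device)
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        v = decoder(x, t, encoder_mean)  # predicted flow at (x, t)
        x = x + dt * v                   # Euler update
        t = t + dt
    return x  # mel-like acoustic features
```

Because the model is only the acoustic stage, the features it produces (as noted in the limitations text above) still have to be converted into a speech waveform by a separate vocoder.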