Supplemental reading and resources
This unit introduced the text-to-speech task and covered a lot of ground. Want to learn more? The resources below will help you deepen your understanding of the topics covered in this unit.
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis: a paper introducing HiFi-GAN for speech synthesis.
- X-Vectors: Robust DNN Embeddings For Speaker Recognition: a paper introducing the X-Vector method for speaker embeddings.
- FastSpeech 2: Fast and High-Quality End-to-End Text to Speech: a paper introducing FastSpeech 2, another popular text-to-speech model, which uses a non-autoregressive approach.
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech: a paper introducing MQTTS, an autoregressive TTS system that replaces mel-spectrograms with a quantized discrete representation.