Audio Course documentation

Supplemental reading and resources

This Unit pieced together many components from previous units and introduced three new tasks: speech-to-speech translation, voice assistants, and speaker diarization. The supplemental reading material is therefore split by task:

Speech-to-speech translation:

Voice Assistant:

Meeting Transcriptions:

  • pyannote.audio Technical Report by Hervé Bredin: this report describes the main principles behind the pyannote.audio speaker diarization pipeline
  • WhisperX by Max Bain et al.: a superior approach to computing word-level timestamps using the Whisper model