This demo is for anyone who needs automatic voice transcription and speaker prediction: it takes your .wav file, transcribes the audio, and predicts who spoke each part based on user guidance.

People conducting interviews or meetings on the fly cannot always hold them over video in order to record them.
In such cases, the usual option is to record on a phone and then transcribe and edit, pooling multiple sources.

Let's make that easier by...


- Step 1: Uploading a .wav file or recording directly in-app. See the notes below for best practices.
- Step 2: Including the number of participants in the meeting (minimum 2, maximum 5).
- Step 3: Verifying the names of individuals.
- Step 4: Retrieving a cleaned transcript with names!
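
Steps 2 to 4 can be sketched in plain Python. This is only an illustration under stated assumptions, not the demo's actual code: the `Utterance` class, the `validate_speaker_count` and `render_transcript` helpers, and the `SPEAKER_00`-style labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

MIN_SPEAKERS, MAX_SPEAKERS = 2, 5  # limits stated in Step 2


@dataclass
class Utterance:
    speaker: str  # generic label such as "SPEAKER_00" (assumed format)
    text: str


def validate_speaker_count(n: int) -> int:
    """Reject participant counts outside the 2-5 range from Step 2."""
    if not MIN_SPEAKERS <= n <= MAX_SPEAKERS:
        raise ValueError(
            f"expected {MIN_SPEAKERS}-{MAX_SPEAKERS} participants, got {n}"
        )
    return n


def render_transcript(utterances, names):
    """Swap generic labels for verified names and merge consecutive turns."""
    lines = []
    previous = None
    for u in utterances:
        name = names.get(u.speaker, u.speaker)  # keep the label if unverified
        if name == previous:
            lines[-1] += " " + u.text.strip()
        else:
            lines.append(f"{name}: {u.text.strip()}")
        previous = name
    return "\n".join(lines)


# Made-up example data.
utterances = [
    Utterance("SPEAKER_00", "Hi, my name is Ana."),
    Utterance("SPEAKER_00", "I work for Acme."),
    Utterance("SPEAKER_01", "Hello, I'm Ben."),
]
validate_speaker_count(2)
transcript = render_transcript(
    utterances, {"SPEAKER_00": "Ana", "SPEAKER_01": "Ben"}
)
print(transcript)
```

Falling back to the generic label when a name has not been verified keeps the transcript complete even if Step 3 is only partly filled in.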

Notes:

- This application works best when meeting participants explicitly state their names up front, which is also an important privacy consideration, as it implies consent to the recording.
  - Begin by stating: "Hi, my name is YYY, and I work for ZZZZ..."
- This demo uses [Pyannote](https://github.com/pyannote/pyannote-audio) for audio segmentation and OpenAI's [Whisper Base](https://github.com/openai/whisper) for automatic speech recognition.
- This demo runs on ZeroGPU HuggingFace Spaces, so please keep the meeting length to a maximum of 30-40 minutes.
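
The notes above name one model for segmentation and another for transcription, so their outputs have to be combined at some point. How this demo does that is not stated here; one common approach, sketched below as an assumption, is to give each transcribed segment the speaker whose turn overlaps it the most. The `overlap` and `assign_speakers` helpers and the dict layout are hypothetical, and the sample data is made up rather than produced by either model.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in transcript_segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in speaker_turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Copy the segment so the caller's input is left unmodified.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Made-up data: speaker turns and transcribed segments reduced to plain dicts.
turns = [
    {"start": 0.0, "end": 4.0, "speaker": "SPEAKER_00"},
    {"start": 4.0, "end": 9.0, "speaker": "SPEAKER_01"},
]
segments = [
    {"start": 0.5, "end": 3.5, "text": "Hi, my name is Ana."},
    {"start": 3.8, "end": 8.0, "text": "Hello, I'm Ben."},
    {"start": 10.0, "end": 11.0, "text": "Thanks."},
]
labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(seg["speaker"], seg["text"])
```

A segment that overlaps no turn at all is labeled `UNKNOWN` rather than guessed, which leaves it visible for manual correction.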