This model is a fine-tuned version of whisper-small, trained on 500k audio samples from the mitermix/audiosnippets dataset.