---
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- 'no'
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
pipeline_tag: automatic-speech-recognition
license: apache-2.0
datasets:
- ivrit-ai/whisper-training
---

# NOTE: This is a CTranslate2 (Faster-Whisper) version of the model

The original model can be found [here](https://huggingface.co/ivrit-ai/whisper-large-v2-tuned).

# Whisper

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. More details about it are available [here](https://huggingface.co/openai/whisper-large-v2).

**whisper-large-v2-tuned** is a version of whisper-large-v2, fine-tuned by [ivrit.ai](https://www.ivrit.ai) to improve Hebrew ASR using crowd-sourced labeling.

## Model details

This model comes as a single checkpoint, whisper-large-v2-tuned.
It is a multilingual ASR model with 1550M parameters.
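As a rough guide for picking a `compute_type` when loading the model, the weight footprint can be estimated from the parameter count. The sketch below is only an approximation: it assumes every parameter is stored at the chosen precision and ignores activations, decoding buffers, quantization scales, and runtime overhead. The helper `estimate_weight_gb` is a hypothetical name for illustration and is not part of faster-whisper.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# 1550M parameters, as stated in the model details.
NUM_PARAMS = 1_550_000_000


def estimate_weight_gb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes).

    Assumption: all parameters are stored at the given precision.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{estimate_weight_gb(NUM_PARAMS, dtype):.2f} GB")
```

Actual memory use at inference time will be higher, so these figures are best read as lower bounds.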
# Usage

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 model from the Hugging Face Hub.
model = WhisperModel("sivan22/faster-whisper-ivrit-ai-whisper-large-v2-tuned")

# transcribe() returns a lazy generator of segments plus info about the audio.
segments, info = model.transcribe("audio.mp3")

# Iterating over the generator runs the actual transcription.
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

## Evaluation

You can use the [evaluate_model.py](https://github.com/yairl/ivrit.ai/blob/master/evaluate_model.py) reference script on GitHub to evaluate the model's quality.

### BibTeX entry and citation info

**ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development**

```bibtex
@misc{marmor2023ivritai,
      title={ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development},
      author={Yanir Marmor and Kinneret Misgav and Yair Lifshitz},
      year={2023},
      eprint={2307.08720},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
```

**Whisper: Robust Speech Recognition via Large-Scale Weak Supervision**

```bibtex
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```