waveletdeboshir
/

whisper-base-ru-pruned-ft

+---
+license: apache-2.0
+language:
+- ru
+library_name: transformers
+pipeline_tag: automatic-speech-recognition
+base_model: waveletdeboshir/whisper-base-ru-pruned-finetuned
+tags:
+- asr
+- Pytorch
+- pruned
+- finetune
+- audio
+- automatic-speech-recognition
+model-index:
+- name: Whisper Base Pruned and Finetuned for Russian
+  results:
+  - task:
+      name: Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 15.0 (Russian part, test)
+      type: mozilla-foundation/common_voice_15_0
+      args: ru
+    metrics:
+    - name: WER
+      type: wer
+      value: null
+  - task:
+      name: Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 15.0 (Russian part, test)
+      type: mozilla-foundation/common_voice_15_0
+      args: ru
+    metrics:
+    - name: WER (without punctuation)
+      type: wer
+      value: null
+datasets:
+- mozilla-foundation/common_voice_15_0
+---
+# Whisper-base-ru-pruned-finetuned
+## Model info
+This is a fine-tuned version of the pruned whisper-base model ([waveletdeboshir/whisper-base-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-base-ru-pruned)) for the Russian language.
+The model was fine-tuned on the Russian part of [mozilla-foundation/common_voice_15_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_15_0).
+## Metrics
+| metric | dataset | waveletdeboshir/whisper-base-ru-pruned | waveletdeboshir/whisper-base-ru-pruned-finetuned |
+| :------ | :------ | :------ | :------ |
+| WER* | common_voice_15_0_test |  |  |
+| WER | common_voice_15_0_test |  |  |
+
+\*Metrics were computed after text normalization.
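
The card does not state which normalization was applied, so the following is only a minimal sketch of how WER with and without text normalization can differ. The `normalize` helper (lowercasing and punctuation removal) and the example strings are assumptions for illustration, not the evaluation code used for this model.

```python
import re


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "Начинаем работу."
hypothesis = "начинаем работу"

raw_wer = wer(reference, hypothesis)
normalized_wer = wer(normalize(reference), normalize(hypothesis))
print(raw_wer, normalized_wer)
```

In this toy case the two strings differ only in casing and punctuation, so every word counts as an error before normalization and none afterwards.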
+## Size
+Only about 10% of the original tokens were kept: the special Whisper tokens (no language tokens except \<|ru|\> and \<|en|\>, and no timestamp tokens), the 200 most popular tokens from the tokenizer, and the 4000 most popular Russian tokens, obtained by tokenizing a Russian text corpus.
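
The selection described above can be sketched roughly as follows. This is a hedged illustration, not the script used to build this model: `select_kept_token_ids`, the toy corpus, and the token ids are all hypothetical, and a real run would count ids produced by the Whisper tokenizer over a large Russian corpus.

```python
from collections import Counter


def select_kept_token_ids(corpus_token_ids, special_ids, top_k):
    """Keep every special id plus the top_k most frequent ids seen in the corpus."""
    counts = Counter(
        tok
        for sentence in corpus_token_ids
        for tok in sentence
        if tok not in special_ids
    )
    frequent = [tok for tok, _ in counts.most_common(top_k)]
    return sorted(set(special_ids) | set(frequent))


# Toy "tokenized corpus": each inner list stands for one tokenized sentence.
toy_corpus = [[5, 7, 7, 9], [7, 9, 11], [9, 7]]
kept = select_kept_token_ids(toy_corpus, special_ids={0, 1}, top_k=2)

# Map old token ids to new, contiguous ids for the pruned vocabulary.
old_to_new = {old: new for new, old in enumerate(kept)}
print(kept, old_to_new)
```

The `old_to_new` mapping is what would then be used to pick the matching rows of the embedding matrix and to rewrite the tokenizer vocabulary.
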
+The model is about 30% smaller than the original whisper-base:
+|  | openai/whisper-base | waveletdeboshir/whisper-base-ru-pruned-finetuned |
+| :------ | :------ | :------ |
+| number of parameters | 74 M | 48 M |
+| number of parameters (with proj_out layer) | 99 M | 50 M |
+| model file size | 290 MB | 201 MB |
+| vocab_size | 51865 | 4207 |
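
Most of the saving in the table comes from the vocabulary-sized matrices. As a rough check, the sketch below counts the token-embedding parameters before and after pruning, assuming the hidden size of 512 from the `openai/whisper-base` configuration (`D_MODEL` and `embedding_params` are names introduced here for illustration).

```python
D_MODEL = 512  # assumed hidden size of whisper-base


def embedding_params(vocab_size: int, d_model: int = D_MODEL) -> int:
    """Number of weights in a vocab_size x d_model embedding (or projection) matrix."""
    return vocab_size * d_model


original = embedding_params(51865)
pruned = embedding_params(4207)
saved = original - pruned

print(f"original: {original / 1e6:.1f} M")
print(f"pruned:   {pruned / 1e6:.1f} M")
print(f"saved:    {saved / 1e6:.1f} M")
```

About 24 M parameters are removed from the embedding matrix alone, which is in line with the drop from 74 M to 48 M reported above; the `proj_out` layer has the same shape, which roughly accounts for the second row of the table.
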
+## Usage
+The model can be used in the same way as the original Whisper:
+```python
+>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
+>>> import torchaudio
+>>> # load audio and resample to 16 kHz, the rate expected by the Whisper feature extractor
+>>> wav, sr = torchaudio.load("audio.wav")
+>>> wav = torchaudio.functional.resample(wav, sr, 16000)
+>>> # load model and processor
+>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-base-ru-pruned-finetuned")
+>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-base-ru-pruned-finetuned")
+>>> # compute input features from the first audio channel
+>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
+>>> # generate token ids
+>>> predicted_ids = model.generate(input_features)
+>>> # decode token ids to text
+>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
+>>> transcription
+['<|startoftranscript|><|ru|><|transcribe|><|notimestamps|> Начинаем работу.<|endoftext|>']
+```
+The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True`.
+## Other pruned whisper models
+* [waveletdeboshir/whisper-tiny-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-tiny-ru-pruned)
+* [waveletdeboshir/whisper-small-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-small-ru-pruned)