waveletdeboshir committed
Commit 3f171d3 (1 parent: dd57f71)

Add metrics

Files changed (1): README.md (+7, −5)
README.md CHANGED

@@ -25,7 +25,7 @@ model-index:
       metrics:
       - name: WER
         type: wer
-        value: null
+        value: 26.52
   - task:
       name: Speech Recognition
       type: automatic-speech-recognition
@@ -36,7 +36,7 @@ model-index:
       metrics:
       - name: WER (without punctuation)
         type: wer
-        value: null
+        value: 21.35
 datasets:
 - mozilla-foundation/common_voice_15_0
 ---
@@ -50,11 +50,13 @@ Model was finetuned on russian part of [mozilla-foundation/common_voice_15_0](ht
 
 ## Metrics
 
-| metric | dataset | waveletdeboshir/whisper-base-ru-pruned | waveletdeboshir/whisper-small-ru-pruned-ft |
+| metric | dataset | waveletdeboshir/whisper-base-ru-pruned | waveletdeboshir/whisper-base-ru-pruned-ft |
 | :------ | :------ | :------ | :------ |
-| WER (without punctuation) | common_voice_15_0_test | | |
-| WER | common_voice_15_0_test | | |
+| WER (without punctuation) | common_voice_15_0_test | 0.3352 | **0.2135** |
+| WER | common_voice_15_0_test | 0.4050 | **0.2652** |
 
+## Limitations
+Because texts in Common Voice don't contain digits and other characters except letters and punctuation signs, model lost an ability to predict numbers and special characters.
 
 ## Size
 Only 10% tokens was left including special whisper tokens (no language tokens except \<|ru|\> and \<|en|\>, no timestamp tokens), 200 most popular tokens from tokenizer and 4000 most popular Russian tokens computed by tokenization of russian text corpus.
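The WER values added in this commit are word error rates: the word-level edit distance (substitutions + insertions + deletions) between the reference transcript and the model output, divided by the number of reference words. The model card does not say which evaluation script produced them (the "without punctuation" row implies some text normalization was applied first), so the following is only a minimal sketch of the metric itself, not the author's evaluation pipeline:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

In practice, libraries such as `jiwer` or the Hugging Face `evaluate` package are typically used instead of a hand-rolled implementation, since they also handle normalization consistently.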