Update README.md
README.md
CHANGED
@@ -18,14 +18,14 @@ In this release the entire encoder was frozen. Subsequent releases will not do t
 generalization to other types of data (i.e not parliamentary speeches) is kept when not freezing
 the encoder.

-## Evaluation
+## Evaluation (test)

-* RixVox
-* RixVox
-* Common Voice
-* Common Voice
-* Fleurs WER:
-* Fleurs WER
+* RixVox WER: `22.59`
+* RixVox WER (normalized*): `19.33`
+* Common Voice 11 WER: `18.03`
+* Common Voice 11 WER (normalized*): `13.23`
+* Fleurs WER: `14.26`
+* Fleurs WER (normalized*): `8.99`

 *) Normalization is done by applying the following to source and generated texts:

@@ -34,7 +34,7 @@ def normalize(s):
     return ' '.join([ x for x in sub('[^0-9a-zåäöA-ZÅÄÖ ]', ' ', s.lower().replace('é', 'e')).split() ])
 ```

-In comparison the original Whisper large gets `30.
+In comparison the original Whisper large gets `30.56`/`25.58`, `18.76`/`15.00`, and `14.53`/`9.19` respectively.

 ## Training
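
To make the difference between the raw and normalized WER figures concrete, here is a minimal sketch of how such a pair of numbers could be computed. The `normalize` body is the line shown in the diff; the `from re import sub` import is an assumption, since the lines above `return` are not part of the hunk. The `wer` helper is hypothetical and uses a plain word-level Levenshtein distance; the README does not say which tool produced the reported scores.

```python
from re import sub  # assumed import; the diff does not show the lines above `return`


def normalize(s):
    # Same expression as the README line shown in the diff:
    # lowercase, map 'é' to 'e', replace anything that is not a digit,
    # a Swedish letter, or a space with a space, then collapse whitespace.
    return ' '.join([x for x in sub('[^0-9a-zåäöA-ZÅÄÖ ]', ' ', s.lower().replace('é', 'e')).split()])


def wer(reference, hypothesis):
    """Word error rate in percent (hypothetical helper, not the authors' tool)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming edit distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def raw_and_normalized_wer(reference, hypothesis):
    # Returns the (raw, normalized) pair in the same order the README lists them.
    return wer(reference, hypothesis), wer(normalize(reference), normalize(hypothesis))
```

For a reference such as `"Hej, världen!"` and a hypothesis `"hej världen"`, the raw score counts the casing and punctuation differences as errors while the normalized score does not, which is the kind of gap visible between the two columns of reported numbers.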