I downloaded all data from AI-HUB (https://aihub.or.kr/). Two datasets, in particular, caught my attention: "Instruction Audio Set" and "Noisy Conversation Audio Set".
The following table lists the number of hours in each dataset:

|dataset name| train_split (hours) | validation_split (hours)|
|---|---|---|
|Instruction Audio Set|910|105|
|Noisy Conversation Audio Set|363|76|
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.0148 | 3.0 | 26325 | 0.1254 | 0.0551 |

### Framework versions

- Transformers 4.28.0.dev0
- Pytorch 1.13.1+cu117
- Datasets 2.11.0
- Tokenizers 0.13.2
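To approximate this environment, the released versions above can be pinned. This is a sketch, not a verified recipe: the exact CUDA 11.7 PyTorch wheel depends on your platform, and `4.28.0.dev0` was a pre-release of Transformers, so it has to come from the source repository rather than PyPI.

```shell
pip install torch==1.13.1 datasets==2.11.0 tokenizers==0.13.2
# 4.28.0.dev0 was a development build; installing from the main repo gives a
# dev build (pin a specific commit for an exact match)
pip install git+https://github.com/huggingface/transformers.git
```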
## Evaluation Result for the dataset `google/fleurs`

The trained model is evaluated on the `test` split of the `ko_kr` subset of the dataset `google/fleurs`.
Please note that the model was not trained on the `train` split of that dataset.

|model|WER|
|---|---|
|openai/whisper|0.2469|
|this model|0.2189|
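The numbers above are word error rates. As a reference for how that metric is computed, here is a minimal sketch of word-level WER in plain Python; the actual evaluation presumably used a metric library such as `evaluate` or `jiwer`, so this function and its name are illustrative only.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        # Undefined for an empty reference; count any hypothesis words as errors.
        return float(len(hyp) > 0)
    # Levenshtein distance over words, one dynamic-programming row at a time.
    d = list(range(len(hyp) + 1))  # row 0: distance from empty reference
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i  # prev holds the diagonal cell d[i-1][j-1]
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,      # deletion of a reference word
                       d[j - 1] + 1,  # insertion of a hypothesis word
                       prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return d[len(hyp)] / len(ref)
```

Read against this definition, the table says the fine-tuned model transcribes the `ko_kr` test split with a WER about 2.8 points lower than the baseline.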