I downloaded all data from AI-HUB (https://aihub.or.kr/). Two datasets, in particular, caught my attention: "Instruction Audio Set" and "Noisy Conversation Audio Set".
The following table lists the number of hours in each dataset:

|dataset name| train_split (hours) | validation_split (hours)|
|---|---|---|
|Instruction Audio Set|910|105|
|Noisy Conversation Audio Set|363|76|
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.0148 | 3.0 | 26325 | 0.1254 | 0.0551 |

### Framework versions

- Transformers 4.28.0.dev0
- Pytorch 1.13.1+cu117
- Datasets 2.11.0
- Tokenizers 0.13.2
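To approximate this environment, the released versions above can be pinned. This is a sketch, not a verified recipe: the exact CUDA 11.7 PyTorch wheel depends on your platform, and `4.28.0.dev0` was a pre-release of Transformers, so it has to come from the source repository rather than PyPI.

```shell
pip install torch==1.13.1 datasets==2.11.0 tokenizers==0.13.2
# 4.28.0.dev0 was a development build; installing from the main repo gives a
# dev build (pin a specific commit for an exact match)
pip install git+https://github.com/huggingface/transformers.git
```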
## Evaluation Result for the dataset `google/fleurs`

The trained model is evaluated on the `test` split of the `ko_kr` subset of the dataset `google/fleurs`.
Please note that the model was not trained on the `train` split of that dataset.

|model|WER|
|---|---|
|openai/whisper|0.2469|
|this model|0.2189|
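The numbers above are word error rates. As a reference for how that metric is computed, here is a minimal sketch of word-level WER in plain Python; the actual evaluation presumably used a metric library such as `evaluate` or `jiwer`, so this function and its name are illustrative only.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        # Undefined for an empty reference; count any hypothesis words as errors.
        return float(len(hyp) > 0)
    # Levenshtein distance over words, one dynamic-programming row at a time.
    d = list(range(len(hyp) + 1))  # row 0: distance from empty reference
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i  # prev holds the diagonal cell d[i-1][j-1]
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,      # deletion of a reference word
                       d[j - 1] + 1,  # insertion of a hypothesis word
                       prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return d[len(hyp)] / len(ref)
```

Read against this definition, the table says the fine-tuned model transcribes the `ko_kr` test split with a WER about 2.8 points lower than the baseline.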