tensorops committed · Commit d581d63 · verified · 1 Parent(s): e966de7

Update README.md

Files changed (1): README.md (+69 -3)
README.md CHANGED

---
license: mit
---

## Distilled Large-V3 Whisper ASR Model for Thai

### Model Description
This is a fine-tuned, distilled Automatic Speech Recognition (ASR) model based on the Whisper Large V3 Turbo architecture. It has been tailored specifically for Thai speech recognition and substantially improves performance on Thai speech.

#### Fine-tuning Details
- **Original Model**: Whisper Large V3 Turbo
- **Datasets Used for Fine-tuning**:
  - Common Voice v13
  - Gowajee
  - Thai Elderly Speech Corpus
  - Custom Scraped Data
  - Thai-Central Dialect from [SLSCU Thai Dialect Corpus](https://github.com/SLSCU/thai-dialect-corpus)
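
One of the corpora above, Common Voice v13, is available on the Hugging Face Hub and can be loaded with 🤗 Datasets. The sketch below is illustrative only: the dataset ID `mozilla-foundation/common_voice_13_0`, the `train` split, and the 16 kHz resampling are common Whisper fine-tuning conventions rather than details taken from this model's training script, and Common Voice requires accepting its terms of use on the Hub.

```python
# Illustrative sketch: load the Thai subset of Common Voice 13 with 🤗 Datasets.
# Assumes the Hub dataset ID "mozilla-foundation/common_voice_13_0" and that its
# terms of use have been accepted; the other corpora come from their own sources.
from datasets import Audio, load_dataset

common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "th", split="train")

# Whisper expects 16 kHz mono audio, so resample on the fly.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

print(common_voice[0]["sentence"])                 # transcription text
print(common_voice[0]["audio"]["sampling_rate"])   # 16000
```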

### Model Performance
- **DeepCut Tokenized WER on Common Voice 13 Test Set**:
  - Original Model: **41.53%**
  - This Model: **6.82%**
- **DeepCut Tokenized WER on FLEURS Test Set**:
  - Original Model: **25.56%**
  - This Model: **10.65%**
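
The WER figures above are computed on DeepCut-segmented text rather than raw strings, since Thai is written without spaces between words. The snippet below is a minimal sketch of that recipe, assuming the `deepcut` and `jiwer` packages; it is not the evaluation script used to produce the numbers above.

```python
# Minimal sketch of DeepCut-tokenized WER (assumes: pip install deepcut jiwer).
import deepcut
from jiwer import wer


def deepcut_wer(references: list[str], hypotheses: list[str]) -> float:
    """WER after segmenting Thai text into words with DeepCut."""
    refs = [" ".join(deepcut.tokenize(r)) for r in references]
    hyps = [" ".join(deepcut.tokenize(h)) for h in hypotheses]
    return wer(refs, hyps)


# Hypothetical reference/hypothesis pair; an exact match scores 0.0.
print(deepcut_wer(["สวัสดีครับ"], ["สวัสดีครับ"]))
```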

### Intended Use
This model is intended for use in applications requiring Thai language speech recognition.
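
A minimal transcription sketch with the 🤗 Transformers ASR pipeline is shown below. The model ID, audio file name, and chunking parameters are placeholders rather than values from this card; substitute the actual Hub repository ID of this model.

```python
# Minimal usage sketch with the Transformers ASR pipeline.
# "ORG/thai-whisper-model" and "thai_sample.wav" are placeholders: replace them with
# this model's real Hub repository ID and a 16 kHz audio file of your own.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="ORG/thai-whisper-model",  # placeholder repository ID
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
)

# chunk_length_s enables long-form transcription; generation is forced to Thai.
result = asr(
    "thai_sample.wav",
    chunk_length_s=30,
    batch_size=8,
    generate_kwargs={"language": "th", "task": "transcribe"},
)
print(result["text"])
```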

### Limitations
- The model is specifically trained for the Thai language and may not perform well with other languages.
- Performance might vary across different Thai dialects and accents.
- As with any ASR system, background noise and speech clarity can impact recognition accuracy.

### Acknowledgments
This model was developed using resources and datasets provided by the speech and language technology community. Special thanks to the teams behind Common Voice, Gowajee, SLSCU, and the Thai Elderly Speech Corpus for their valuable datasets.

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.2
- Datasets 2.16.1
- Tokenizers 0.15.0
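
To reproduce the reported results as closely as possible, it can help to confirm that the locally installed versions match the ones listed above. The check below is an optional sketch, not something required to use the model.

```python
# Optional sanity check: compare installed library versions with those listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": ("4.35.2", transformers.__version__),
    "torch": ("2.1.2", torch.__version__),
    "datasets": ("2.16.1", datasets.__version__),
    "tokenizers": ("0.15.0", tokenizers.__version__),
}
for name, (want, have) in expected.items():
    status = "OK" if have == want else f"differs (README lists {want})"
    print(f"{name} {have}: {status}")
```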

### Citation

Cite using BibTeX:

```
@inproceedings{aung-etal-2024-thonburian,
    title = "Thonburian Whisper: Robust Fine-tuned and Distilled Whisper for {T}hai",
    author = "Aung, Zaw Htet  and
      Thavornmongkol, Thanachot  and
      Boribalburephan, Atirut  and
      Tangsriworakan, Vittavas  and
      Pipatsrisawat, Knot  and
      Achakulvisut, Titipat",
    editor = "Abbas, Mourad  and
      Freihat, Abed Alhakim",
    booktitle = "Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)",
    month = oct,
    year = "2024",
    address = "Trento",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.icnlsp-1.17",
    pages = "149--156",
}
```
---