# Kalemat-Tech Arabic Speech Recognition Model (STT) - Mohamed Salama

A powerful automatic model for recognizing Modern Standard Arabic speech and converting it into text.

## KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small
This model is a fine-tuned version of openai/whisper-small on Common_Voice_Arabic_12.0_Augmented (Common Voice Arabic 12.0 plus augmented copies; see Training and evaluation data below). It achieves the following results on the evaluation set:
- Loss: 0.5362
- WER: 58.5848
Example of usage:

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = AutoProcessor.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")
```
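For end-to-end transcription, the `automatic-speech-recognition` pipeline is usually the simplest route. A minimal sketch; the audio path is a placeholder (any file readable by ffmpeg works):

```python
from transformers import pipeline

# Minimal sketch: transcribe an Arabic audio file with the ASR pipeline.
# "audio.wav" is a placeholder path.
asr = pipeline(
    "automatic-speech-recognition",
    model="Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small",
)
result = asr("audio.wav")
print(result["text"])
```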
## Intended uses & limitations

The model is intended for automatic speech recognition (ASR) of Modern Standard Arabic.
## Training and evaluation data

The training data is Common Voice Arabic 12.0 with the following augmentations (a sketch of these transforms follows the list):

- 25% of the data: TimeMasking
- 25% of the data: SpecAugmentation
- 25% of the data: WavAugmentation (AddGaussianNoise)
- The final dataset is the original Common Voice data plus the augmented files
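A minimal sketch of what these three augmentations typically look like, assuming torchaudio for the spectrogram masks and audiomentations for the waveform noise; all parameter values below are illustrative, not the ones used for this model:

```python
import numpy as np
import torch
import torchaudio.transforms as T
from audiomentations import AddGaussianNoise

sample_rate = 16000
# Stand-in for a real Common Voice clip (1 second of random audio)
waveform = np.random.randn(sample_rate).astype(np.float32)

# 1) WavAugmentation: additive Gaussian noise on the raw waveform
add_noise = AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=1.0)
noisy_waveform = add_noise(samples=waveform, sample_rate=sample_rate)

# Whisper consumes log-mel spectrograms, so the masking transforms below
# operate on a spectrogram rather than on raw audio.
mel = T.MelSpectrogram(sample_rate=sample_rate, n_mels=80)(torch.from_numpy(waveform))
log_mel = torch.log(mel + 1e-6).unsqueeze(0)  # shape: (1, n_mels, frames)

# 2) TimeMasking: zero out a random span of time frames
time_masked = T.TimeMasking(time_mask_param=30)(log_mel)

# 3) SpecAugmentation: frequency masking followed by time masking
spec_augment = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=15),
    T.TimeMasking(time_mask_param=30),
)
spec_augmented = spec_augment(log_mel)
```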
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto `Seq2SeqTrainingArguments`):
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 25
- mixed_precision_training: Native AMP
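As a rough illustration, assuming the standard transformers Seq2SeqTrainer setup, these settings translate to `Seq2SeqTrainingArguments` roughly as follows; `output_dir` is a placeholder, and the Adam betas/epsilon and linear scheduler are the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch: the hyperparameters above expressed as Seq2SeqTrainingArguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ar",   # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=25,
    fp16=True,                         # "Native AMP" mixed precision
)
```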
Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
0.2728 | 1.01 | 1000 | 0.3063 | 60.4733 |
0.1442 | 2.01 | 2000 | 0.2878 | 55.6935 |
0.0648 | 3.02 | 3000 | 0.3009 | 59.2568 |
0.0318 | 4.03 | 4000 | 0.3278 | 59.2993 |
0.0148 | 5.04 | 5000 | 0.3539 | 61.0364 |
0.0088 | 6.04 | 6000 | 0.3714 | 56.9154 |
0.0061 | 7.05 | 7000 | 0.3920 | 57.5515 |
0.0041 | 8.06 | 8000 | 0.4149 | 61.6328 |
0.0033 | 9.06 | 9000 | 0.4217 | 58.0310 |
0.0033 | 10.07 | 10000 | 0.4376 | 59.9594 |
0.0021 | 11.08 | 11000 | 0.4485 | 56.7812 |
0.0015 | 12.08 | 12000 | 0.4577 | 57.6936 |
0.0013 | 13.09 | 13000 | 0.4671 | 60.6606 |
0.0011 | 14.1 | 14000 | 0.4686 | 59.8159 |
0.0008 | 15.11 | 15000 | 0.4856 | 60.7111 |
0.0011 | 16.11 | 16000 | 0.4851 | 59.5198 |
0.0005 | 17.12 | 17000 | 0.4936 | 59.2608 |
0.0004 | 18.13 | 18000 | 0.4995 | 57.9619 |
0.0003 | 19.13 | 19000 | 0.5085 | 58.3630 |
0.0002 | 20.14 | 20000 | 0.5155 | 58.0987 |
0.0001 | 21.15 | 21000 | 0.5251 | 58.8504 |
0.0001 | 22.16 | 22000 | 0.5268 | 58.4228 |
0.0001 | 23.16 | 23000 | 0.5317 | 59.0881 |
0.0001 | 24.17 | 24000 | 0.5362 | 58.5848 |
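The WER values above are percentages (WER × 100). A minimal sketch of how the metric is typically computed with the Hugging Face `evaluate` library; the prediction/reference strings are hypothetical:

```python
import evaluate

# Sketch: WER as computed with the Hugging Face `evaluate` library.
# WER = (substitutions + insertions + deletions) / number of reference words.
wer_metric = evaluate.load("wer")
predictions = ["مرحبا بالعالم"]        # hypothetical model transcription
references = ["مرحبا بكم في العالم"]   # hypothetical ground-truth transcript
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {100 * wer:.4f}")  # multiplied by 100 to match the table above
```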
### Framework versions
- Transformers 4.25.1
- Pytorch 1.13.1+cu117
- Datasets 2.8.0
- Tokenizers 0.13.2
## Evaluation results

- WER on the mozilla-foundation/common_voice_12_0 test set (self-reported): 58.585