---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-tiny
tags:
- whisper-event
- generated_from_trainer
datasets:
- asierhv/composite_corpus_eu_v2.1
metrics:
- wer
model-index:
- name: Whisper Tiny Basque
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 18.0
      type: mozilla-foundation/common_voice_18_0
    metrics:
    - name: Wer
      type: wer
      value: 13.56
language:
- eu
---
# Whisper Tiny Basque

This model is a fine-tuned version of `openai/whisper-tiny` for Basque (`eu`) Automatic Speech Recognition (ASR). It was trained on `asierhv/composite_corpus_eu_v2.1`, a composite corpus designed to improve Basque ASR performance.
Key improvements and results compared to the base model:
- Significant WER reduction: The fine-tuned model achieves a Word Error Rate (WER) of 14.8495 on the validation set of the `asierhv/composite_corpus_eu_v2.1` dataset, demonstrating improved accuracy over the base `whisper-tiny` model for Basque.
- Performance on Common Voice: When evaluated on the Mozilla Common Voice 18.0 dataset, the model achieved a WER of 13.56, demonstrating its ability to generalize to other Basque speech datasets.
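
The reported WER is the standard word error rate: word-level edit distance divided by the number of reference words, expressed as a percentage. As an illustration only (this is not the evaluation script used for this card), here is a minimal sketch with the `evaluate` library on an invented sentence pair:

```python
import evaluate

# Standard WER metric: word-level edit distance over reference word count.
wer_metric = evaluate.load("wer")

# Invented reference/prediction pair, for illustration only.
references = ["kaixo mundua zer moduz zaude"]
predictions = ["kaixo mundua zer modu zaude"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {100 * wer:.2f}%")  # one substitution over five words -> 20.00%
```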
## Model description

This model leverages the Whisper architecture, originally developed by OpenAI, and adapts it to the specific nuances of the Basque language. By fine-tuning the `whisper-tiny` checkpoint on a comprehensive Basque speech corpus, it learns to accurately transcribe spoken Basque. `whisper-tiny` is the smallest of the Whisper models, providing a good balance between speed and accuracy.
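
To make the size claim concrete, here is an illustrative snippet (not from the card) that loads the base checkpoint and counts its parameters; `whisper-tiny` comes in at roughly 39M:

```python
from transformers import WhisperForConditionalGeneration

# Load the base checkpoint and count parameters; whisper-tiny's small
# footprint (~39M parameters) is what makes it fast at inference time.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```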
## Intended uses & limitations
Intended uses:
- Automatic transcription of Basque speech (see the inference sketch after this list).
- Development of Basque speech-based applications.
- Research on Basque speech processing.
- Accessibility tools for Basque speakers.
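
A minimal transcription sketch using the transformers `pipeline` API. The repo id `your-username/whisper-tiny-eu` and the audio path are placeholders, since this card does not state the published checkpoint id:

```python
from transformers import pipeline

# Placeholder repo id -- replace with the actual Hub id of this checkpoint.
asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/whisper-tiny-eu",
)

# Pin decoding to Basque transcription so the model neither auto-detects
# the language nor translates. The audio path is a placeholder.
result = asr(
    "sample_eu.wav",
    generate_kwargs={"language": "basque", "task": "transcribe"},
)
print(result["text"])
```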
Limitations:
- Performance may vary depending on the quality of the audio input (e.g., background noise, recording quality).
- The model might struggle with highly dialectal or informal speech.
- While the model shows improved performance, it may still produce errors, especially with complex sentences or uncommon words.
- The model is based on the tiny version of Whisper, so accuracy would likely improve with larger variants.
## Training and evaluation data
- Training dataset: `asierhv/composite_corpus_eu_v2.1`, a composite corpus of Basque speech data designed to improve the performance of Basque ASR systems.
- Evaluation dataset: the `test` portion of `asierhv/composite_corpus_eu_v2.1`.
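
A hedged sketch of loading the corpus with the `datasets` library. The split names and the `audio` column name are assumptions based on this card; check the dataset page for the actual configuration:

```python
from datasets import Audio, load_dataset

# Split and column names are assumed, not confirmed by the card.
train_ds = load_dataset("asierhv/composite_corpus_eu_v2.1", split="train")
test_ds = load_dataset("asierhv/composite_corpus_eu_v2.1", split="test")

# Whisper consumes 16 kHz audio, so resample on the fly.
train_ds = train_ds.cast_column("audio", Audio(sampling_rate=16_000))
print(train_ds)
```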
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3.75e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- training_steps: 10000
- mixed_precision_training: Native AMP
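
These values map directly onto `Seq2SeqTrainingArguments`. The sketch below is an assumed reconstruction from the list above, not the exact training script; `output_dir` is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Assumed reconstruction of the run configuration; output_dir is a
# placeholder and eval/save cadence is not specified by the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-eu",
    learning_rate=3.75e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",        # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=1000,
    max_steps=10000,
    fp16=True,                  # "Native AMP" mixed precision
)
```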
### Training results
| Training Loss | Epoch | Step  | Validation Loss | WER     |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|
| 0.586         | 0.1   | 1000  | 0.6249          | 34.1639 |
| 0.3145        | 0.2   | 2000  | 0.5048          | 25.2591 |
| 0.225         | 0.3   | 3000  | 0.4839          | 22.0557 |
| 0.3003        | 0.4   | 4000  | 0.4540          | 20.3072 |
| 0.132         | 0.5   | 5000  | 0.4574          | 19.0146 |
| 0.1588        | 0.6   | 6000  | 0.4380          | 17.8219 |
| 0.1841        | 0.7   | 7000  | 0.4395          | 16.6667 |
| 0.143         | 0.8   | 8000  | 0.3719          | 15.4490 |
| 0.0967        | 0.9   | 9000  | 0.3685          | 15.1368 |
| 0.1059        | 1.0   | 10000 | 0.3719          | 14.8495 |
### Framework versions
- Transformers 4.49.0.dev0
- Pytorch 2.6.0+cu124
- Datasets 3.3.1.dev0
- Tokenizers 0.21.0