Whisper Small Basque

This model is a fine-tuned version of openai/whisper-small for Basque (eu) Automatic Speech Recognition (ASR). It was trained on asierhv/composite_corpus_eu_v2.1, a composite corpus assembled to improve Basque ASR performance.

Key improvements and results compared to the base model:

  • Significant WER reduction: The fine-tuned model achieves a Word Error Rate (WER) of 9.5479 on the validation set of asierhv/composite_corpus_eu_v2.1, a clear improvement over the base whisper-small model for Basque.
  • Performance on Common Voice: Evaluated on Mozilla Common Voice 18.0, the model reaches a WER of 7.63, showing that it generalizes to other Basque speech datasets and reflecting the accuracy gains of the larger model capacity compared with whisper-base.

Model description

This model leverages the whisper-small architecture, which balances accuracy and computational efficiency. Fine-tuned on a dedicated Basque speech corpus, it specializes in accurately transcribing Basque speech. Compared with whisper-base, it has a larger capacity, improving accuracy at the cost of increased computational resources.
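
For quick experimentation, here is a minimal transcription sketch using the transformers pipeline API. The audio file name and decoding options are illustrative assumptions, not part of the original card:

```python
from transformers import pipeline

# Load the fine-tuned Basque model through the ASR pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="xezpeleta/whisper-small-eu",
)

# Force Basque transcription instead of relying on language auto-detection.
result = asr(
    "audio.wav",  # hypothetical example file (16 kHz mono works best)
    generate_kwargs={"language": "basque", "task": "transcribe"},
)
print(result["text"])
```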

Intended uses & limitations

Intended uses:

  • High-accuracy automatic transcription of Basque speech, including professional transcription services and applications.
  • Development of advanced Basque speech-based applications that require high precision.
  • Research in Basque speech processing where the highest possible accuracy is needed.
  • Scenarios where the higher computational cost is justified by the improvement in accuracy.

Limitations:

  • Performance is still influenced by audio quality, with challenges arising from background noise and poor recording conditions.
  • Accuracy may be affected by highly dialectal or informal Basque speech.
  • Despite improved performance, the model may still produce errors, particularly with complex linguistic structures or rare words.
  • Because the small model is larger than both the base and tiny variants, inference is slower and requires more resources; a sketch of common mitigations follows this list.
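
If the extra cost matters for your deployment, a common mitigation (an assumption about typical setups, not something from the original card) is to run inference in float16 on a GPU and chunk long-form audio:

```python
import torch
from transformers import pipeline

# Sketch: reduce the speed/memory cost noted above by running in float16
# on a GPU and splitting long audio into 30-second windows.
asr = pipeline(
    "automatic-speech-recognition",
    model="xezpeleta/whisper-small-eu",
    torch_dtype=torch.float16,  # requires a CUDA-capable GPU
    device="cuda:0",
    chunk_length_s=30,
)

# "long_audio.wav" is a hypothetical example file.
print(asr("long_audio.wav", generate_kwargs={"language": "basque"})["text"])
```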

Training and evaluation data

  • Training dataset: asierhv/composite_corpus_eu_v2.1. This dataset is a comprehensive collection of Basque speech data, tailored to enhance the performance of Basque ASR systems.
  • Evaluation dataset: the test split of asierhv/composite_corpus_eu_v2.1 (a hedged evaluation sketch follows below).
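
The sketch below shows one way to reproduce a WER measurement on the test split with the datasets and evaluate libraries. The column names ("audio", "text") and the 100-sample subset are assumptions for illustration, not details from the original run:

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import pipeline

# Stream the test split to avoid downloading the full corpus.
ds = load_dataset("asierhv/composite_corpus_eu_v2.1", split="test", streaming=True)

asr = pipeline(
    "automatic-speech-recognition",
    model="xezpeleta/whisper-small-eu",
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)
wer_metric = evaluate.load("wer")

predictions, references = [], []
for sample in ds.take(100):  # small subset for a quick sanity check
    pred = asr(
        sample["audio"],
        generate_kwargs={"language": "basque", "task": "transcribe"},
    )["text"]
    predictions.append(pred)
    references.append(sample["text"])  # assumes the transcript column is "text"

# WER is reported as a percentage in this card, so scale by 100.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```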

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto Seq2SeqTrainingArguments follows the list:

  • learning_rate: 1.25e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 10000
  • mixed_precision_training: Native AMP
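
For reference, this is a minimal sketch of how the hyperparameters above map onto transformers' Seq2SeqTrainingArguments. The output directory, evaluation cadence, and predict_with_generate are illustrative assumptions (the 1000-step cadence is inferred from the results table below), not confirmed values from the original run:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-eu",  # hypothetical output path
    learning_rate=1.25e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",              # AdamW; betas=(0.9, 0.999), eps=1e-8 are defaults
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10000,
    fp16=True,                        # native AMP mixed-precision training
    eval_strategy="steps",            # assumption: evaluate every 1000 steps
    eval_steps=1000,
    predict_with_generate=True,       # assumption: generate text for WER evaluation
)
```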

Training results

Training Loss | Epoch | Step  | Validation Loss | WER
0.3863        | 0.1   | 1000  | 0.4090          | 21.2189
0.1897        | 0.2   | 2000  | 0.3457          | 15.4490
0.1379        | 0.3   | 3000  | 0.3283          | 13.5756
0.1825        | 0.4   | 4000  | 0.3024          | 12.3954
0.0775        | 0.5   | 5000  | 0.3198          | 11.8771
0.0975        | 0.6   | 6000  | 0.2924          | 11.2589
0.1132        | 0.7   | 7000  | 0.2969          | 10.8468
0.0852        | 0.8   | 8000  | 0.2237          | 9.7727
0.0585        | 0.9   | 9000  | 0.2317          | 9.6291
0.0654        | 1.0   | 10000 | 0.2353          | 9.5479

Framework versions

  • Transformers 4.49.0.dev0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.1.dev0
  • Tokenizers 0.21.0