metadata
library_name: transformers
license: apache-2.0
base_model: openai/whisper-tiny
tags:
  - whisper-event
  - generated_from_trainer
datasets:
  - asierhv/composite_corpus_eu_v2.1
metrics:
  - wer
model-index:
  - name: Whisper Tiny Basque
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Mozilla Common Voice 18.0
          type: mozilla-foundation/common_voice_18_0
        metrics:
          - name: Wer
            type: wer
            value: 13.56
language:
  - eu

Whisper Tiny Basque

This model is a version of openai/whisper-tiny fine-tuned for Basque (eu) automatic speech recognition (ASR). It was trained on asierhv/composite_corpus_eu_v2.1, a composite corpus of Basque speech assembled to improve Basque ASR performance.

Key improvements and results compared to the base model:

  • Significant WER reduction: The fine-tuned model reaches a word error rate (WER) of 14.85% (14.8495 in the training log) on the evaluation split of asierhv/composite_corpus_eu_v2.1, an improvement over the base whisper-tiny model for Basque.
  • Performance on Common Voice: Evaluated on Mozilla Common Voice 18.0, the model reaches a WER of 13.56%, which suggests that it generalizes to other Basque speech datasets.
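For reference, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, expressed as a percentage to match the figures above. It is not the evaluation code used for this model, which is not shown in this card, and any text normalization applied before scoring is unknown. The function name `word_error_rate` and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("kaixo zer moduz zaude", "kaixo zer modu zaude"))  # 25.0
```

In practice a maintained implementation, such as the `wer` metric from the `evaluate` library, is usually preferable to hand-rolled code.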

Model description

This model adapts the Whisper architecture, originally developed by OpenAI, to Basque. Fine-tuning the whisper-tiny checkpoint on a composite Basque speech corpus teaches it to transcribe spoken Basque. whisper-tiny is the smallest of the Whisper models, so it favors inference speed and a small memory footprint over the accuracy of the larger variants.

Intended uses & limitations

Intended uses:

  • Automatic transcription of Basque speech.
  • Development of Basque speech-based applications.
  • Research on Basque speech processing.
  • Accessibility tools for Basque speakers.
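A minimal transcription sketch using the `transformers` ASR pipeline is shown below. The repository id `xezpeleta/whisper-tiny-eu` is an assumption inferred from the repository name and the uploader's handle, so adjust it if the model lives elsewhere. The helper names `build_transcriber` and `transcribe` and the file name `audio.wav` are hypothetical.

```python
# Assumed repository id; not stated explicitly in this card.
MODEL_ID = "xezpeleta/whisper-tiny-eu"


def build_transcriber(model_id: str = MODEL_ID):
    """Create a Hugging Face ASR pipeline for the given checkpoint."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long recordings into 30-second windows
    )


def transcribe(audio_path: str, asr=None) -> str:
    """Transcribe one audio file and return the recognized text."""
    if asr is None:
        asr = build_transcriber()
    result = asr(
        audio_path,
        # Ask Whisper to transcribe in Basque rather than auto-detect or translate.
        generate_kwargs={"language": "basque", "task": "transcribe"},
    )
    return result["text"].strip()
```

Calling `transcribe("audio.wav")` downloads the checkpoint on first use. Decoding audio from a file path typically requires ffmpeg to be installed on the system.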

Limitations:

  • Performance may vary depending on the quality of the audio input (e.g., background noise, recording quality).
  • The model might struggle with highly dialectal or informal speech.
  • While the model shows improved performance, it may still produce errors, especially with complex sentences or uncommon words.
  • The model is based on the tiny (smallest) version of Whisper, so larger Whisper variants are likely to reach higher accuracy.

Training and evaluation data

  • Training dataset: asierhv/composite_corpus_eu_v2.1. This dataset is a composite corpus of Basque speech data, designed to improve the performance of Basque ASR systems.
  • Evaluation dataset: The test portion of asierhv/composite_corpus_eu_v2.1.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3.75e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • training_steps: 10000
  • mixed_precision_training: Native AMP
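To make the schedule concrete, the sketch below computes the learning rate at a given step for a linear scheduler with warmup, using the values listed above. It assumes the standard behavior of a linear-with-warmup schedule (ramp from 0 to the peak over the warmup steps, then decay linearly to 0 at the final step). The actual training script is not shown in this card, and the function name `linear_lr` is illustrative.

```python
PEAK_LR = 3.75e-05       # learning_rate
WARMUP_STEPS = 1000      # lr_scheduler_warmup_steps
TRAINING_STEPS = 10000   # training_steps


def linear_lr(step: int,
              peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TRAINING_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from the peak to 0 at the last training step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 500, 1000, 5500, 10000):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 1000, is back to half the peak at step 5500, and reaches zero at step 10000.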

Training results

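| Training Loss | Epoch | Step  | Validation Loss | WER (%) |
|---------------|-------|-------|-----------------|---------|
| 0.586         | 0.1   | 1000  | 0.6249          | 34.1639 |
| 0.3145        | 0.2   | 2000  | 0.5048          | 25.2591 |
| 0.225         | 0.3   | 3000  | 0.4839          | 22.0557 |
| 0.3003        | 0.4   | 4000  | 0.4540          | 20.3072 |
| 0.132         | 0.5   | 5000  | 0.4574          | 19.0146 |
| 0.1588        | 0.6   | 6000  | 0.4380          | 17.8219 |
| 0.1841        | 0.7   | 7000  | 0.4395          | 16.6667 |
| 0.143         | 0.8   | 8000  | 0.3719          | 15.4490 |
| 0.0967        | 0.9   | 9000  | 0.3685          | 15.1368 |
| 0.1059        | 1.0   | 10000 | 0.3719          | 14.8495 |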
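The log above can be summarized with a few lines of Python. The sketch below copies the step, validation loss, and WER columns from the table and reports which checkpoint is best by each metric, along with the relative WER reduction between the first and last evaluations. The variable names are illustrative.

```python
# (step, validation_loss, wer) rows copied from the training results above.
LOG = [
    (1000, 0.6249, 34.1639),
    (2000, 0.5048, 25.2591),
    (3000, 0.4839, 22.0557),
    (4000, 0.4540, 20.3072),
    (5000, 0.4574, 19.0146),
    (6000, 0.4380, 17.8219),
    (7000, 0.4395, 16.6667),
    (8000, 0.3719, 15.4490),
    (9000, 0.3685, 15.1368),
    (10000, 0.3719, 14.8495),
]

# Checkpoints with the lowest WER and the lowest validation loss.
best_by_wer = min(LOG, key=lambda row: row[2])
best_by_loss = min(LOG, key=lambda row: row[1])

# Relative WER reduction from the first to the last logged evaluation.
first_wer, last_wer = LOG[0][2], LOG[-1][2]
relative_wer_reduction = 100.0 * (first_wer - last_wer) / first_wer

print("best WER at step", best_by_wer[0])
print("best validation loss at step", best_by_loss[0])
print(f"relative WER reduction: {relative_wer_reduction:.1f}%")
```

The two metrics do not agree on the best checkpoint: validation loss is lowest at step 9000 (0.3685), while WER is lowest at step 10000 (14.8495). WER falls by roughly 56.5% in relative terms between step 1000 and step 10000.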

Framework versions

  • Transformers 4.49.0.dev0
  • PyTorch 2.6.0+cu124
  • Datasets 3.3.1.dev0
  • Tokenizers 0.21.0