metadata

license: apache-2.0
base_model: dmusingu/WHISPER-SMALL-SWAHILI-ASR-CV-14
tags:
  - generated_from_trainer
datasets:
  - common_voice_11_0
metrics:
  - wer
model-index:
  - name: whisper-small-swahili
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_11_0
          type: common_voice_11_0
          config: sw
          split: None
          args: sw
        metrics:
          - name: Wer
            type: wer
            value: 26.373626373626376

whisper-small-swahili

This model is a fine-tuned version of dmusingu/WHISPER-SMALL-SWAHILI-ASR-CV-14 on the Swahili (sw) configuration of the common_voice_11_0 dataset. It achieves the following results on the evaluation set:

Loss: 1.9641
Model Preparation Time: 0.0073
WER (word error rate, in percent): 26.3736
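
For readers unfamiliar with the metric, the word error rate reported above is conventionally the word-level edit distance between the model's transcripts and the reference transcripts, divided by the number of reference words and expressed in percent. The following is a minimal pure-Python sketch of that definition. The function name `word_error_rate` and the two Swahili example sentences are illustrative assumptions; the figure in this card was produced by the training script's own metric code, not by this snippet, and any text normalization applied before scoring is not documented here.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER in percent.

    Sums the word-level Levenshtein distance over all utterance pairs
    and divides by the total number of reference words.
    """
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        hyp_words = hyp.split()
        # prev[j] = edit distance between the reference words seen so far
        # and the first j hypothesis words.
        prev = list(range(len(hyp_words) + 1))
        for i, ref_word in enumerate(ref_words, start=1):
            curr = [i]
            for j, hyp_word in enumerate(hyp_words, start=1):
                substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
                deletion = prev[j] + 1
                insertion = curr[j - 1] + 1
                curr.append(min(substitution, deletion, insertion))
            prev = curr
        total_errors += prev[-1]
        total_words += len(ref_words)
    return 100.0 * total_errors / total_words


# Illustrative sentences only; these are not taken from the evaluation set.
example_refs = ["habari ya asubuhi", "asante sana"]
example_hyps = ["habari za asubuhi", "asante"]
example_wer = word_error_rate(example_refs, example_hyps)
print(f"WER: {example_wer:.2f}")
```

In this toy example there is one substituted word and one deleted word against five reference words, so a lower score means fewer word-level mistakes.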

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 5
training_steps: 100
mixed_precision_training: Native AMP
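
To make the schedule implied by these settings concrete, here is a small sketch of a linear schedule with warmup: the learning rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the final training step. It is written in plain Python rather than calling the training library, and the function name `linear_schedule_lr` is an assumption introduced for illustration. It also assumes the schedule is stepped once per optimizer step, which the card does not state explicitly.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 5
TRAINING_STEPS = 100


def linear_schedule_lr(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: scale grows linearly from 0 towards 1.
        scale = step / max(1, WARMUP_STEPS)
    else:
        # Decay phase: scale shrinks linearly from 1 to 0 at TRAINING_STEPS.
        scale = max(
            0.0,
            (TRAINING_STEPS - step) / max(1, TRAINING_STEPS - WARMUP_STEPS),
        )
    return LEARNING_RATE * scale


for step in (0, 5, 50, 100):
    print(step, linear_schedule_lr(step))
```

With only 5 warmup steps out of 100, the peak rate of 1e-05 is reached almost immediately and the rest of the run is spent in the decay phase.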

Training results

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time	Wer
No log	1.4286	10	2.2013	0.0073	26.2515
No log	2.8571	20	2.1523	0.0073	26.3736
1.7887	4.2857	30	2.1129	0.0073	26.2515
1.7887	5.7143	40	2.0751	0.0073	26.2515
1.6873	7.1429	50	2.0428	0.0073	26.2515
1.6873	8.5714	60	2.0161	0.0073	26.3736
1.6873	10.0	70	1.9944	0.0073	26.3736
1.5626	11.4286	80	1.9788	0.0073	26.3736
1.5626	12.8571	90	1.9687	0.0073	26.3736
1.4991	14.2857	100	1.9641	0.0073	26.3736
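
The Epoch and Step columns above also indicate roughly how small the training set was. The sketch below recovers the number of optimizer steps per epoch from a few logged rows and turns it into a range for the number of training examples. This is back-of-the-envelope arithmetic under two assumptions that the card does not confirm: training ran on a single device, and no gradient accumulation was used. The variable names are illustrative.

```python
TRAIN_BATCH_SIZE = 16  # from the hyperparameter list

# (step, epoch) pairs copied from the training results table.
logged_rows = [(10, 1.4286), (70, 10.0), (100, 14.2857)]

# Each row should give the same steps-per-epoch value after rounding.
estimates = {round(step / epoch) for step, epoch in logged_rows}
assert len(estimates) == 1, "rows disagree on steps per epoch"
steps_per_epoch = estimates.pop()

# With a possibly partial final batch, the number of training examples
# lies between (steps - 1) * batch + 1 and steps * batch.
max_examples = steps_per_epoch * TRAIN_BATCH_SIZE
min_examples = (steps_per_epoch - 1) * TRAIN_BATCH_SIZE + 1

print(f"steps per epoch: {steps_per_epoch}")
print(f"training examples per epoch: {min_examples} to {max_examples}")
```

Under those assumptions, the 100 training steps correspond to a little over 14 passes through a training set of at most about a hundred examples, which is worth keeping in mind when interpreting the loss and WER columns.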

Framework versions

Transformers 4.44.0
Pytorch 2.3.1+cu121
Datasets 2.21.0
Tokenizers 0.19.1