
whisper-large-v3-chichewa

This model is a fine-tuned version of openai/whisper-large-v3 on a Chichewa dataset. The dataset details will be provided at a later stage. It achieves the following results on the evaluation set:

  • Loss: 3.1427
  • Wer: 101.3340

Model description

More information needed

Intended uses & limitations

The model is fine-tuned to perform transcription of Chichewa. There are several versions of this model; please refer to the usage example notebook to see how to find the commit with the best WER. Alternatively, you can explore the Files and Versions tab, go to the commits, and find the commit with the best WER (around 61). Also, it is worth noting that the model repo does not have a tokenizer.json; as a result, the model needs to be loaded using WhisperProcessor rather than AutoModel or the other Auto classes in Transformers.
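
A minimal loading and transcription sketch is shown below. The revision value and the audio file path are placeholders; the actual commit hashes and their WERs are listed in the repo's commit history.

```python
import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

MODEL_ID = "dmatekenya/whisper-large-v3-chichewa"
REVISION = "main"  # placeholder: pin this to the commit hash with the best WER

processor = WhisperProcessor.from_pretrained(MODEL_ID, revision=REVISION)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID, revision=REVISION)
model.eval()

# Hypothetical audio file; Whisper expects 16 kHz mono input.
audio, _ = librosa.load("chichewa_sample.wav", sr=16000)

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```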

Source of Funding for this Work

The dataset used to fine-tune this model, as well as the compute resources, were provided by Opportunity International.
This was part of a project in Malawi aimed at supporting the deployment of an LLM-based chatbot for agriculture, with the capability to handle voice interactions in the local language, Chichewa. A total of 30 hours of audio was collected for this dataset, but due to data quality issues only 25 hours were used. About 30 minutes were also set aside as a hold-out set for further model evaluation.

Training and evaluation data

More information needed

Training procedure

Most of the training for this model involved varying the speech dataset size (5 hours, 10 hours, and so on up to 24 hours). As such, the different model commits correspond to different dataset sizes. More details will be added to each model commit.
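
The exact subsetting procedure is not documented here; the sketch below shows one way fixed-duration subsets could be drawn with the Hugging Face Datasets library. The dataset identifier and column names are assumptions, since the Chichewa dataset is not yet published.

```python
from datasets import load_dataset, Audio

# Hypothetical dataset identifier; the actual Chichewa dataset is not yet released.
ds = load_dataset("dmatekenya/chichewa-asr", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

def take_hours(dataset, target_hours):
    """Keep examples until roughly `target_hours` of audio has been accumulated."""
    total_seconds, indices = 0.0, []
    for i, example in enumerate(dataset):
        audio = example["audio"]
        total_seconds += len(audio["array"]) / audio["sampling_rate"]
        indices.append(i)
        if total_seconds >= target_hours * 3600:
            break
    return dataset.select(indices)

subset_5h = take_hours(ds, 5)
subset_10h = take_hours(ds, 10)
subset_24h = take_hours(ds, 24)
```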

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 10000
  • mixed_precision_training: Native AMP
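
Expressed as transformers Seq2SeqTrainingArguments, these settings would look roughly like the sketch below; the output directory and any argument not listed above are placeholders rather than values from the original run.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-chichewa",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10000,
    fp16=True,  # Native AMP mixed precision
)
```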

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Wer      |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 0.9229        | 7.0423  | 1000  | 2.0780          | 86.3539  |
| 0.1427        | 14.0845 | 2000  | 2.5560          | 83.5493  |
| 0.087         | 21.1268 | 3000  | 2.6909          | 80.8704  |
| 0.0742        | 28.1690 | 4000  | 2.8007          | 81.8982  |
| 0.065         | 35.2113 | 5000  | 2.8871          | 84.3639  |
| 0.0627        | 42.2535 | 6000  | 2.9465          | 84.5334  |
| 0.0586        | 49.2958 | 7000  | 3.0451          | 114.1600 |
| 0.063         | 56.3380 | 8000  | 3.0983          | 82.6964  |
| 0.0588        | 63.3803 | 9000  | 3.1352          | 81.0180  |
| 0.0591        | 70.4225 | 10000 | 3.1427          | 101.3340 |
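
WER values above 100 (steps 7000 and 10000) can occur because word error rate counts insertions as errors, so hypotheses much longer than the references can push the rate past 100%. To evaluate a given checkpoint on the held-out audio, WER can be computed with the evaluate library; the transcripts below are placeholder examples only.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcripts; in practice these come from model.generate()
# and the hold-out reference labels.
predictions = ["moni muli bwanji", "zikomo kwambiri"]
references = ["moni muli bwanji", "zikomo kwambiri"]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```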

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.0.1
  • Datasets 3.0.1
  • Tokenizers 0.20.0