
whisper-large-v3-chichewa

This model is a fine-tuned version of openai/whisper-large-v3 on a Chichewa dataset. The dataset details will be provided at a later stage. It achieves the following results on the evaluation set:

  • Loss: 3.1427
  • Wer: 101.3340

Model description

More information needed

Intended uses & limitations

The model is fine-tuned to perform transcription of Chichewa. There are several versions of this model; please refer to the usage example notebook to see how to find the commit with the best WER. Alternatively, you can explore the Files and Versions tab, go to the commits, and find the commit with the best WER (around 61). Also, it is worth noting that the model repo does not have a tokenizer.json; as a result, the model needs to be loaded using WhisperProcessor rather than AutoModel or the other Auto classes in Transformers.
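
A minimal loading and transcription sketch is shown below. The revision value and the audio file path are placeholders; the actual commit hashes and their WERs are listed in the repo's commit history.

```python
import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

MODEL_ID = "dmatekenya/whisper-large-v3-chichewa"
REVISION = "main"  # placeholder: pin this to the commit hash with the best WER

processor = WhisperProcessor.from_pretrained(MODEL_ID, revision=REVISION)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID, revision=REVISION)
model.eval()

# Hypothetical audio file; Whisper expects 16 kHz mono input.
audio, _ = librosa.load("chichewa_sample.wav", sr=16000)

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```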

Source of Funding for this Work

The dataset used to fine-tune this model, as well as the compute resources, were provided by Opportunity International.
This was part of a project in Malawi aimed at supporting the deployment of an LLM-based chatbot for agriculture, with the capability to handle voice interactions in the local language, Chichewa. A total of 30 hours of audio was collected for this dataset, but due to data quality issues only 25 hours were used. About 30 minutes were also set aside as a hold-out set for further model evaluation.

Training and evaluation data

More information needed

Training procedure

Most of the training for this model involved varying the speech dataset size (5 hours, 10 hours, and so on up to 24 hours). As such, the different model commits correspond to different dataset sizes. More details will be added to each model commit.
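
The exact subsetting procedure is not documented here; the sketch below shows one way fixed-duration subsets could be drawn with the Hugging Face Datasets library. The dataset identifier and column names are assumptions, since the Chichewa dataset is not yet published.

```python
from datasets import load_dataset, Audio

# Hypothetical dataset identifier; the actual Chichewa dataset is not yet released.
ds = load_dataset("dmatekenya/chichewa-asr", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

def take_hours(dataset, target_hours):
    """Keep examples until roughly `target_hours` of audio has been accumulated."""
    total_seconds, indices = 0.0, []
    for i, example in enumerate(dataset):
        audio = example["audio"]
        total_seconds += len(audio["array"]) / audio["sampling_rate"]
        indices.append(i)
        if total_seconds >= target_hours * 3600:
            break
    return dataset.select(indices)

subset_5h = take_hours(ds, 5)
subset_10h = take_hours(ds, 10)
subset_24h = take_hours(ds, 24)
```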

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 10000
  • mixed_precision_training: Native AMP
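
Expressed as transformers Seq2SeqTrainingArguments, these settings would look roughly like the sketch below; the output directory and any argument not listed above are placeholders rather than values from the original run.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-chichewa",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10000,
    fp16=True,  # Native AMP mixed precision
)
```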

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Wer      |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 0.9229        | 7.0423  | 1000  | 2.0780          | 86.3539  |
| 0.1427        | 14.0845 | 2000  | 2.5560          | 83.5493  |
| 0.087         | 21.1268 | 3000  | 2.6909          | 80.8704  |
| 0.0742        | 28.1690 | 4000  | 2.8007          | 81.8982  |
| 0.065         | 35.2113 | 5000  | 2.8871          | 84.3639  |
| 0.0627        | 42.2535 | 6000  | 2.9465          | 84.5334  |
| 0.0586        | 49.2958 | 7000  | 3.0451          | 114.1600 |
| 0.063         | 56.3380 | 8000  | 3.0983          | 82.6964  |
| 0.0588        | 63.3803 | 9000  | 3.1352          | 81.0180  |
| 0.0591        | 70.4225 | 10000 | 3.1427          | 101.3340 |
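
WER values above 100 (steps 7000 and 10000) can occur because word error rate counts insertions as errors, so hypotheses much longer than the references can push the rate past 100%. To evaluate a given checkpoint on the held-out audio, WER can be computed with the evaluate library; the transcripts below are placeholder examples only.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcripts; in practice these come from model.generate()
# and the hold-out reference labels.
predictions = ["moni muli bwanji", "zikomo kwambiri"]
references = ["moni muli bwanji", "zikomo kwambiri"]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```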

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.0.1
  • Datasets 3.0.1
  • Tokenizers 0.20.0