Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set:

  • Loss: 1.2087
  • Bleu: 32.53
  • Chrf: 52.88
  • Wer: 62.8095
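Wer is reported as a percentage. Below is a minimal sketch of the standard definition. It assumes plain whitespace tokenization and no text normalization, because the card does not say how hypotheses and references were normalized. The method aligns the hypothesis with the reference and counts word substitutions S, deletions D, and insertions I. It then divides by the number of reference words N: WER = 100 × (S + D + I) / N. Insertions are not capped, so WER can exceed 100. This explains how some early checkpoints in the training results reach values such as 162.9896. Scoring a translation against a single reference translation is also harsher than scoring a transcript, which is consistent with a Wer of 62.8 sitting alongside a Bleu of 32.53.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: 100 * (S + D + I) / N.

    Sketch of the standard definition. Toolkits used in practice also apply
    text normalization (casing, punctuation), which is deliberately omitted here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word r is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word h
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the cat sat"))                     # 0.0
    print(word_error_rate("the cat sat", "a cat sat down quietly"))          # 100.0
    print(word_error_rate("hello there", "well hello out there my friend"))  # 200.0
```

The last call shows the effect described above. Four inserted words against a two-word reference give a WER of 200.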

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
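The linear scheduler with a warmup ratio can be sketched as follows. The sketch assumes the Hugging Face Trainer convention: when no explicit warmup step count is given, warmup length is ceil(training_steps × warmup_ratio), which is 120 steps here. Linear decay to zero at step 4000 follows. It also assumes a single device with no gradient accumulation when counting samples: 4,000 × 16 = 64,000 training samples seen. This is a reconstruction for illustration, not code from the original training run.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 1e-4
TRAINING_STEPS = 4000
WARMUP_RATIO = 0.03
TRAIN_BATCH_SIZE = 16


def warmup_steps(total_steps: int, ratio: float) -> int:
    # Assumption: warmup length is derived from the ratio by rounding up.
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, peak_lr: float = PEAK_LR,
              total_steps: int = TRAINING_STEPS,
              ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at an optimizer step: linear warmup from 0, then linear decay to 0."""
    warm = warmup_steps(total_steps, ratio)
    if step < warm:
        return peak_lr * step / max(1, warm)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warm)


if __name__ == "__main__":
    print(warmup_steps(TRAINING_STEPS, WARMUP_RATIO))  # 120
    print(TRAINING_STEPS * TRAIN_BATCH_SIZE)           # 64000 samples seen (single device assumed)
    for s in (0, 60, 120, 2060, 4000):
        print(s, linear_lr(s))
```

The peak rate of 1e-4 is reached at step 120. It falls back to half the peak midway through the decay phase, at step 2060.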

Training results

| Training Loss | Epoch  | Step | Validation Loss | Bleu  | Chrf  | Wer      |
|---------------|--------|------|-----------------|-------|-------|----------|
| 2.5164        | 0.0328 | 100  | 2.0060          | 2.56  | 17.46 | 162.9896 |
| 2.656         | 0.0657 | 200  | 2.0232          | 8.49  | 26.0  | 99.5498  |
| 2.5156        | 0.0985 | 300  | 1.9253          | 7.55  | 25.1  | 141.2877 |
| 2.4722        | 0.1314 | 400  | 1.8289          | 12.52 | 30.49 | 90.4548  |
| 2.3376        | 0.1642 | 500  | 1.6839          | 17.39 | 33.23 | 81.1796  |
| 2.1733        | 0.1970 | 600  | 1.7342          | 9.62  | 32.48 | 137.9559 |
| 2.3382        | 0.2299 | 700  | 1.6570          | 12.54 | 34.43 | 112.2467 |
| 2.0041        | 0.2627 | 800  | 1.6048          | 17.55 | 36.73 | 85.1418  |
| 2.1142        | 0.2956 | 900  | 1.6256          | 17.58 | 35.74 | 82.7105  |
| 2.024         | 0.3284 | 1000 | 1.5861          | 14.4  | 37.22 | 86.7177  |
| 1.7556        | 0.3612 | 1100 | 1.5415          | 17.21 | 38.88 | 84.5115  |
| 1.6904        | 0.3941 | 1200 | 1.4902          | 19.6  | 38.84 | 85.3670  |
| 1.674         | 0.4269 | 1300 | 1.4748          | 20.33 | 41.3  | 88.3836  |
| 1.6899        | 0.4598 | 1400 | 1.4479          | 22.74 | 43.25 | 80.9995  |
| 1.5234        | 0.4926 | 1500 | 1.3763          | 20.13 | 42.08 | 80.6844  |
| 1.364         | 0.5255 | 1600 | 1.4164          | 23.12 | 41.78 | 72.9851  |
| 1.5267        | 0.5583 | 1700 | 1.3855          | 19.94 | 41.63 | 91.7605  |
| 1.4282        | 0.5911 | 1800 | 1.3729          | 23.96 | 44.84 | 74.6961  |
| 1.3611        | 0.6240 | 1900 | 1.3562          | 23.1  | 45.41 | 81.8100  |
| 1.1396        | 0.6568 | 2000 | 1.3131          | 27.9  | 46.89 | 67.2670  |
| 1.1849        | 0.6897 | 2100 | 1.3483          | 24.38 | 45.25 | 75.8667  |
| 1.0871        | 0.7225 | 2200 | 1.2848          | 28.64 | 48.93 | 66.6817  |
| 1.1822        | 0.7553 | 2300 | 1.2782          | 28.41 | 47.25 | 68.6628  |
| 1.1272        | 0.7882 | 2400 | 1.2549          | 27.24 | 48.57 | 75.9568  |
| 1.0241        | 0.8210 | 2500 | 1.2922          | 25.74 | 47.44 | 74.4710  |
| 0.9629        | 0.8539 | 2600 | 1.3209          | 23.93 | 44.61 | 82.1252  |
| 0.8251        | 0.8867 | 2700 | 1.2273          | 32.21 | 51.64 | 65.5110  |
| 0.7921        | 0.9195 | 2800 | 1.2881          | 26.38 | 48.31 | 80.2792  |
| 0.8873        | 0.9524 | 2900 | 1.2268          | 26.57 | 50.09 | 77.1724  |
| 0.7967        | 0.9852 | 3000 | 1.2036          | 29.35 | 51.53 | 69.6533  |
| 0.3119        | 1.0181 | 3100 | 1.2231          | 31.77 | 51.57 | 62.3143  |
| 0.3009        | 1.0509 | 3200 | 1.2446          | 31.8  | 50.44 | 61.8190  |
| 0.2855        | 1.0837 | 3300 | 1.2240          | 30.48 | 50.86 | 66.7717  |
| 0.2535        | 1.1166 | 3400 | 1.2287          | 31.96 | 52.82 | 63.3949  |
| 0.2162        | 1.1494 | 3500 | 1.2398          | 33.91 | 52.17 | 61.3688  |
| 0.2307        | 1.1823 | 3600 | 1.2280          | 32.11 | 51.67 | 64.7456  |
| 0.2184        | 1.2151 | 3700 | 1.2149          | 34.59 | 53.32 | 59.9730  |
| 0.2365        | 1.2479 | 3800 | 1.2044          | 32.51 | 52.98 | 62.3593  |
| 0.1958        | 1.2808 | 3900 | 1.2116          | 32.45 | 52.86 | 63.1697  |
| 0.2081        | 1.3136 | 4000 | 1.2087          | 32.53 | 52.88 | 62.8095  |
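The final row (step 4000) matches the evaluation results reported at the top of the card. The training results can also be scanned for the best checkpoint per metric. The sketch below does this for the last eleven evaluations, with rows copied from the table. It is a hypothetical post-hoc analysis, not the author's checkpoint-selection procedure. Step 3700 scores best on Bleu, Chrf, and Wer, while step 3000 has the lowest validation loss, so selecting by loss alone would pick a different checkpoint. Dividing the final step by its epoch value suggests roughly 3,045 optimizer steps per epoch. That works out to about 48,700 training samples at a batch size of 16, an estimate that assumes a single device and no gradient accumulation. The sharp drop in training loss between steps 3000 and 3100 (0.7967 to 0.3119) coincides with the start of the second epoch. This is consistent with the model revisiting samples it has already seen.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    bleu: float
    chrf: float
    wer: float


# Last eleven evaluations, copied from the training-results table (steps 3000-4000).
ROWS = [
    EvalRow(0.7967, 0.9852, 3000, 1.2036, 29.35, 51.53, 69.6533),
    EvalRow(0.3119, 1.0181, 3100, 1.2231, 31.77, 51.57, 62.3143),
    EvalRow(0.3009, 1.0509, 3200, 1.2446, 31.80, 50.44, 61.8190),
    EvalRow(0.2855, 1.0837, 3300, 1.2240, 30.48, 50.86, 66.7717),
    EvalRow(0.2535, 1.1166, 3400, 1.2287, 31.96, 52.82, 63.3949),
    EvalRow(0.2162, 1.1494, 3500, 1.2398, 33.91, 52.17, 61.3688),
    EvalRow(0.2307, 1.1823, 3600, 1.2280, 32.11, 51.67, 64.7456),
    EvalRow(0.2184, 1.2151, 3700, 1.2149, 34.59, 53.32, 59.9730),
    EvalRow(0.2365, 1.2479, 3800, 1.2044, 32.51, 52.98, 62.3593),
    EvalRow(0.1958, 1.2808, 3900, 1.2116, 32.45, 52.86, 63.1697),
    EvalRow(0.2081, 1.3136, 4000, 1.2087, 32.53, 52.88, 62.8095),
]

best_bleu = max(ROWS, key=lambda r: r.bleu)      # higher is better
best_chrf = max(ROWS, key=lambda r: r.chrf)      # higher is better
best_wer = min(ROWS, key=lambda r: r.wer)        # lower is better
best_loss = min(ROWS, key=lambda r: r.val_loss)  # lower is better

# Epoch = step / steps_per_epoch, so the final row yields an estimate of steps per epoch.
final = ROWS[-1]
steps_per_epoch = round(final.step / final.epoch)

if __name__ == "__main__":
    print("best Bleu:", best_bleu.step, best_bleu.bleu)
    print("best Chrf:", best_chrf.step, best_chrf.chrf)
    print("best Wer:", best_wer.step, best_wer.wer)
    print("lowest validation loss:", best_loss.step, best_loss.val_loss)
    print("approx. steps per epoch:", steps_per_epoch)
```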

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size: 764M params (Safetensors, tensor type F32)
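As a rough guide to memory needs, the raw weight size is the parameter count times the bytes per parameter. That gives about 3.06 GB for the F32 checkpoint, and about half that if the weights are cast to 16-bit floats. The sketch uses the rounded 764M figure listed above. It ignores activations, decoder caches, and optimizer state, so real usage during inference or training will be higher.

```python
# Parameter count as listed above (rounded to millions on the model page).
NUM_PARAMS = 764_000_000
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Approximate size of the raw weights only (no activations, caches, or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for dtype in ("F32", "F16"):
        print(dtype, weight_bytes(NUM_PARAMS, dtype) / 1e9, "GB")
```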

Model repository: ymoslem/whisper-medium-ga2en-v3.3-r

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 32.530
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 62.810