whisper-large-v3-turbo-hr-parla

This model is a fine-tuned version of openai/whisper-large-v3 on the classla/ParlaSpeech-HR dataset and an additional 400-hour private dataset, both extended with augmented samples.

It achieves the following results on the evaluation set:

Loss: 0.0816
WER: 3.52%
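
For readers who want to try the checkpoint, the following is a minimal sketch of loading it through the Transformers `pipeline` API. The repository ID is the one listed in the comparison table; the helper names `build_generate_kwargs` and `transcribe`, the 30-second chunk length, and the dtype and device selection are illustrative assumptions rather than settings documented by this card. Decoding a file path also assumes ffmpeg is installed.

```python
MODEL_ID = "GoranS/whisper-large-v3-turbo-hr-parla"


def build_generate_kwargs(language: str = "croatian") -> dict:
    """Decoding options that pin Whisper to transcription in one language."""
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file; the checkpoint is downloaded on first use."""
    # Imported lazily so build_generate_kwargs works without these packages.
    import torch
    from transformers import pipeline

    use_gpu = torch.cuda.is_available()
    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        # Half precision on GPU is an assumption made here to save memory.
        torch_dtype=torch.float16 if use_gpu else torch.float32,
        device=0 if use_gpu else -1,
        # Assumed chunk size so that recordings longer than 30 s are handled.
        chunk_length_s=30,
    )
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

Calling `transcribe("sample.wav")` with a hypothetical local file would return the recognized Croatian text as a string.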

WER comparison

Dataset	Model	WER (%)
google/fleurs hr_hr test	openai/whisper-base	65.99
google/fleurs hr_hr test	openai/whisper-large-v3-turbo	12.73
google/fleurs hr_hr test	slsolucije/whisper-large-v3-turbo-hr-parla-lora-merged	9.93
google/fleurs hr_hr test	GoranS/whisper-large-v3-turbo-hr-parla	8.66
GoranS/stt-croatian_99k_265_2 test	openai/whisper-large-v3-turbo	22.93
GoranS/stt-croatian_99k_265_2 test	slsolucije/whisper-large-v3-turbo-hr-parla-lora-merged	19.02
GoranS/stt-croatian_99k_265_2 test	GoranS/whisper-large-v3-turbo-hr-parla	18.44
GoranS/stt-croatian-sl-31k test	openai/whisper-large-v3-turbo	21.62
GoranS/stt-croatian-sl-31k test	slsolucije/whisper-large-v3-turbo-hr-parla-lora-merged	17.07
GoranS/stt-croatian-sl-31k test	GoranS/whisper-large-v3-turbo-hr-parla	16.97
parla_867k_2483_0.5 test	openai/whisper-large-v3-turbo	10.23
parla_867k_2483_0.5 test	slsolucije/whisper-large-v3-turbo-hr-parla-lora-merged	4.58
parla_867k_2483_0.5 test	GoranS/whisper-large-v3-turbo-hr-parla	3.52
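
To make the numbers in the table easier to interpret, here is a small sketch of how a word error rate and a relative WER reduction can be computed. It uses a plain word-level edit distance on whitespace-split text; the card does not say which text normalization or scoring tool produced the reported figures, so treat this as an illustration of the metric, not the evaluation script. The function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, compared with a baseline."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# FLEURS hr_hr figures from the table: base turbo model vs. this model.
print(f"{relative_reduction(12.73, 8.66):.2f}% relative reduction")
```

On the FLEURS rows this works out to a relative reduction of roughly 32% against openai/whisper-large-v3-turbo.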

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 6.25e-06
train_batch_size: 64
eval_batch_size: 32
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 800
num_epochs: 2
mixed_precision_training: Native AMP
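
The learning-rate settings above (peak 6.25e-06, linear scheduler, 800 warmup steps) imply a ramp up followed by a linear decay to zero. The sketch below reproduces that shape in plain Python. The total number of optimizer steps is not stated in this card, so `total_steps` is a caller-supplied assumption, and the function name is hypothetical.

```python
PEAK_LR = 6.25e-06   # learning_rate from the list above
WARMUP_STEPS = 800   # lr_scheduler_warmup_steps from the list above


def linear_schedule_lr(
    step: int,
    total_steps: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: shrink linearly from the peak to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=28_000 is an assumed value chosen only for illustration.
for s in (0, 400, 800, 14_400, 28_000):
    print(s, linear_schedule_lr(s, total_steps=28_000))
```

Halfway through warmup (step 400) the rate is half the peak, and it reaches the full 6.25e-06 at step 800 before decaying.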

Training results

Training Loss	Epoch	Step	Validation Loss	WER (fraction)
0.1485	0.0703	1000	0.1376	0.0610
0.1399	0.1406	2000	0.1210	0.0545
0.1311	0.2108	3000	0.1144	0.0529
0.119	0.2811	4000	0.1058	0.0487
0.1165	0.3514	5000	0.1067	0.0517
0.1142	0.4217	6000	0.1007	0.0464
0.1095	0.4920	7000	0.1019	0.0447
0.1112	0.5622	8000	0.0974	0.0425
0.1104	0.6325	9000	0.0971	0.0442
0.1081	0.7028	10000	0.0943	0.0411
0.1025	0.7731	11000	0.0905	0.0397
0.1042	0.8433	12000	0.0930	0.0419
0.1031	0.9136	13000	0.0923	0.0428
0.1038	0.9839	14000	0.0894	0.0408
0.0878	1.0542	15000	0.0902	0.0408
0.0886	1.1245	16000	0.0869	0.0369
0.0864	1.1947	17000	0.0861	0.0364
0.0817	1.2650	18000	0.0867	0.0408
0.0899	1.3353	19000	0.0852	0.0383
0.0868	1.4056	20000	0.0846	0.0369
0.0858	1.4759	21000	0.0844	0.0378
0.0827	1.5461	22000	0.0845	0.0391
0.0798	1.6164	23000	0.0846	0.0378
0.0845	1.6867	24000	0.0833	0.0375
0.0768	1.7570	25000	0.0840	0.0375
0.0799	1.8273	26000	0.0837	0.0375
0.0808	1.8975	27000	0.0825	0.0352
0.0837	1.9678	28000	0.0816	0.0352
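
The WER column in this log is a fraction, while the headline figure is a percentage. As a small worked example, the sketch below converts between the two, picks the best checkpoint, and estimates the number of optimizer steps per epoch. It uses only a subset of rows copied from the table; the helper names are hypothetical.

```python
# (step, epoch, validation_loss, wer_fraction), copied from the log above.
ROWS = [
    (1000, 0.0703, 0.1376, 0.0610),
    (14000, 0.9839, 0.0894, 0.0408),
    (27000, 1.8975, 0.0825, 0.0352),
    (28000, 1.9678, 0.0816, 0.0352),
]


def best_checkpoint(rows):
    """Row with the lowest WER; ties are broken by lower validation loss."""
    return min(rows, key=lambda row: (row[3], row[2]))


def steps_per_epoch(rows):
    """Estimate steps per epoch from the last logged step and epoch."""
    step, epoch, _loss, _wer = rows[-1]
    return step / epoch


step, epoch, loss, wer = best_checkpoint(ROWS)
print(f"best step: {step}, loss: {loss}, WER: {wer * 100:.2f}%")
print(f"estimated steps per epoch: {steps_per_epoch(ROWS):.0f}")
```

With these rows the selected checkpoint is step 28000, which matches the loss of 0.0816 and WER of 3.52% reported at the top of the card.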

Framework versions

Transformers 4.46.3
PyTorch 2.5.0+cu121
Datasets 3.1.0
Tokenizers 0.20.3
