---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- generated_from_trainer
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: speech-emotion-recognition-with-openai-whisper-large-v3
  results: []
---

# speech-emotion-recognition-with-openai-whisper-large-v3

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the [RAVDESS](https://zenodo.org/records/1188976#.XsAXemgzaUk), [SAVEE](https://www.kaggle.com/datasets/ejlok1/surrey-audiovisual-expressed-emotion-savee/data), [TESS](https://tspace.library.utoronto.ca/handle/1807/24487), and [URDU](https://www.kaggle.com/datasets/bitlord/urdu-language-speech-dataset) datasets.
It achieves the following results on the evaluation set:
- Loss: 0.5008
- Accuracy: 0.9199
- Precision: 0.9230
- Recall: 0.9199
- F1: 0.9198

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 5
- total_train_batch_size: 10
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 25
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-------:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.4948        | 0.9995  | 394  | 0.4911          | 0.8286   | 0.8449    | 0.8286 | 0.8302 |
| 0.6271        | 1.9990  | 788  | 0.5307          | 0.8225   | 0.8559    | 0.8225 | 0.8277 |
| 0.2364        | 2.9985  | 1182 | 0.5076          | 0.8692   | 0.8727    | 0.8692 | 0.8684 |
| 0.0156        | 3.9980  | 1576 | 0.5669          | 0.8732   | 0.8868    | 0.8732 | 0.8745 |
| 0.2305        | 5.0     | 1971 | 0.4578          | 0.9108   | 0.9142    | 0.9108 | 0.9114 |
| 0.0112        | 5.9995  | 2365 | 0.4701          | 0.9108   | 0.9159    | 0.9108 | 0.9114 |
| 0.0013        | 6.9990  | 2759 | 0.5232          | 0.9138   | 0.9204    | 0.9138 | 0.9137 |
| 0.1894        | 7.9985  | 3153 | 0.5008          | 0.9199   | 0.9230    | 0.9199 | 0.9198 |
| 0.0877        | 8.9980  | 3547 | 0.5517          | 0.9138   | 0.9152    | 0.9138 | 0.9138 |
| 0.1471        | 10.0    | 3942 | 0.5856          | 0.8895   | 0.9002    | 0.8895 | 0.8915 |
| 0.0026        | 10.9995 | 4336 | 0.8334          | 0.8773   | 0.8949    | 0.8773 | 0.8770 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1
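
The numbers in the training hyperparameters and training results above can be cross-checked with a little arithmetic. The sketch below is not part of the original training code. It copies values from this card, assumes a single training device, and uses illustrative variable names. It shows how `total_train_batch_size` follows from the per-device batch size and gradient accumulation, and which logged evaluation row matches the headline metrics.

```python
# Values copied from the "Training hyperparameters" list in this card.
train_batch_size = 2
gradient_accumulation_steps = 5

# Samples consumed per optimizer step (assumption: one device).
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# (epoch, step, validation_loss, accuracy) copied from "Training results".
results = [
    (0.9995, 394, 0.4911, 0.8286),
    (1.9990, 788, 0.5307, 0.8225),
    (2.9985, 1182, 0.5076, 0.8692),
    (3.9980, 1576, 0.5669, 0.8732),
    (5.0, 1971, 0.4578, 0.9108),
    (5.9995, 2365, 0.4701, 0.9108),
    (6.9990, 2759, 0.5232, 0.9138),
    (7.9985, 3153, 0.5008, 0.9199),
    (8.9980, 3547, 0.5517, 0.9138),
    (10.0, 3942, 0.5856, 0.8895),
    (10.9995, 4336, 0.8334, 0.8773),
]

# Pick the row with the highest validation accuracy.
best = max(results, key=lambda row: row[3])

print("total_train_batch_size:", total_train_batch_size)
print("best row (epoch, step, val_loss, accuracy):", best)
```

The highest-accuracy row is the one logged at step 3153, and its validation loss and accuracy equal the evaluation-set results reported at the top of this card. The lowest validation loss occurs earlier, at step 1971, so selecting by loss would pick a different row.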