Spaces: somosnlp-hackathon-2022/Audio-Sentiment-Classifier

DrishtiSharma committed on Apr 4, 2022

Commit fdb710c • 1 Parent(s): 32e715d

Update info.txt

Files changed (1)

info.txt

@@ -8,5 +8,5 @@ The open-source MESD dataset was used to fine-tune the Wav2Vec2 base model, whic
 The aforementioned prototype may not function well in noisy environments or audio with a musical/noisy background due to the models being trained on too little data due to sparse availability. In order to make our model robust, our future work includes:
   --- Accumulation  of more audio data which closely resembles the aural environment in which the app will be used/tested.
   --- The F1 score for the "Fear" class is 47.5%. We aim to do targeted improvement for the class “Fear” which the model often misclassifies.
-  --- We tried to finetune Wav2Vec2-xls-r Spanish checkpoints on MESD dataset, ran several tests on different wav2vec2-xls-r Spanish checkpoints and were surprised to find that they performed worse than the Wav2Vec2-base model fine-tuned on MESD. To recheck and establish whether the prosodies critical for audio sentiment classification tasks are lost during the finetuning of the model for ASR purposes, an in-depth study and root cause analysis need to be done.
+  --- We tried to finetune Wav2Vec2-XLS-R Spanish checkpoints on MESD dataset, ran several tests on different Wav2Vec2-XLS-R Spanish checkpoints and were surprised to find that they performed worse than the Wav2Vec2-base model fine-tuned on MESD. To recheck and establish whether the prosodies critical for audio sentiment classification tasks are lost during the finetuning of the model for ASR purposes, an in-depth study and root cause analysis need to be done.
 1. Drishti Sharma    2. Manuel Fernandez Moya    3. Antonio Alberto Soto Hernández    4. Jefferson Quispe Pinares    5. Matias Gaona