DrishtiSharma committed on
Commit
fdb710c
1 Parent(s): 32e715d

Update info.txt

Files changed (1)
  1. info.txt +1 -1
info.txt CHANGED
@@ -8,5 +8,5 @@ The open-source MESD dataset was used to fine-tune the Wav2Vec2 base model, whic
  The aforementioned prototype may not function well in noisy environments or audio with a musical/noisy background due to the models being trained on too little data due to sparse availability. In order to make our model robust, our future work includes:
  --- Accumulation of more audio data which closely resembles the aural environment in which the app will be used/tested.
  --- The F1 score for the "Fear" class is 47.5%. We aim to do targeted improvement for the class “Fear” which the model often misclassifies.
- --- We tried to finetune Wav2Vec2-xls-r Spanish checkpoints on MESD dataset, ran several tests on different wav2vec2-xls-r Spanish checkpoints and were surprised to find that they performed worse than the Wav2Vec2-base model fine-tuned on MESD. To recheck and establish whether the prosodies critical for audio sentiment classification tasks are lost during the finetuning of the model for ASR purposes, an in-depth study and root cause analysis need to be done.
+ --- We tried to finetune Wav2Vec2-XLS-R Spanish checkpoints on MESD dataset, ran several tests on different Wav2Vec2-XLS-R Spanish checkpoints and were surprised to find that they performed worse than the Wav2Vec2-base model fine-tuned on MESD. To recheck and establish whether the prosodies critical for audio sentiment classification tasks are lost during the finetuning of the model for ASR purposes, an in-depth study and root cause analysis need to be done.
  1. Drishti Sharma 2. Manuel Fernandez Moya 3. Antonio Alberto Soto Hernández 4. Jefferson Quispe Pinares 5. Matias Gaona