DrishtiSharma committed on
Commit
2078aee
1 Parent(s): 1cd6729

Update info.txt

Files changed (1)
  1. info.txt +3 -3
info.txt CHANGED
@@ -6,7 +6,7 @@ Potential Applications: Although this is a very small prototype, it can be scale
  To begin with, we didn't have enough Spanish audio data suitable for sentiment classification tasks. We had to make do with whatever data we could find in the MESD database because much of the material we came across was not open-source. Furthermore, the MESD dataset already contained augmented versions of its audio clips, accounting for up to 25% of the total data, so we decided not to perform any further data augmentation to avoid overwhelming the original audio samples.
  The Wav2Vec2 base model was fine-tuned on the open-source MESD dataset, which contains ~1200 audio recordings, all recorded in professional studios and each only one second long. Of these ~1200 recordings, only 890 were used for training. Because of these factors, the model, and hence this Gradio application, may not perform well in noisy environments or on audio with background music or noise. It's also worth mentioning that the model performs poorly on audio recordings from the class "Fear," which it often misclassifies.
  The aforementioned prototype may not function well in noisy environments or on audio with a musical/noisy background because the model was trained on too little data, owing to the sparse availability of suitable recordings. To make our model more robust, our future work includes:
- --- Accumulation of more audio data which closely resembles the aural environment in which the app will be used/tested.
- --- The F1 score for the "Fear" class is 47.5%. We aim to do targeted improvement for the class “Fear” which the model often misclassifies.
- --- We tried to finetune wav2vec2-xls-r Spanish checkpoints on MESD dataset, ran several tests on different wav2vec2-xls-r Spanish checkpoints and were surprised to find that they performed worse than the wav2vec2-base model fine-tuned on MESD. To recheck and establish whether the prosodies critical for audio sentiment classification tasks are lost during the finetuning of the model for ASR purposes, an in-depth study and root cause analysis need to be done.
+ --- Accumulation of more audio data which closely resembles the aural environment in which the app will be used/tested.
+ --- The F1 score for the "Fear" class is 47.5%. We aim to do targeted improvement for the class “Fear” which the model often misclassifies.
+ --- We tried to finetune wav2vec2-xls-r Spanish checkpoints on MESD dataset, ran several tests on different wav2vec2-xls-r Spanish checkpoints and were surprised to find that they performed worse than the wav2vec2-base model fine-tuned on MESD. To recheck and establish whether the prosodies critical for audio sentiment classification tasks are lost during the finetuning of the model for ASR purposes, an in-depth study and root cause analysis need to be done.
  1. Drishti Sharma 2. Manuel Fernandez Moya 3. Antonio Alberto Soto Hernández 4. Jefferson Quispe Pinares 5. Matias Gaona