---
license: cc-by-sa-4.0
language:
- en
- zh
metrics:
- f1
library_name: transformers
pipeline_tag: audio-classification
tags:
- speech-emotion-recognition
---

# Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co./facebook/wav2vec2-large-xlsr-53) on English and Chinese data from elderly speakers. The model is trained on the training sets of [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D), [CSED](https://github.com/AkishinoShiame/Chinese-Speech-Emotion-Datasets), [ElderReact](https://github.com/Mayer123/ElderReact), and [TESS](https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess).

When using this model, make sure that your speech input is sampled at 16 kHz (see the usage sketch at the end of this card).

The scripts used for training and evaluation can be found here: [https://github.com/HLTCHKUST/elderly_ser/tree/main](https://github.com/HLTCHKUST/elderly_ser/tree/main)

## Evaluation Results

For details (e.g., the statistics of the `train`, `valid`, and `test` splits), please refer to our paper on [arXiv](https://arxiv.org/abs/2306.14517). It also reports the model's speech emotion recognition performance on English-All, Chinese-All, English-Elderly, Chinese-Elderly, English-Adults, and Chinese-Adults.

## Citation

Our paper will be published at INTERSPEECH 2023. In the meantime, you can find it on [arXiv](https://arxiv.org/abs/2306.14517). If you find our work useful, please consider citing it as follows:

```
@misc{cahyawijaya2023crosslingual,
      title={Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition},
      author={Samuel Cahyawijaya and Holy Lovenia and Willy Chung and Rita Frieske and Zihan Liu and Pascale Fung},
      year={2023},
      eprint={2306.14517},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
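
## Usage

Below is a minimal usage sketch with the `transformers` audio-classification pipeline, since the model expects 16 kHz input. It is not taken from the training repository: the model ID and audio path are placeholders you should replace with this repository's Hub ID and your own recording, and `librosa` is used here only to load and resample the audio.

```python
# Minimal sketch (not from the original card): classify the emotion in a
# speech recording with this fine-tuned wav2vec2 checkpoint.
import librosa
from transformers import pipeline

MODEL_ID = "<this-repo-id>"  # placeholder: replace with this model's Hub ID

# Load the fine-tuned checkpoint as an audio-classification pipeline.
classifier = pipeline("audio-classification", model=MODEL_ID)

# Load the input and resample it to 16 kHz mono, as the model expects.
speech, _ = librosa.load("audio.wav", sr=16_000, mono=True)  # placeholder path

# Returns a list of {"label": ..., "score": ...} dicts, one per emotion class.
predictions = classifier(speech)
print(predictions)
```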