Model Card for wav2vec_children_ASR

We build a CTC-based phoneme recognition model using wav2vec 2.0 (W2V2) for children under 4 years old. We use three-level fine-tuning to gradually reduce the age mismatch between adult and child phonetics.

  • W2V2-Libri100h: We first fine-tune W2V2-Base (pretrained on the unlabeled 960-hour LibriSpeech adult speech corpus) on 100 hours of LibriSpeech with IPA phone sequences.
  • W2V2-MyST: We then fine-tune W2V2-Libri100h on the My Science Tutor (MyST) corpus, which consists of conversational speech between a virtual tutor and students in the third through fifth grades.
  • W2V2-Libri100h-Pro (two-level fine-tuning): We fine-tune W2V2-Libri100h on phoneme sequences from the Providence corpus, which consists of longitudinal audio of 6 English-speaking children aged 1-4 years interacting with their mothers at home.
  • W2V2-MyST-Pro (three-level fine-tuning): Similarly to W2V2-Libri100h-Pro, we fine-tune W2V2-MyST on phoneme sequences from the Providence corpus.

We show that W2V2-MyST-Pro improves children's vocalization classification on two corpora, Rapid-ABC and BabbleCor.
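
To make the downstream use concrete, the sketch below puts a small classification head on top of W2V2 embeddings. It is only an illustration: it loads the generic facebook/wav2vec2-base backbone through the transformers API, and the class count and mean pooling are placeholders; the released checkpoints themselves were trained with SpeechBrain (see Uses), so their weights would need to be mapped onto the backbone first.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

# Illustrative only: generic W2V2-Base backbone; in practice the fine-tuned
# weights (e.g. W2V2-MyST-Pro) would be loaded on top of it.
backbone = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

class VocalizationClassifier(nn.Module):
    def __init__(self, backbone, num_classes):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(backbone.config.hidden_size, num_classes)

    def forward(self, waveform):  # waveform: (batch, samples) at 16 kHz
        hidden = self.backbone(waveform).last_hidden_state  # (batch, frames, dim)
        pooled = hidden.mean(dim=1)  # mean-pool over frames
        return self.head(pooled)

model = VocalizationClassifier(backbone, num_classes=5)  # class count is a placeholder
logits = model(torch.randn(2, 16000))  # two 1-second dummy waveforms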

Model Sources

For more information regarding this model, please check out our papers listed in the Paper/BibTeX Citation section below.

Model Description

Each folder in this repository contains the best checkpoint of the corresponding setting (a download sketch follows the list below):

  • W2V2-Libri100h: save_100h/wav2vec2.ckpt
  • W2V2-MyST: save_100h_MyST/wav2vec2.ckpt
  • W2V2-Libri100h-Pro: save_100h_Providence/wav2vec2.ckpt
  • W2V2-MyST-Pro: save_100h_MyST_Providence/wav2vec2.ckpt
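
As a starting point, the sketch below downloads one of these checkpoints from the Hub and inspects it. The repository id is taken from this model card; treating the .ckpt file as a plain PyTorch state_dict is an assumption, since the exact layout depends on the SpeechBrain recipe that saved it.

import torch
from huggingface_hub import hf_hub_download

# Fetch the W2V2-MyST-Pro checkpoint listed above.
ckpt_path = hf_hub_download(
    repo_id="lijialudew/wav2vec_children_ASR",
    filename="save_100h_MyST_Providence/wav2vec2.ckpt",
)

# Assumption: the file is a PyTorch-serialized state_dict; inspect its keys
# before mapping them onto a wav2vec 2.0 model.
state_dict = torch.load(ckpt_path, map_location="cpu")
for name in list(state_dict)[:10]:
    print(name)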

Uses

We develop our complete fine-tuning recipe with the SpeechBrain toolkit, available at https://github.com/jialuli3/wav2vec_LittleBeats_LENA
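
For orientation, the sketch below shows how a wav2vec 2.0 encoder is typically wrapped in SpeechBrain (the 0.5.x HuggingFaceWav2Vec2 interface); the module path, arguments, and base model used here are assumptions for illustration, and the linked recipe should be treated as the authoritative reference.

import torch
from speechbrain.lobes.models.huggingface_wav2vec import HuggingFaceWav2Vec2

# Assumption: SpeechBrain 0.5.x module path; newer releases moved this wrapper.
encoder = HuggingFaceWav2Vec2(
    source="facebook/wav2vec2-base",  # same architecture family as W2V2-Base
    save_path="pretrained_w2v2",
    freeze=False,                     # allow fine-tuning of the encoder
)
features = encoder(torch.randn(1, 16000))  # (batch, frames, hidden_dim)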

Paper/BibTeX Citation

If you find this model helpful, please cite our work as:


@inproceedings{li2023enhancing,
  title={Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis},
  author={Li, Jialu and Hasegawa-Johnson, Mark and Karahalios, Karrie},
  booktitle={Interspeech},
  year={2024}
}

and/or


@inproceedings{li2024analysis,
  title={Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations},
  author={Li, Jialu and Hasegawa-Johnson, Mark and McElwain, Nancy L},
  booktitle={IEEE Workshop on Self-Supervision in Audio, Speech and Beyond (SASB)},
  year={2024}
}

Model Card Contact

Jialu Li, Ph.D. (she, her, hers)

E-mail: [email protected]

Homepage: https://sites.google.com/view/jialuli/
