SSL4PR WavLM Base and HuBERT Base Models

This repository hosts the pre-trained SSL4PR models for Parkinson's Disease detection from speech in real-world operating conditions. The models are based on the paper "Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions" by La Quatra et al., published at Interspeech 2024. Paper Link

Repository Link

GitHub Repository: please refer to the repository for all details on the models, training, and usage.

Pre-trained Models

Pre-trained models are available on the Hugging Face model hub. To use the SSL4PR models, please clone the desired repository by running one of the following commands:

# For fold-based models (10-fold cross-validation)
git clone https://huggingface.co/morenolq/SSL4PR-wavlm-base
git clone https://huggingface.co/morenolq/SSL4PR-hubert-base

# For full training models (trained on complete s-PC-GITA)
git clone https://huggingface.co/morenolq/SSL4PR-wavlm-base-full
git clone https://huggingface.co/morenolq/SSL4PR-hubert-base-full

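Ensure you have Git LFS installed before cloning; without it, the model files are downloaded as small pointer files instead of the actual weights.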

Fold-based Models

The fold-based repositories contain models trained using 10-fold cross-validation on s-PC-GITA. Each repository contains 10 pre-trained models, one per fold, named fold_1.pt, fold_2.pt, ..., fold_10.pt.

SSL4PR WavLM Base: uses WavLM Base as the base model
SSL4PR HuBERT Base: uses HuBERT Base as the base model
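
The file naming described above (fold_1.pt through fold_10.pt) can be checked after cloning with a few lines of Python. This is a minimal sketch, not part of the SSL4PR code base: the helper names `fold_checkpoint_paths` and `missing_checkpoints` are hypothetical, and the directory name assumes the default folder created by `git clone`.

```python
from pathlib import Path

NUM_FOLDS = 10  # the fold-based repositories ship one checkpoint per fold


def fold_checkpoint_paths(repo_dir):
    """Return the expected paths fold_1.pt ... fold_10.pt inside repo_dir."""
    root = Path(repo_dir)
    return [root / f"fold_{i}.pt" for i in range(1, NUM_FOLDS + 1)]


def missing_checkpoints(repo_dir):
    """Return the expected checkpoint paths that are not present on disk."""
    return [p for p in fold_checkpoint_paths(repo_dir) if not p.is_file()]


if __name__ == "__main__":
    # Assumed directory name: the folder created by cloning SSL4PR-wavlm-base.
    missing = missing_checkpoints("SSL4PR-wavlm-base")
    if missing:
        print("Missing checkpoints:", ", ".join(p.name for p in missing))
    else:
        print("All 10 fold checkpoints found.")
```

Note that this only confirms the files exist. If Git LFS was not installed at clone time, the files may be present but contain only LFS pointers.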

Full Training Models

The full training repositories contain models trained on the complete s-PC-GITA dataset and tested on enhanced e-PC-GITA (as reported in Table 3 of the paper). Each repository contains a single model file named model.pt.

SSL4PR WavLM Base Full: uses WavLM Base as the base model; trained on the complete s-PC-GITA and tested on enhanced e-PC-GITA
SSL4PR HuBERT Base Full: uses HuBERT Base as the base model; trained on the complete s-PC-GITA and tested on enhanced e-PC-GITA

All models are available in PyTorch format. ⚠️ Please note that the models are not directly compatible with the Hugging Face Transformers library because they are trained with custom head components (e.g., attention pooling and layer weighting), which you can find in the model class.
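
To give a rough idea of what such head components do, here is a generic PyTorch sketch of layer weighting followed by attention pooling. It is an illustration under stated assumptions, not the authors' model class: the class name `WeightedLayerAttentionHead`, the tensor layout, the two-class output, and the 13 x 768 dummy shapes are all hypothetical choices made for this example. Refer to the model class in the repository for the actual implementation.

```python
import torch
import torch.nn as nn


class WeightedLayerAttentionHead(nn.Module):
    """Illustrative head: layer weighting + attention pooling + linear classifier."""

    def __init__(self, num_layers, hidden_size, num_classes=2):
        super().__init__()
        # One learnable scalar per encoder layer; softmax turns them into mixing weights.
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        # Scores each frame; softmax over time turns the scores into attention weights.
        self.attention = nn.Linear(hidden_size, 1)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states):
        # hidden_states: (num_layers, batch, time, hidden_size) -- assumed layout
        w = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        mixed = (w * hidden_states).sum(dim=0)   # (batch, time, hidden_size)
        scores = self.attention(mixed)           # (batch, time, 1)
        attn = torch.softmax(scores, dim=1)      # normalise over the time axis
        pooled = (attn * mixed).sum(dim=1)       # (batch, hidden_size)
        return self.classifier(pooled)           # (batch, num_classes)


if __name__ == "__main__":
    # Random tensors stand in for encoder outputs (assumed: 13 layers, 768 dims).
    dummy = torch.randn(13, 2, 50, 768)
    head = WeightedLayerAttentionHead(num_layers=13, hidden_size=768)
    print(head(dummy).shape)
```

Because the checkpoints include parameters for components like these in addition to the base encoder, they cannot be loaded through the standard Transformers model classes alone.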

Citation

@inproceedings{laquatra24_interspeech,
  title     = {Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions},
  author    = {Moreno {La Quatra} and Maria Francesca Turco and Torbjørn Svendsen and Giampiero Salvi and Juan Rafael Orozco-Arroyave and Sabato Marco Siniscalchi},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {1405--1409},
  doi       = {10.21437/Interspeech.2024-522},
  issn      = {2958-1796},
}
