SSL4PR WavLM Base and HuBERT Base Models
This repository hosts the pre-trained SSL4PR models for Parkinson's disease detection from speech in real-world operating conditions. The models are described in the paper "Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions" by La Quatra et al., published at Interspeech 2024.
Paper: https://doi.org/10.21437/Interspeech.2024-522
Repository Link
GitHub Repository: please refer to the repository for all details on the models, training, and usage.
Pre-trained Models
Pre-trained models are available on the Hugging Face model hub. To use the SSL4PR models, please clone the desired repository by running one of the following commands:
# For fold-based models (10-fold cross-validation)
git clone https://huggingface.co/morenolq/SSL4PR-wavlm-base
git clone https://huggingface.co/morenolq/SSL4PR-hubert-base
# For full training models (trained on complete s-PC-GITA)
git clone https://huggingface.co/morenolq/SSL4PR-wavlm-base-full
git clone https://huggingface.co/morenolq/SSL4PR-hubert-base-full
Ensure Git LFS is installed before cloning; without it, the clone contains only small pointer files instead of the model weights.
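To verify that a clone actually pulled the weights, you can check whether a checkpoint is still a Git LFS pointer. The sketch below assumes the standard pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`. The helper names `git_lfs_available` and `is_lfs_pointer` and the example path are illustrative and not part of the SSL4PR repository.

```python
import shutil
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS specification).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def git_lfs_available() -> bool:
    """Return True if a `git-lfs` executable is found on PATH."""
    return shutil.which("git-lfs") is not None


def is_lfs_pointer(path) -> bool:
    """Return True if `path` is a text pointer rather than real weights."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    print("git-lfs on PATH:", git_lfs_available())
    # Hypothetical location of a cloned checkpoint; adjust to your clone.
    ckpt = Path("SSL4PR-wavlm-base") / "fold_1.pt"
    if ckpt.exists():
        print(ckpt, "is an LFS pointer:", is_lfs_pointer(ckpt))
```

If `is_lfs_pointer` returns True for a checkpoint, running `git lfs pull` inside the clone should fetch the actual file.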
Fold-based Models
The fold-based repositories contain models trained using 10-fold cross-validation on s-PC-GITA.
Each repository contains 10 pre-trained models, one per fold, named fold_1.pt, fold_2.pt, ..., fold_10.pt.
- SSL4PR WavLM Base: uses WavLM Base as the base model
- SSL4PR HuBERT Base: uses HuBERT Base as the base model
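Given the naming scheme above, the per-fold checkpoint paths can be enumerated programmatically. This is a minimal sketch that assumes the files sit at the root of the cloned repository; the function names and the clone directory name are illustrative.

```python
from pathlib import Path


def fold_checkpoints(repo_dir, n_folds=10):
    """Map each 1-based fold index to its expected checkpoint path."""
    repo_dir = Path(repo_dir)
    return {i: repo_dir / f"fold_{i}.pt" for i in range(1, n_folds + 1)}


def missing_folds(repo_dir, n_folds=10):
    """Return the fold indices whose checkpoint file is absent."""
    paths = fold_checkpoints(repo_dir, n_folds)
    return [i for i, p in paths.items() if not p.is_file()]


if __name__ == "__main__":
    # Assumed clone directory name; adjust to where you cloned the repository.
    repo = "SSL4PR-wavlm-base"
    for fold, path in fold_checkpoints(repo).items():
        print(fold, path)
    print("missing folds:", missing_folds(repo))
```

An empty list from `missing_folds` indicates that all ten fold checkpoints are present in the clone.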
Full Training Models
The full training repositories contain models trained on the complete s-PC-GITA dataset and tested on
enhanced e-PC-GITA (as reported in Table 3 of the paper). Each repository contains a single model file
named model.pt.
- SSL4PR WavLM Base Full: uses WavLM Base as the base model; trained on complete s-PC-GITA and tested on enhanced e-PC-GITA
- SSL4PR HuBERT Base Full: uses HuBERT Base as the base model; trained on complete s-PC-GITA and tested on enhanced e-PC-GITA
All models are provided in PyTorch format. ⚠️ Note that the models are not directly compatible with the Hugging Face Transformers library, because they include custom head components (e.g., attention pooling and layer weighting) defined in the model class.
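To make the two head components concrete, here is a dependency-free numeric sketch of what layer weighting and attention pooling typically compute. It is not the SSL4PR implementation: the exact scoring function, shapes, and parameter names in the model class may differ, and in a real model the layer logits and the attention query are learned rather than passed in.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_weighting(layers, layer_logits):
    """Combine per-layer features using softmax-normalized layer weights.

    layers: L layers, each a list of T frames, each a list of D floats.
    layer_logits: L scalars (learned in a real model; given here).
    Returns T frames of D floats.
    """
    w = softmax(layer_logits)
    n_frames, dim = len(layers[0]), len(layers[0][0])
    return [
        [sum(w[k] * layers[k][t][d] for k in range(len(layers))) for d in range(dim)]
        for t in range(n_frames)
    ]


def attention_pooling(frames, query):
    """Pool T frames into one D-dim vector with softmax attention weights.

    Each frame is scored by its dot product with `query`
    (an assumed scoring function; learned in a real model, given here).
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    a = softmax(scores)
    dim = len(frames[0])
    return [sum(a[t] * frames[t][d] for t in range(len(frames))) for d in range(dim)]


if __name__ == "__main__":
    # Toy input: two layers, two frames, two feature dimensions.
    layers = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[3.0, 2.0], [2.0, 3.0]],
    ]
    frames = layer_weighting(layers, layer_logits=[0.0, 0.0])  # equal layer weights
    pooled = attention_pooling(frames, query=[0.0, 0.0])       # uniform attention
    print(frames)
    print(pooled)
```

Because these operations sit on top of the base encoder, the checkpoints need the custom model class to be instantiated before the weights can be loaded.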
[Figure: model architecture]
Citation
@inproceedings{laquatra24_interspeech,
title = {Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions},
author = {Moreno {La Quatra} and Maria Francesca Turco and Torbjørn Svendsen and Giampiero Salvi and Juan Rafael Orozco-Arroyave and Sabato Marco Siniscalchi},
year = {2024},
booktitle = {Interspeech 2024},
pages = {1405--1409},
doi = {10.21437/Interspeech.2024-522},
issn = {2958-1796},
}