Age Estimation Model
This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an ANN regressor to predict speaker age from audio input. The model was trained on the TIMIT dataset.
Model Performance Comparison
We provide several pre-trained models with different architectures and feature sets. The table below compares their test-set performance:
| Model | Architecture | Features | Training Data | Test MAE (years) | Best For |
|---|---|---|---|---|---|
| VoxCeleb2 SVR (223) | SVR | ECAPA + Librosa (223-dim) | VoxCeleb2 | 7.88 | Best performance on VoxCeleb2 |
| VoxCeleb2 SVR (192) | SVR | ECAPA only (192-dim) | VoxCeleb2 | 7.89 | Lightweight deployment |
| TIMIT ANN (192) | ANN | ECAPA only (192-dim) | TIMIT | 4.95 | Clean studio recordings |
| Combined ANN (223) | ANN | ECAPA + Librosa (223-dim) | VoxCeleb2 + TIMIT | 6.93 | Best general performance |
You may find other models here.
Model Details
- Input: Audio file (converted to 16 kHz, single-channel audio)
- Output: Predicted age in years (continuous value)
- Features: SpeechBrain ECAPA-TDNN embedding [192 features]
- Regressor: Artificial Neural Network optimized through Optuna
- Performance: 4.95 years Mean Absolute Error (MAE) on the TIMIT test set
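The MAE figure above is the average absolute difference between predicted and true ages. A minimal sketch of the metric in plain Python follows; the function name and the sample ages are made up for illustration and are not taken from TIMIT or from this package.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must have the same length")
    if not true_ages:
        raise ValueError("inputs must not be empty")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical values, for illustration only.
true_ages = [25.0, 40.0, 33.0]
predicted_ages = [28.0, 36.0, 33.0]
mae = mean_absolute_error(true_ages, predicted_ages)
print(f"MAE: {mae:.2f} years")
```

An MAE of 4.95 years therefore means predictions on the TIMIT test set were off by about five years on average, in either direction.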
Features
- SpeechBrain ECAPA-TDNN embeddings (192 dimensions)
Training Data
The model was trained on the TIMIT dataset:
- High-quality studio recordings
- Single channel, 16kHz sampling rate
- Carefully controlled recording conditions
- Age annotations provided in the original dataset
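Because the training audio is single-channel at 16 kHz, it can be useful to check how far an input file is from that format before interpreting a prediction. The sketch below uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only; the helper names `describe_wav` and `matches_training_format` are hypothetical and not part of this package.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, n_channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        n_channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / sample_rate
    return sample_rate, n_channels, duration


def matches_training_format(path, expected_rate=16000, expected_channels=1):
    """True if the file is already 16 kHz single-channel, like the training data."""
    sample_rate, n_channels, _ = describe_wav(path)
    return sample_rate == expected_rate and n_channels == expected_channels


# Demo: write one second of 16-bit silence at 16 kHz mono, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

print(describe_wav(demo_path))
print(matches_training_format(demo_path))
```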
Installation
pip install git+https://github.com/griko/voice-age-regression.git#egg=voice-age-regressor[ann-ecapa-timit]
Usage
from age_regressor import AgeRegressionPipeline

# Load the pipeline
regressor = AgeRegressionPipeline.from_pretrained(
    "griko/age_reg_ann_ecapa_timit"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted age: {result[0]:.1f} years")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted ages: {[f'{age:.1f}' for age in results]} years")
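For batch use it is often convenient to keep each prediction paired with its file. The helper below is a sketch that assumes, as the batch example above suggests, that calling the pipeline with a list of paths returns one age per path in the same order. `predict_ages_by_file` is a hypothetical name, and the stand-in predictor exists only so the snippet runs without downloading the model.

```python
def predict_ages_by_file(predict, paths):
    """Map each audio path to its predicted age.

    `predict` is any callable that takes a list of paths and returns one
    age per path, in order (assumed behaviour of the pipeline call).
    """
    paths = list(paths)
    ages = predict(paths)
    if len(ages) != len(paths):
        raise ValueError("expected one prediction per input file")
    return {path: float(age) for path, age in zip(paths, ages)}


# Stand-in predictor with made-up outputs; pass the loaded pipeline
# object (`regressor` in the usage example) for real predictions.
def fake_predict(paths):
    return [30.0 + i for i, _ in enumerate(paths)]


ages = predict_ages_by_file(fake_predict, ["audio1.wav", "audio2.wav"])
for path, age in ages.items():
    print(f"{path}: {age:.1f} years")
```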
Limitations
- Model was trained on carefully controlled studio recordings
- Performance may vary on different audio qualities or recording conditions
- Age predictions are estimates and should not be used for medical or legal purposes
- Age estimations should be treated as approximate values, not exact measurements
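One way to avoid presenting a prediction as an exact value is to report it together with the test MAE as a rough error margin. This is only an illustration: the MAE is an average error on the TIMIT test set, not a confidence interval, and the helper name `format_age_estimate` is hypothetical.

```python
TIMIT_TEST_MAE = 4.95  # years, the TIMIT test-set MAE reported in this card


def format_age_estimate(predicted_age, mae=TIMIT_TEST_MAE):
    """Format a prediction as an approximate range rather than a point value."""
    low = max(0.0, predicted_age - mae)  # an age cannot be negative
    high = predicted_age + mae
    return f"about {predicted_age:.0f} years (roughly {low:.0f} to {high:.0f})"


print(format_age_estimate(34.2))
```

On audio that differs from the studio conditions of the training data, the real error may be larger than this margin suggests.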
Citation
If you use this model in your research, please cite:
@misc{koushnir2025vanpyvoiceanalysisframework,
  title={VANPY: Voice Analysis Framework},
  author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
  year={2025},
  eprint={2502.17579},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2502.17579},
}