Age Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an ANN regressor to predict speaker age from audio input. The model was trained on the TIMIT dataset.

Model Performance Comparison

We provide multiple pre-trained models with different architectures and feature sets. The table below compares their performance:

Model	Architecture	Features	Training Data	Test MAE	Best For
VoxCeleb2 SVR (223)	SVR	ECAPA + Librosa (223-dim)	VoxCeleb2	7.88 years	Best performance on VoxCeleb2
VoxCeleb2 SVR (192)	SVR	ECAPA only (192-dim)	VoxCeleb2	7.89 years	Lightweight deployment
TIMIT ANN (192)	ANN	ECAPA only (192-dim)	TIMIT	4.95 years	Clean studio recordings
Combined ANN (223)	ANN	ECAPA + Librosa (223-dim)	VoxCeleb2 + TIMIT	6.93 years	Best general performance

You may find other models here.

Model Details

Input: Audio file (will be converted to 16 kHz mono)
Output: Predicted age in years (continuous value)
Features: SpeechBrain ECAPA-TDNN embedding [192 features]
Regressor: Artificial Neural Network optimized through Optuna
Performance:
- TIMIT test set: 4.95 years Mean Absolute Error (MAE)
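
For reference, MAE is the average absolute difference between the true and predicted ages, so it is expressed in years. The sketch below shows the computation on made-up values. The function name and the sample ages are illustrative assumptions and are not taken from TIMIT or from this repository.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float], predicted_ages: Sequence[float]) -> float:
    """Return the average absolute difference, in years, between true and predicted ages."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("true_ages and predicted_ages must have the same length")
    if len(true_ages) == 0:
        raise ValueError("at least one sample is required")
    total_error = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total_error / len(true_ages)


# Made-up values for illustration only (not real dataset entries).
true_ages = [25.0, 40.0, 33.0, 58.0]
predicted_ages = [28.0, 36.0, 33.0, 63.0]

mae = mean_absolute_error(true_ages, predicted_ages)
print(f"MAE: {mae:.2f} years")  # prints "MAE: 3.00 years"
```

An MAE of 4.95 years therefore means that predictions on the TIMIT test set were off by about five years on average. It does not bound the error of any single prediction.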

Features

SpeechBrain ECAPA-TDNN embeddings (192 dimensions)

Training Data

The model was trained on the TIMIT dataset:

High-quality studio recordings
Single channel, 16 kHz sampling rate
Carefully controlled recording conditions
Age annotations provided in the original dataset
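
The pipeline converts input audio itself, so checking files beforehand is optional. As a rough sketch, Python's standard-library wave module can report whether a PCM WAV file already matches the single-channel, 16 kHz format of the training data. The names WavFormat and read_wav_format are hypothetical helpers for illustration and are not part of the age_regressor package.

```python
import wave
from dataclasses import dataclass

# Format of the TIMIT training data described above.
TARGET_SAMPLE_RATE = 16_000
TARGET_CHANNELS = 1


@dataclass
class WavFormat:
    sample_rate: int
    channels: int

    @property
    def matches_training_format(self) -> bool:
        """True when the file is already single-channel audio at 16 kHz."""
        return self.sample_rate == TARGET_SAMPLE_RATE and self.channels == TARGET_CHANNELS


def read_wav_format(path: str) -> WavFormat:
    """Read the sample rate and channel count from a PCM WAV file header."""
    # The wave module only handles uncompressed PCM WAV files;
    # other encodings raise wave.Error.
    with wave.open(path, "rb") as wav_file:
        return WavFormat(
            sample_rate=wav_file.getframerate(),
            channels=wav_file.getnchannels(),
        )


# Example usage (the path is a placeholder):
# fmt = read_wav_format("path/to/audio.wav")
# print(fmt, fmt.matches_training_format)
```

A file that fails this check can still be passed to the model. The check only flags recordings whose format differs from the training conditions.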

Installation

pip install "git+https://github.com/griko/voice-age-regression.git#egg=voice-age-regressor[ann-ecapa-timit]"

Usage

from age_regressor import AgeRegressionPipeline

# Load the pipeline
regressor = AgeRegressionPipeline.from_pretrained(
    "griko/age_reg_ann_ecapa_timit"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted age: {result[0]:.1f} years")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted ages: {[f'{age:.1f}' for age in results]} years")
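
Building on the batch call above, a folder of recordings can be processed in one pass. This is a minimal sketch: predict_folder is a hypothetical helper, not part of the age_regressor package, and it assumes that the pipeline returns one age per input path in the same order as the input list.

```python
from pathlib import Path
from typing import Callable, Dict, List, Sequence


def predict_folder(
    regressor: Callable[[List[str]], Sequence[float]],
    folder: str,
) -> Dict[str, float]:
    """Run the regressor on every .wav file in a folder and map file name to predicted age."""
    wav_paths = sorted(str(p) for p in Path(folder).glob("*.wav"))
    if not wav_paths:
        return {}
    # Assumption: results come back in the same order as the input paths.
    ages = regressor(wav_paths)
    return {Path(path).name: float(age) for path, age in zip(wav_paths, ages)}


# Example usage with the pipeline loaded as shown above (the folder name is a placeholder):
# regressor = AgeRegressionPipeline.from_pretrained("griko/age_reg_ann_ecapa_timit")
# for name, age in predict_folder(regressor, "recordings").items():
#     print(f"{name}: {age:.1f} years")
```

Taking the regressor as a plain callable keeps the helper independent of the package, so it can be tried with a stub function before the model is downloaded.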

Limitations

The model was trained on carefully controlled studio recordings
Performance may vary on different audio qualities or recording conditions
Age predictions are estimates and should not be used for medical or legal purposes
Age estimations should be treated as approximate values, not exact measurements

Citation

If you use this model in your research, please cite:

@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework}, 
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579}, 
}