Age Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an ANN regressor to predict speaker age from audio input. The model was trained on the TIMIT dataset.

Model Performance Comparison

We provide several pre-trained models with different architectures and feature sets. The table below compares their test performance:

| Model | Architecture | Features | Training Data | Test MAE | Best For |
|---|---|---|---|---|---|
| VoxCeleb2 SVR (223) | SVR | ECAPA + Librosa (223-dim) | VoxCeleb2 | 7.88 years | Best performance on VoxCeleb2 |
| VoxCeleb2 SVR (192) | SVR | ECAPA only (192-dim) | VoxCeleb2 | 7.89 years | Lightweight deployment |
| TIMIT ANN (192) | ANN | ECAPA only (192-dim) | TIMIT | 4.95 years | Clean studio recordings |
| Combined ANN (223) | ANN | ECAPA + Librosa (223-dim) | VoxCeleb2 + TIMIT | 6.93 years | Best general performance |

You may find other models here.

Model Details

  • Input: Audio file (will be converted to 16 kHz mono)
  • Output: Predicted age in years (continuous value)
  • Features: SpeechBrain ECAPA-TDNN embedding [192 features]
  • Regressor: Artificial Neural Network optimized through Optuna
  • Performance:
    • TIMIT test set: 4.95 years Mean Absolute Error (MAE)
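
For reference, MAE is the average absolute difference between predicted and true ages. The following is a minimal sketch of that computation; the ages are made-up illustration values, not TIMIT results, and the function name is our own.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must have the same length")
    if not true_ages:
        raise ValueError("inputs must be non-empty")
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


# Hypothetical values for illustration only; not taken from TIMIT.
true_ages = [25.0, 40.0, 33.0]
predicted_ages = [28.0, 35.0, 34.0]

mae = mean_absolute_error(true_ages, predicted_ages)
print(f"MAE: {mae:.2f} years")  # absolute errors are 3, 5 and 1, so MAE is 3.00
```

An MAE of 4.95 years therefore means that predictions on the TIMIT test set were off by about five years on average, with individual errors both smaller and larger than that.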

Features

  1. SpeechBrain ECAPA-TDNN embeddings (192 dimensions)

Training Data

The model was trained on the TIMIT dataset:

  • High-quality studio recordings
  • Single channel, 16kHz sampling rate
  • Carefully controlled recording conditions
  • Age annotations provided in the original dataset
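
Because the training audio is single channel at 16 kHz, it can be useful to see how far your own files are from that format. The sketch below uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only. The helper names `describe_wav` and `write_silent_wav` are our own illustrations and are not part of this package.

```python
import os
import tempfile
import wave

TARGET_RATE_HZ = 16000  # sampling rate of the training data, per this card
TARGET_CHANNELS = 1     # single channel (mono)


def describe_wav(path):
    """Return (sample_rate, channels, matches_training_format) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    matches = rate == TARGET_RATE_HZ and channels == TARGET_CHANNELS
    return rate, channels, matches


def write_silent_wav(path, rate, channels, seconds=0.1):
    """Write a short silent 16-bit PCM WAV file (demo helper only)."""
    frame_count = int(rate * seconds)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * channels * frame_count)


# Demo on a throwaway 44.1 kHz stereo file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.wav")
    write_silent_wav(demo_path, rate=44100, channels=2)
    rate, channels, matches = describe_wav(demo_path)
    print(f"{rate} Hz, {channels} channel(s), matches training format: {matches}")
```

A file that does not match can still be passed to the model, since input is converted before inference. The check only tells you how much conversion will take place.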

Installation

pip install git+https://github.com/griko/voice-age-regression.git#egg=voice-age-regressor[ann-ecapa-timit]

Usage

from age_regressor import AgeRegressionPipeline

# Load the pipeline
regressor = AgeRegressionPipeline.from_pretrained(
    "griko/age_reg_ann_ecapa_timit"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted age: {result[0]:.1f} years")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted ages: {[f'{age:.1f}' for age in results]} years")
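
For larger batches it can help to keep each prediction next to its file name. The sketch below assumes that the pipeline returns one prediction per input path, in input order; the usage above suggests this, but it is not confirmed here. `predict_by_file` is a hypothetical helper, and `fake_regressor` is a stand-in so the example runs without downloading the model.

```python
def predict_by_file(regressor, paths):
    """Map each audio path to its predicted age.

    Assumes `regressor(paths)` yields one numeric prediction per path,
    in the same order as the input. Duplicate paths collapse to one entry.
    """
    paths = list(paths)
    ages = list(regressor(paths))
    if len(ages) != len(paths):
        raise ValueError("expected one prediction per input path")
    return {path: float(age) for path, age in zip(paths, ages)}


# Stand-in for the real pipeline so the sketch runs without model weights.
def fake_regressor(paths):
    return [30.0 for _ in paths]


ages_by_file = predict_by_file(fake_regressor, ["audio1.wav", "audio2.wav"])
for path, age in ages_by_file.items():
    print(f"{path}: {age:.1f} years")
```

With the real pipeline, pass the loaded `regressor` object in place of `fake_regressor`.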

Limitations

  • Model was trained on carefully controlled studio recordings
  • Performance may vary on different audio qualities or recording conditions
  • Age predictions are estimates and should not be used for medical or legal purposes
  • Age estimations should be treated as approximate values, not exact measurements
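
One way to present predictions as approximate is to report a rough range alongside the point estimate. The sketch below pads the prediction by the reported TIMIT test MAE of 4.95 years. This is only an illustrative convention: MAE is an average error, not a confidence interval, and individual errors can be larger, especially on audio that differs from studio recordings. The function name is our own.

```python
TIMIT_TEST_MAE_YEARS = 4.95  # test MAE reported in this model card


def format_age_estimate(predicted_age, mae=TIMIT_TEST_MAE_YEARS):
    """Format a prediction as a rounded estimate with a rough +/- MAE range."""
    low = max(0.0, predicted_age - mae)  # an age cannot be negative
    high = predicted_age + mae
    return f"about {predicted_age:.0f} years (roughly {low:.0f} to {high:.0f})"


print(format_age_estimate(31.4))  # prints: about 31 years (roughly 26 to 36)
```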

Citation

If you use this model in your research, please cite:

@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework}, 
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579}, 
}