Age Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a Support Vector Regression (SVR) model to predict a speaker's age from audio. Its input features are ECAPA embeddings plus Librosa acoustic features, and it was trained on the VoxCeleb2 dataset.

Model Performance Comparison

We provide several pre-trained models with different architectures and feature sets. The following comparison summarizes their performance:

| Model | Architecture | Features | Training Data | Test MAE | Best For |
|---|---|---|---|---|---|
| VoxCeleb2 SVR (223) | SVR | ECAPA + Librosa (223-dim) | VoxCeleb2 | 7.88 years | Best performance on VoxCeleb2 |
| VoxCeleb2 SVR (192) | SVR | ECAPA only (192-dim) | VoxCeleb2 | 7.89 years | Lightweight deployment |
| TIMIT ANN (192) | ANN | ECAPA only (192-dim) | TIMIT | 4.95 years | Clean studio recordings |
| Combined ANN (223) | ANN | ECAPA + Librosa (223-dim) | VoxCeleb2 + TIMIT | 6.93 years | Best general performance |

You may find other models here.

Model Details

  • Input: Audio file (will be converted to 16 kHz mono)
  • Output: Predicted age in years (continuous value)
  • Features:
    • SpeechBrain ECAPA-TDNN embedding [192 features]
    • Additional Librosa features [31 features]
  • Regressor: Support Vector Regression optimized through Optuna
  • Performance:
    • VoxCeleb2 test set: 7.88 years Mean Absolute Error (MAE)
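
For reference, MAE is the average absolute difference between predicted and true ages, so an MAE of 7.88 years means predictions are off by about 7.9 years on average. A minimal sketch of the metric follows; the ages are made-up illustrative values, not data from the model's evaluation.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("true_ages and predicted_ages must have the same length")
    if not true_ages:
        raise ValueError("at least one sample is required")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical values for illustration only.
true_ages = [25, 40, 63]
predicted_ages = [31.0, 38.0, 55.0]
print(f"MAE: {mean_absolute_error(true_ages, predicted_ages):.2f} years")
```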

Features

  1. SpeechBrain ECAPA-TDNN embeddings (192 dimensions)
  2. Librosa acoustic features (31 dimensions):
    • 13 MFCCs
    • 13 Delta MFCCs
    • Zero crossing rate
    • Spectral centroid
    • Spectral bandwidth
    • Spectral contrast
    • Spectral flatness
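
The dimension counts above add up as 192 + 13 + 13 + 5 = 223. The sketch below spells out one possible layout of the combined vector. It assumes that each of the five spectral descriptors is reduced to a single scalar per clip and that the ECAPA embedding comes first; neither the ordering nor the feature names are confirmed by this model card.

```python
ECAPA_DIM = 192  # SpeechBrain ECAPA-TDNN embedding size (from the model card)
N_MFCC = 13      # number of MFCCs, and of delta MFCCs (from the model card)

# Assumption: each spectral descriptor contributes one scalar per clip.
SPECTRAL_FEATURES = [
    "zero_crossing_rate",
    "spectral_centroid",
    "spectral_bandwidth",
    "spectral_contrast",
    "spectral_flatness",
]


def feature_names():
    """Return hypothetical names for the 223 dimensions, in an assumed order."""
    names = [f"ecapa_{i}" for i in range(ECAPA_DIM)]
    names += [f"mfcc_{i}" for i in range(N_MFCC)]
    names += [f"delta_mfcc_{i}" for i in range(N_MFCC)]
    names += SPECTRAL_FEATURES
    return names


def build_feature_vector(ecapa_embedding, librosa_features):
    """Concatenate a 192-dim embedding with 31 acoustic features."""
    n_acoustic = 2 * N_MFCC + len(SPECTRAL_FEATURES)
    if len(ecapa_embedding) != ECAPA_DIM:
        raise ValueError(f"expected {ECAPA_DIM} embedding values")
    if len(librosa_features) != n_acoustic:
        raise ValueError(f"expected {n_acoustic} acoustic features")
    return list(ecapa_embedding) + list(librosa_features)


vector = build_feature_vector([0.0] * 192, [0.0] * 31)
print(len(vector), len(feature_names()))
```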

Training Data

The model was trained on the VoxCeleb2 dataset:

  • Audio preprocessing:
    • Converted to WAV format, single channel, 16kHz sampling rate
    • Applied SileroVAD for voice activity detection, taking the first voiced segment
  • Age data was collected from Wikidata and public sources
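
The "first voiced segment" step can be sketched as follows. The helper assumes speech timestamps in the format returned by Silero VAD's `get_speech_timestamps` (a list of dicts with `start` and `end` sample indices). The function name `first_voiced_segment` and the choice to return `None` when no speech is found are illustrative assumptions, not the authors' code.

```python
def first_voiced_segment(samples, speech_timestamps):
    """Return the samples of the earliest detected speech region, or None."""
    if not speech_timestamps:
        return None
    # Pick the region with the smallest start index, in case the list is unsorted.
    first = min(speech_timestamps, key=lambda ts: ts["start"])
    return samples[first["start"]:first["end"]]


# Hypothetical example: 3 s of audio at 16 kHz with two detected speech regions.
sample_rate = 16000
samples = list(range(3 * sample_rate))
timestamps = [{"start": 8000, "end": 24000}, {"start": 32000, "end": 44000}]

segment = first_voiced_segment(samples, timestamps)
print(f"First voiced segment: {len(segment) / sample_rate:.2f} s")
```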

Installation

pip install git+https://github.com/griko/voice-age-regression.git#egg=voice-age-regressor[svr-ecapa-librosa-voxceleb2]

Usage

from age_regressor import AgeRegressionPipeline

# Load the pipeline
regressor = AgeRegressionPipeline.from_pretrained(
    "griko/age_reg_svr_ecapa_librosa_voxceleb2"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted age: {result[0]:.1f} years")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted ages: {[f'{age:.1f}' for age in results]} years")
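
When processing many files, it can help to keep each prediction paired with its source path. The helper below is a sketch that relies only on the call pattern shown above (a callable that accepts a list of paths and returns one age per path). `predict_ages` and the stub regressor are hypothetical names used for illustration.

```python
def predict_ages(regressor, paths):
    """Return a {path: predicted_age} dict for a list of audio paths."""
    paths = list(paths)
    ages = regressor(paths)
    if len(ages) != len(paths):
        raise ValueError("regressor returned a different number of predictions")
    return {path: float(age) for path, age in zip(paths, ages)}


# Stand-in for AgeRegressionPipeline so the sketch runs without the model.
def fake_regressor(paths):
    return [30.0 + i for i, _ in enumerate(paths)]


ages_by_file = predict_ages(fake_regressor, ["audio1.wav", "audio2.wav"])
for path, age in ages_by_file.items():
    print(f"{path}: {age:.1f} years")
```

In real use, pass the loaded `regressor` from the example above in place of `fake_regressor`.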

Limitations

  • The model was trained on celebrity voices from YouTube interview recordings
  • Performance may degrade on audio whose quality or recording conditions differ from the training data
  • Age predictions are approximate estimates, not exact measurements, and should not be used for medical or legal purposes

Citation

If you use this model in your research, please cite:

@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework}, 
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579}, 
}