ECAPA2 Speaker Embedding Extractor
Link to paper: ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings.
ECAPA2 is a hybrid neural network architecture and training strategy for generating robust speaker embeddings. The provided pre-trained model has an easy-to-use API to extract speaker embeddings and other hierarchical features. More information can be found in our original ECAPA2 paper.
Usage Guide
Download model
You need to install the `huggingface_hub` package to download the ECAPA2 model:
```shell
pip install --upgrade huggingface_hub
```
Or with Conda:
```shell
conda install -c conda-forge huggingface_hub
```
Download model:
```python
from huggingface_hub import hf_hub_download

# Reuses the cached file if present; optionally set a `cache_dir` location
model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='ecapa2.pt', cache_dir=None)
```
Speaker Embedding Extraction
Extracting speaker embeddings is easy and only requires a few lines of code:
```python
import torch
import torchaudio

ecapa2 = torch.jit.load(model_file, map_location='cpu')
audio, sr = torchaudio.load('sample.wav')  # sample rate of 16 kHz expected
embedding = ecapa2(audio)
```
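For faster inference with 16-bit half precision on CUDA (recommended):
```python
import torch
import torchaudio

ecapa2 = torch.jit.load(model_file, map_location='cuda')
ecapa2.half()  # optional, but results in faster inference

audio, sr = torchaudio.load('sample.wav')  # sample rate of 16 kHz expected
audio = audio.to('cuda')  # input must be on the same device as the model
audio = audio.half()  # match the fp16 weights; skip this line if you skip ecapa2.half()
embedding = ecapa2(audio)
```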
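If the same script must also run on machines without a GPU, one option is to choose the device and precision at runtime. The sketch below is an illustration rather than part of the ECAPA2 API: `pick_device_and_dtype` and `embed` are hypothetical helper names, and it assumes the TorchScript model accepts a waveform whose device and dtype match its weights.

```python
import torch


def pick_device_and_dtype(prefer_half: bool = True):
    """Return (device, dtype): CUDA (fp16 if prefer_half) when available, else CPU fp32."""
    if torch.cuda.is_available():
        return torch.device('cuda'), (torch.float16 if prefer_half else torch.float32)
    # Half precision on CPU is typically slow, so fall back to fp32 there.
    return torch.device('cpu'), torch.float32


def embed(model, audio: torch.Tensor, device: torch.device, dtype: torch.dtype) -> torch.Tensor:
    """Move the waveform to the chosen device and precision, then run the model."""
    # Assumption: the model does not move or cast its input itself.
    audio = audio.to(device=device, dtype=dtype)
    return model(audio)


# Example usage (assumes `model_file` from hf_hub_download and a 16 kHz waveform `audio`):
# device, dtype = pick_device_and_dtype()
# ecapa2 = torch.jit.load(model_file, map_location=device)
# if dtype == torch.float16:
#     ecapa2.half()
# embedding = embed(ecapa2, audio, device, dtype)
```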
The first calls to the JIT model can, in some cases, take a very long time because the TorchScript compiler attempts to optimize the graph. If this causes issues, the JIT optimizer can be disabled as follows:
```python
with torch.jit.optimized_execution(False):
    embedding = ecapa2(audio)
```
There is no need to call `ecapa2.eval()` or to wrap inference in `torch.no_grad()`; both are handled automatically.
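This guide does not describe how to compare embeddings. A common approach for speaker verification is to score a pair of embeddings with cosine similarity and treat the pair as the same speaker above a threshold. The sketch below assumes that approach; `cosine_score` and `same_speaker` are hypothetical helpers, and the 0.5 threshold is a placeholder that should be calibrated on your own labelled trials.

```python
import torch
import torch.nn.functional as F


def cosine_score(emb_a: torch.Tensor, emb_b: torch.Tensor) -> float:
    """Cosine similarity between two speaker embeddings, in [-1, 1]."""
    # Flatten e.g. (1, D) outputs to (D,) and cast fp16 outputs to fp32 for a stable score.
    a = emb_a.float().flatten()
    b = emb_b.float().flatten()
    return F.cosine_similarity(a, b, dim=0).item()


def same_speaker(emb_a: torch.Tensor, emb_b: torch.Tensor, threshold: float = 0.5) -> bool:
    # HYPOTHETICAL threshold: calibrate it on your own data before relying on it.
    return cosine_score(emb_a, emb_b) >= threshold
```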
Citation
BibTeX:
```bibtex
@INPROCEEDINGS{ecapa2,
  author={Jenthe Thienpondt and Kris Demuynck},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title={ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings},
  year={2023},
  volume={},
  number={}
}
```
APA:
Thienpondt, J., & Demuynck, K. (2023). ECAPA2: A hybrid neural network architecture and training strategy for robust speaker embeddings. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
Contact
Name: Jenthe Thienpondt
E-mail: [email protected]