
metadata

title: README
emoji: 🏃
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false

diarizers-community aims to promote speaker diarization on the Hugging Face Hub. It comes with diarizers, a library for fine-tuning pyannote speaker diarization models that is compatible with the Hugging Face ecosystem.

This organization contains:

A collection of multilingual speaker diarization datasets that are compatible with diarizers. They have been processed using diarizers scripts.

The currently available datasets are CallHome (Japanese, Chinese, German, Spanish, English), the AMI Corpus (English), VoxConverse (English) and Simsamu (French). We aim to add more datasets in the future to support speaker diarization on the Hub.
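As a rough illustration of how a record from one of these datasets might be consumed, the sketch below totals the speaking time per speaker. It assumes each record carries three parallel lists named `timestamps_start`, `timestamps_end` and `speakers`. These field names, as well as the repository id, configuration and split in the commented usage, are assumptions to check against the dataset card.

```python
from collections import defaultdict


def speaking_time(record):
    """Return total speaking time in seconds per speaker for one record.

    Assumes (not confirmed by this card) that `record` holds three parallel
    lists: `timestamps_start`, `timestamps_end` and `speakers`.
    """
    totals = defaultdict(float)
    for start, end, speaker in zip(
        record["timestamps_start"], record["timestamps_end"], record["speakers"]
    ):
        totals[speaker] += end - start
    return dict(totals)


# Hand-made record, used only to show the assumed shape of the data.
example = {
    "timestamps_start": [0.0, 1.5, 4.0],
    "timestamps_end": [1.5, 4.0, 5.0],
    "speakers": ["A", "B", "A"],
}
print(speaking_time(example))  # {'A': 2.5, 'B': 2.5}

# With the `datasets` library installed, a real record could be fetched along
# these lines (repository id, config and split are assumptions):
# from datasets import load_dataset
# ds = load_dataset("diarizers-community/callhome", "jpn", split="data")
# print(speaking_time(ds[0]))
```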

A collection of 5 fine-tuned segmentation model baselines that can be used in a pyannote speaker diarization pipeline.
Each model has been fine-tuned on a specific language of the CallHome dataset. Compared to the pre-trained pyannote segmentation model, they obtain better performance on each language:

** ADD BENCHMARK **

Note: Results have been obtained using the test_segmentation.py script from diarizers.
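To make the idea of using a fine-tuned model in a pyannote pipeline more concrete, here is a minimal sketch. It assumes that diarizers exposes a `SegmentationModel` class with `from_pretrained` and `to_pyannote_model` methods, that the pipeline keeps its segmentation model in `_segmentation.model`, and that `pyannote/speaker-diarization-3.1` is the base pipeline. The model repository name in the commented usage is also an assumption. Check the diarizers documentation before relying on any of these names.

```python
def build_pipeline(segmentation_repo, base_pipeline="pyannote/speaker-diarization-3.1"):
    """Load a pyannote pipeline and swap in a fine-tuned segmentation model.

    All library names and attributes used here are assumptions about the
    diarizers and pyannote.audio APIs, not something this card confirms.
    """
    # Imported lazily so this function can be defined without the
    # (heavy) dependencies being installed.
    from diarizers import SegmentationModel
    from pyannote.audio import Pipeline

    # The base pipeline may be gated on the Hub and require authentication.
    pipeline = Pipeline.from_pretrained(base_pipeline)

    # Load the fine-tuned weights and convert them to a pyannote-compatible
    # model before plugging them into the pipeline.
    model = SegmentationModel.from_pretrained(segmentation_repo)
    pipeline._segmentation.model = model.to_pyannote_model()
    return pipeline


# Hypothetical usage (repository name and audio path are placeholders):
# pipeline = build_pipeline(
#     "diarizers-community/speaker-segmentation-fine-tuned-callhome-jpn"
# )
# diarization = pipeline("audio.wav")
```

Only the segmentation model is replaced in this sketch. The rest of the pipeline is left as loaded.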

Together with this organization, we release:

The diarizers library, to fine-tune pyannote segmentation models and use them back in a pyannote speaker diarization pipeline.
A Google Colab notebook, with a step-by-step guide on how to use diarizers.
