---
title: README
emoji: 🏃
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
---

[diarizers-community](https://huggingface.co/diarizers-community) aims to promote speaker diarization on the Hugging Face Hub. It contains:

- A collection of [multilingual speaker diarization datasets](https://huggingface.co/collections/diarizers-community/speaker-diarization-datasets-66261b8d571552066e003788) that are compatible with the [diarizers](https://github.com/kamilakesbi/diarizers) library. They have been processed using the [diarizers scripts](https://github.com/kamilakesbi/diarizers/blob/main/datasets/README.md). The currently available datasets are CallHome (Japanese, Chinese, German, Spanish, English), the AMI Corpus (English), VoxConverse (English) and Simsamu (French). We aim to add more datasets in the future to better support speaker diarization on the Hub.

- A collection of [five fine-tuned segmentation model](https://huggingface.co/collections/diarizers-community/models-66261d0f9277b825c807ff2a) baselines that can be used in a pyannote speaker diarization pipeline. Each model has been fine-tuned on a specific language of the CallHome dataset. Compared to the pre-trained pyannote [segmentation model](https://huggingface.co/pyannote/segmentation-3.0), they achieve better performance on multilingual data:

** ADD BENCHMARK **

Note: Results have been obtained using the [test scripts](https://github.com/kamilakesbi/diarizers/blob/main/test_segmentation.py) from diarizers.

diarizers-community comes with:

- [diarizers](https://github.com/kamilakesbi/diarizers/tree/main), a library for fine-tuning pyannote speaker diarization models using the Hugging Face ecosystem.
- A Google Colab [notebook](https://colab.research.google.com/github/kamilakesbi/notebooks/blob/main/fine_tune_pyannote.ipynb) with a step-by-step guide on how to use diarizers.
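
To illustrate how one of the fine-tuned segmentation models might be used in a pyannote speaker diarization pipeline, here is a minimal sketch. It rests on several assumptions that are not stated in this card: the checkpoint naming pattern and language codes, the `pyannote/speaker-diarization-3.1` pipeline as the base, the `SegmentationModel` / `to_pyannote_model()` usage as shown in the diarizers README, and pyannote.audio 3.x behaviour (the pipeline returns an annotation that supports `itertracks`). The function names and the audio path are hypothetical. Check the model collection and the diarizers documentation for the actual names before relying on it.

```python
# Assumption: checkpoints on the Hub follow this naming pattern.
REPO_TEMPLATE = "diarizers-community/speaker-segmentation-fine-tuned-callhome-{lang}"
# Assumption: language codes used for the five CallHome subsets.
LANGUAGES = ("eng", "deu", "jpn", "spa", "zho")


def segmentation_repo_id(lang: str) -> str:
    """Return the assumed Hub repo id for a CallHome language code."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language code: {lang!r}")
    return REPO_TEMPLATE.format(lang=lang)


def build_pipeline(lang: str, device: str = "cpu"):
    """Load a pyannote pipeline and swap in a fine-tuned segmentation model."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from diarizers import SegmentationModel
    from pyannote.audio import Pipeline

    torch_device = torch.device(device)

    # The base pipeline may require accepting its terms and a Hugging Face token.
    pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
    pipeline.to(torch_device)

    # Load the fine-tuned checkpoint and convert it to a pyannote model.
    model = SegmentationModel.from_pretrained(segmentation_repo_id(lang))
    # `_segmentation` is a private attribute; this mirrors the diarizers README.
    pipeline._segmentation.model = model.to_pyannote_model().to(torch_device)
    return pipeline


def diarize(audio_path: str, lang: str) -> None:
    """Print speaker turns for a local audio file (hypothetical path)."""
    pipeline = build_pipeline(lang)
    diarization = pipeline(audio_path)
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```

Calling `diarize("audio.wav", "jpn")` would, under these assumptions, run the pipeline with the Japanese CallHome model. Only the segmentation model is replaced; the rest of the pipeline is left as loaded.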