---
title: README
emoji: 🃏
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
---

[diarizers-community](https://huggingface.co/diarizers-community) aims to promote speaker diarization on the Hugging Face Hub. It contains:

- A collection of [multilingual speaker diarization datasets](https://huggingface.co/collections/diarizers-community/speaker-diarization-datasets-66261b8d571552066e003788) that are compatible with the [diarizers](https://github.com/huggingface/diarizers) library. They have been processed using [diarizers scripts](https://github.com/huggingface/diarizers/blob/main/datasets/README.md).

The available datasets are CallHome (Japanese, Chinese, German, Spanish, English), AMI Corpus (English), VoxConverse (English), and Simsamu (French). We aim to add more datasets in the future to better support speaker diarization on the Hub.

- A collection of multilingual [fine-tuned segmentation model](https://huggingface.co/collections/diarizers-community/models-66261d0f9277b825c807ff2a) baselines compatible with [pyannote](https://github.com/pyannote/pyannote-audio).

Each model has been fine-tuned on a specific CallHome language subset. On the test set of its language, each model achieves a lower diarization error rate (DER) than [pyannote](https://github.com/pyannote/pyannote-audio)'s pre-trained [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) model (see the benchmark below for details).
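As a rough illustration of how one of these checkpoints might be plugged into a pyannote pipeline, here is a minimal sketch. The repository IDs follow the naming pattern of the fine-tuned models linked in the benchmark table. Everything else is an assumption on our side: the base pipeline name `pyannote/speaker-diarization-3.1`, the helper names `finetuned_repo_id` and `load_finetuned_pipeline`, and the use of the pipeline's private `_segmentation` attribute. It also assumes you have accepted the pyannote model conditions and are logged in to the Hub.

```python
# Language codes taken from the fine-tuned model links in the benchmark table.
LANGUAGE_CODES = ("jpn", "spa", "eng", "deu", "zho")


def finetuned_repo_id(lang: str) -> str:
    """Build the Hub repository ID of the fine-tuned model for one language."""
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {lang!r}")
    return f"diarizers-community/speaker-segmentation-fine-tuned-callhome-{lang}"


def load_finetuned_pipeline(lang: str, base_pipeline: str = "pyannote/speaker-diarization-3.1"):
    """Hypothetical helper: swap a fine-tuned segmentation model into a pyannote pipeline."""
    # Imported lazily so finetuned_repo_id() works without these packages installed.
    import torch
    from diarizers import SegmentationModel
    from pyannote.audio import Pipeline

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Assumption: the base pipeline is gated and requires prior authentication.
    pipeline = Pipeline.from_pretrained(base_pipeline)
    pipeline.to(device)

    # Load the fine-tuned checkpoint and convert it to a pyannote-compatible model.
    model = SegmentationModel().from_pretrained(finetuned_repo_id(lang))
    # Assumption: the segmentation model lives on this private attribute.
    pipeline._segmentation.model = model.to_pyannote_model().to(device)
    return pipeline
```

With this in place, `load_finetuned_pipeline("jpn")` would return a pipeline that can be called on an audio file path, as with any pyannote pipeline. The private attribute may change between pyannote versions, so treat this as a starting point rather than a stable recipe.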

Together with diarizers-community, we release: 

- [diarizers](https://github.com/huggingface/diarizers/tree/main), a library for fine-tuning [pyannote](https://github.com/pyannote/pyannote-audio) speaker diarization models using the Hugging Face ecosystem.

- A Google Colab [notebook](https://colab.research.google.com/github/kamilakesbi/notebooks/blob/main/fine_tune_pyannote.ipynb), with a step-by-step guide on how to use diarizers.


**Benchmark** 

| [Callhome](https://huggingface.co/datasets/diarizers-community/callhome) test dataset | Model | DER | False alarm | Missed detection | Confusion |
| --- | --- | --- | --- | --- | --- |
| Japanese | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 25.44 | **2.30** | 17.45 | 5.69 |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-jpn) | **18.23** | 6.31 | **6.91** | **5.01** |
| Spanish | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 33.44 | **2.59** | 25.19 | **5.66** |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-spa) | **25.72** | 6.87 | **12.73** | 6.12 |
| English | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 22.16 | **6.29** | 10.97 | 4.90 |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-eng) | **18.40** | 7.10 | **6.98** | **4.32** |
| German | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 21.90 | **3.10** | 14.25 | 4.55 |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-deu) | **16.75** | 5.00 | **7.75** | **4.00** |
| Chinese | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 19.73 | **4.81** | 9.82 | 5.11 |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-zho) | **15.95** | 5.04 | **7.24** | **3.68** |

All results are percentages (lower is better); bold marks the better value in each pretrained/fine-tuned pair. They were obtained using the [test script](https://github.com/huggingface/diarizers/blob/main/test_segmentation.py) from diarizers.
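To help read the table, the short sketch below checks that each reported DER is the sum of its three components (false alarm, missed detection, confusion), up to rounding, and computes the relative DER reduction from fine-tuning. The numbers are copied from the table above; the names `ROWS`, `component_sum`, and `relative_der_reduction` are ours and purely illustrative.

```python
# (language, model, DER, false alarm, missed detection, confusion), all in %.
ROWS = [
    ("Japanese", "pretrained", 25.44, 2.30, 17.45, 5.69),
    ("Japanese", "fine-tuned", 18.23, 6.31, 6.91, 5.01),
    ("Spanish", "pretrained", 33.44, 2.59, 25.19, 5.66),
    ("Spanish", "fine-tuned", 25.72, 6.87, 12.73, 6.12),
    ("English", "pretrained", 22.16, 6.29, 10.97, 4.90),
    ("English", "fine-tuned", 18.40, 7.10, 6.98, 4.32),
    ("German", "pretrained", 21.90, 3.10, 14.25, 4.55),
    ("German", "fine-tuned", 16.75, 5.00, 7.75, 4.00),
    ("Chinese", "pretrained", 19.73, 4.81, 9.82, 5.11),
    ("Chinese", "fine-tuned", 15.95, 5.04, 7.24, 3.68),
]


def component_sum(false_alarm: float, missed: float, confusion: float) -> float:
    """DER is the sum of the three error components."""
    return false_alarm + missed + confusion


def relative_der_reduction(pretrained_der: float, finetuned_der: float) -> float:
    """Relative DER reduction of the fine-tuned model, in percent."""
    return 100.0 * (pretrained_der - finetuned_der) / pretrained_der


# Index DER by (language, model) for easy lookup.
der = {(lang, model): d for lang, model, d, *_ in ROWS}

for lang, model, d, fa, miss, conf in ROWS:
    # The components are rounded to two decimals, so allow a small tolerance.
    assert abs(d - component_sum(fa, miss, conf)) < 0.02, (lang, model)

for lang in ("Japanese", "Spanish", "English", "German", "Chinese"):
    gain = relative_der_reduction(der[(lang, "pretrained")], der[(lang, "fine-tuned")])
    print(f"{lang}: {gain:.1f}% relative DER reduction")
```

The decomposition also shows where the gain comes from: in every language, fine-tuning raises the false alarm rate but lowers missed detection by a larger margin.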