Speaker Diarization with OpenVINO™
Speaker diarization is the process of partitioning an audio stream containing human speech into homogeneous segments according to the identity of each speaker. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. It is used to answer the question "who spoke when?". Speaker diarization is an essential feature for a speech recognition system to enrich the transcription with speaker labels.
In this tutorial, we consider how to build a speaker diarization pipeline using pyannote.audio and OpenVINO. pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch deep learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. You can find more information about pyannote pre-trained models in the model card, repo and paper.
Notebook Contents
This tutorial uses the pyannote/speaker-diarization pipeline and demonstrates how to integrate OpenVINO for running model inference inside the pipeline.
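The integration pattern can be sketched as follows: the OpenVINO compiled model is wrapped in an object that exposes the same callable interface the pipeline expects from its original PyTorch model, so it can be dropped in without changing the rest of the pipeline. The names below (`OVInferenceWrapper`, `dummy_compiled`) are illustrative assumptions, not the notebook's actual code; in the real notebook the compiled model would come from OpenVINO rather than the stand-in lambda used here.

```python
# Sketch (assumption): wrap a compiled inference backend so it can stand in
# for the original model inside a diarization pipeline. The wrapper simply
# forwards calls; a real version would also handle tensor conversion.

class OVInferenceWrapper:
    """Exposes a __call__ interface matching what the pipeline expects."""

    def __init__(self, compiled_model):
        self.compiled_model = compiled_model

    def __call__(self, chunk):
        # Delegate inference to the wrapped backend.
        return self.compiled_model(chunk)


# Stand-in for a compiled OpenVINO model: here it just doubles each value.
dummy_compiled = lambda chunk: [2 * x for x in chunk]

wrapper = OVInferenceWrapper(dummy_compiled)
print(wrapper([1.0, 2.0, 3.0]))  # → [2.0, 4.0, 6.0]
```

The design point is that the pipeline never needs to know which backend runs the inference, only that the wrapped object is callable with the same inputs and outputs.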
The pipeline accepts an input audio file and provides a mapping between timestamps and the speaker to which each speech fragment belongs. Below is an example of the output time map produced by the demo.
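To make the output format concrete, here is a minimal sketch of such a time map as a list of speaker turns, together with a helper answering "who spoke when?" for a given timestamp. The segment data and the `speaker_at` helper are made up for illustration; real turns and labels come from the pipeline itself.

```python
# Illustrative diarization output: each turn is a tuple of
# (start_seconds, end_seconds, speaker_label). Values are hypothetical.
turns = [
    (0.0, 2.4, "SPEAKER_00"),
    (2.4, 5.1, "SPEAKER_01"),
    (5.1, 7.8, "SPEAKER_00"),
]


def speaker_at(turns, t):
    """Return the speaker label active at timestamp t (in seconds), or None."""
    for start, end, speaker in turns:
        if start <= t < end:
            return speaker
    return None


print(speaker_at(turns, 3.0))  # → SPEAKER_01
```

Note that turns alternate between labels such as SPEAKER_00 and SPEAKER_01: diarization assigns anonymous speaker identities, and mapping them to real names would require a separate speaker recognition step.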
Installation Instructions
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to Installation Guide.