title: Denoise And Diarization
emoji: 🐠
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 3.28.0
app_file: app.py
pinned: false
How to run inference:
- Hugging Face Space
- Local inference:
  - GUI:
    python app.py
  - Command line:
    python main_pipeline.py --audio-path dialog.mp3 --out-folder-path out
About the pipeline:
- denoise the audio
- VAD (voice activity detection)
- extract a speaker embedding from each VAD fragment
- cluster these embeddings
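
The four stages above can be wired together roughly as follows. This is only a sketch of the control flow, not the code in main_pipeline.py: `run_pipeline` and the four stage callables (`denoise`, `vad`, `embed`, `cluster`) are hypothetical names, and the real models are passed in from outside.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def run_pipeline(
    audio: Sequence[float],
    denoise: Callable[[Sequence[float]], Sequence[float]],
    vad: Callable[[Sequence[float]], List[Segment]],
    embed: Callable[[Sequence[float], Segment], List[float]],
    cluster: Callable[[List[List[float]]], List[int]],
) -> List[Tuple[float, float, int]]:
    """Chain the four stages; every stage is a hypothetical callable."""
    clean = denoise(audio)                                 # 1. denoise the audio
    segments = vad(clean)                                  # 2. find speech fragments
    embeddings = [embed(clean, seg) for seg in segments]   # 3. one vector per fragment
    labels = cluster(embeddings)                           # 4. one speaker id per fragment
    return [(start, end, label) for (start, end), label in zip(segments, labels)]
```

The result is a list of `(start, end, speaker_id)` tuples, one per VAD fragment.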
Inference time by hardware
Hardware | Inference time for dialog.mp3 |
---|---|
CPU (2 vCPU, Hugging Face) | 453.8 s/it |
GPU (Tesla V100) | 8.23 s/it |
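
For a quick comparison, the two timings above give the GPU speedup directly. The variable names are only illustrative:

```python
cpu_s_per_it = 453.8  # 2 vCPU run, from the table
gpu_s_per_it = 8.23   # Tesla V100 run, from the table

speedup = cpu_s_per_it / gpu_s_per_it
print(f"GPU is about {speedup:.1f}x faster than the CPU run")
```

So on this one file the V100 is roughly 55 times faster than the 2 vCPU machine.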
I know many methods for this task:
- separation: using separation models (these need long training and fine-tuning)
- diarization
  - speaker embedding + clustering with a known number of speakers
  - overlap speech detection
    - speaker embedding + clustering with a known number of speakers
  - per-word ASR + speaker embedding + clustering into the number of speakers
- end-to-end neural diarization (the state of the art is worse than plain diarization)
For this task I used speaker embedding + clustering with an unknown number of speakers.
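
When the number of speakers is unknown, one common option is agglomerative clustering that stops at a distance threshold instead of at a fixed cluster count. The sketch below is a pure-Python illustration of that idea, not necessarily what this repo does: `cluster_embeddings`, average linkage, cosine distance and the `threshold=0.5` default are all assumptions.

```python
import math
from typing import List


def cosine_distance(a: List[float], b: List[float]) -> float:
    # Assumes both vectors are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster_embeddings(embeddings: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Average-linkage agglomerative clustering that stops at `threshold`."""
    clusters = [[i] for i in range(len(embeddings))]  # start with one cluster per fragment

    def linkage(c1: List[int], c2: List[int]) -> float:
        dists = [cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break  # remaining clusters are treated as different speakers
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels
```

The number of speakers is then simply the number of clusters left when merging stops, so the threshold has to be tuned on real embeddings.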
How I can improve it (I have experience with all of these):
- preprocessing
  - estimate the SNR (signal-to-noise ratio) and skip denoising if the input is clean
- train:
  - custom speaker recognition model
  - custom overlap speech detector
  - custom speech separation model:
    - use FaceVad if video is available
- improve speed and RAM usage:
  - quantize the models
  - optimize the models for the target hardware: ONNX => OpenVINO / TensorRT / Caffe2 or Core ML
  - prune the models
  - distillation (train a small model using a big model)
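
The SNR check listed under preprocessing could look something like the sketch below. It is a crude energy-based heuristic, not a calibrated estimator: the quietest frames are taken as noise and the loudest as speech plus noise. The function names, the 400-sample frame length and the 20 dB threshold are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def estimate_snr_db(samples: Sequence[float], frame_len: int = 400) -> float:
    """Crude SNR estimate from the quietest and loudest 10% of frames."""
    energies = sorted(
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    )
    if not energies:
        raise ValueError("audio is shorter than one frame")
    k = max(1, len(energies) // 10)
    noise = sum(energies[:k]) / k     # mean energy of the quietest frames
    signal = sum(energies[-k:]) / k   # mean energy of the loudest frames
    eps = 1e-12                       # avoids division by zero on digital silence
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def needs_denoising(samples: Sequence[float], threshold_db: float = 20.0) -> bool:
    """Run the denoiser only when the estimated SNR is below the threshold."""
    return estimate_snr_db(samples) < threshold_db
```

Skipping the denoiser on clean input saves time and avoids the artifacts a denoiser can add to speech that was already clean.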
How to improve it beyond the points above:
- remove overlapped speech using ASR
- remove overlapped speech using overlap detection
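
Once an overlap detector has produced time intervals, removing overlapped speech comes down to subtracting those intervals from the VAD fragments before computing embeddings. A minimal sketch, assuming both inputs are lists of `(start, end)` tuples in seconds; `subtract_intervals` is a hypothetical helper, not part of this repo:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def subtract_intervals(segments: List[Interval], overlaps: List[Interval]) -> List[Interval]:
    """Return the parts of `segments` not covered by any interval in `overlaps`."""
    result: List[Interval] = []
    overlaps = sorted(overlaps)
    for start, end in segments:
        cursor = start  # everything before `cursor` is already handled
        for o_start, o_end in overlaps:
            if o_end <= cursor or o_start >= end:
                continue  # this overlap does not touch the remaining part
            if o_start > cursor:
                result.append((cursor, o_start))  # keep the clean part before the overlap
            cursor = max(cursor, o_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))  # keep the clean tail
    return result
```

For example, fragments `[(0, 5), (6, 10)]` with overlaps `[(2, 3), (4.5, 7)]` leave `[(0, 2), (3, 4.5), (7, 10)]` as single-speaker audio.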