naymaraq committed on
Commit 02863c9 · verified · 1 parent: 304eec6

Update README.md

Files changed (1)
  1. README.md +6 -1
README.md CHANGED
@@ -25,7 +25,12 @@ tags:
 
 Frame-VAD Multilingual MarbleNet v2.0 is a convolutional neural network for voice activity detection (VAD) that serves as the first step for Speech Recognition and Speaker Diarization. It is a frame-based model that outputs a speech probability for each 20 millisecond frame of the input audio. The model has 91.5K parameters, making it lightweight and efficient for real-time applications. <br>
 To reduce false positive errors — cases where the model incorrectly detects speech when none is present — the model was trained with white noise and real-word noise perturbations. During training, the volume of audios was also varied. Additionally, the training data includes non-speech audio samples to help the model distinguish between speech and non-speech sounds (such as coughing, laughter, and breathing, etc.) <br>
-The model supports multiple languages, including Chinese, German, Russian, English, Spanish, and French.
+
+**Key Features**
+- Lightweight model with only 91.5K parameters
+- Robust against false positive errors
+- Outputs speech probability for each 20 ms audio frame
+- Multilingual support: Chinese, German, Russian, English, Spanish, and French
 
 This model is ready for commercial use. <br>
 
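The README describes the model as emitting one speech probability per 20 ms frame. Turning such frame-level output into usable speech regions typically means thresholding and merging consecutive frames. The following is a minimal sketch under stated assumptions: a single fixed threshold of 0.5 and no smoothing, padding, or minimum-duration filtering. `frames_to_segments` is a hypothetical helper, not part of NeMo or this model's API, and the model's own post-processing may differ.

```python
from typing import List, Optional, Tuple

FRAME_SEC = 0.02  # 20 ms per frame, as stated in the README


def frames_to_segments(
    probs: List[float], threshold: float = 0.5, frame_sec: float = FRAME_SEC
) -> List[Tuple[float, float]]:
    """Hypothetical helper: merge runs of consecutive frames whose speech
    probability is >= threshold into (start_sec, end_sec) segments.
    The 0.5 default threshold is an illustrative assumption."""
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # index of the first frame in the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a speech run begins at this frame
        elif p < threshold and start is not None:
            # The run ended at the previous frame; its end time is frame i's start.
            segments.append((round(start * frame_sec, 3), round(i * frame_sec, 3)))
            start = None
    if start is not None:  # close a run that lasts until the final frame
        segments.append((round(start * frame_sec, 3), round(len(probs) * frame_sec, 3)))
    return segments


if __name__ == "__main__":
    # Made-up per-frame probabilities for 10 frames (200 ms of audio).
    probs = [0.1, 0.2, 0.8, 0.9, 0.95, 0.3, 0.1, 0.7, 0.6, 0.2]
    print(frames_to_segments(probs))  # [(0.04, 0.1), (0.14, 0.18)]
```

A single threshold keeps the sketch short; in practice a hysteresis scheme (separate thresholds for entering and leaving speech) or a minimum segment duration is often used to avoid fragmenting speech on brief probability dips.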