Spoken_language_identification
Model description
This is a spoken language recognition model trained with TensorFlow on 2k hours of private data, with approximately 150 hours of speech supervision per language. The model uses the CRNN-Attention architecture, which has previously been used for extracting utterance-level feature representations. The system is trained on single-channel recordings sampled at 16 kHz with 16-bit signed integer PCM encoding.
More details can be found here: GitHub
The model can classify a speech utterance according to the language spoken. It covers 13 different languages.
Model Parameters | Supported Languages |
---|---|
1 M | chinese, english, french, german, indonesian, italian, japanese, korean, portuguese, russian, spanish, turkish, vietnamese |
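Since the model was trained on 16 kHz, single-channel, 16-bit PCM recordings, it can be useful to check a WAV file against that format before running inference. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav_format` and the constants are illustrative assumptions, not part of the model's own code.

```python
import wave

# Expected format, taken from the model description above.
EXPECTED_RATE = 16000      # samples per second
EXPECTED_CHANNELS = 1      # single channel (mono)
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the expected format.

    Hypothetical helper for illustration; an empty list means the file matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE} Hz"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"file has {wav.getnchannels()} channels, expected {EXPECTED_CHANNELS}"
            )
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, expected {EXPECTED_SAMPLE_WIDTH}"
            )
    return problems
```

Calling `check_wav_format("audio.wav")` on a matching file returns an empty list. A mismatch is not necessarily fatal: `librosa.load(path, sr=16000)` resamples to 16 kHz and downmixes to mono by default, so the check mainly serves as a diagnostic for unexpected inputs.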
Example
Please see the provided Colab for details on running an example.
How to use
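import librosa
from huggingface_hub import from_pretrained_keras
from featurizers.speech_featurizers import TFSpeechFeaturizer

# Download the pretrained model from the Hugging Face Hub
model = from_pretrained_keras("SpeechFlow/spoken_language_identification")

wav_path = "audio.wav"  # placeholder: path to your own recording
# Load the audio as mono, resampled to 16 kHz
signal, _ = librosa.load(wav_path, sr=16000)
output, prob = model.predict_pb(signal)
print(output)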