---
license: apache-2.0
metrics:
- accuracy
language:
- en
- zh
- ko
- ja
- de
- fr
- es
- pt
- vi
- tr
- it
- ru
- id
tags:
- keras
- tensorflow
libraries: TensorBoard
pipeline_tag: audio-classification
---

# Spoken_language_identification

## Model description

This is a spoken language recognition model trained with TensorFlow on 2k hours of private data, with approximately 150 hours of speech supervision per language.
The model uses the CRNN-Attention architecture, which has previously been used for extracting utterance-level feature representations.
The system is trained on recordings sampled at 16 kHz, single channel, with 16-bit signed integer PCM encoding.
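
Since the training audio has a fixed format, it can help to check an input file against it before running inference. The following is a minimal sketch using only Python's standard `wave` module; the function name `wav_format_mismatches` and the constants are illustrative and are not part of this model's code.

```python
import wave

# Expected input format, taken from the model description above.
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


def wav_format_mismatches(path):
    """Return a list of format mismatches; the list is empty if the file matches."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"{8 * width}-bit samples, expected {8 * EXPECTED_SAMPLE_WIDTH_BYTES}-bit")
    return problems
```

For example, `wav_format_mismatches("speech.wav")` (a hypothetical file name) returns an empty list for a 16 kHz mono 16-bit WAV file. The `wave` module only reads uncompressed PCM WAV files, so other containers or codecs would need a different loader.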

More details can be found here: [**GitHub**](https://github.com/SpeechFlow-io/Spoken_language_identification)

The model can classify a speech utterance according to the language spoken. It covers 13 different languages.

| Model Parameters | Supported Languages |
|----------|--------------------------|
| 1 M | Chinese, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Vietnamese |

## Example

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16-Nre8aDvn0wN2dsgGa3xUsZ7S61e1h8#scrollTo=Is60zUMuPqSi)

Please see the provided Colab for details on running an example.

#### How to use
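
```python
import librosa
from huggingface_hub import from_pretrained_keras

# Assumed to come from the GitHub repository linked above; it must be on the Python path.
from featurizers.speech_featurizers import TFSpeechFeaturizer

# Download the pretrained Keras model from the Hugging Face Hub.
model = from_pretrained_keras("SpeechFlow/spoken_language_identification")

# Placeholder path; replace with your own recording.
wav_path = "path/to/audio.wav"

# Load the audio as mono, resampled to the 16 kHz rate used in training.
signal, _ = librosa.load(wav_path, sr=16000)

# Classify the utterance and print the prediction.
output, prob = model.predict_pb(signal)
print(output)
```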