metadata

license: apache-2.0
metrics:
  - accuracy
language:
  - en
  - zh
  - ko
  - ja
  - de
  - fr
  - es
  - pt
  - vi
  - tr
  - it
  - ru
  - id
tags:
  - keras
  - tensorflow
  - image-classification
library_name: transformers
libraries: TensorBoard
widget:
  - example_title: English Sample
    src: >-
      https://huggingface.co/SpeechFlow/spoken_language_identification/blob/main/test_audios/english.wav
pipeline_tag: audio-classification

Spoken_language_identification

Model description

This is a spoken language recognition model trained on a private dataset using TensorFlow. The model uses the CRNN-Attention architecture, which has previously been used to extract utterance-level feature representations.
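This card does not describe the CRNN-Attention layers in detail. As an illustration only, here is generic additive attention pooling, one common way such models collapse frame-level features (for example, recurrent layer outputs) into a single utterance-level vector. The function name, weight shapes, and random inputs are assumptions, not this model's actual layers or trained parameters.

```python
import numpy as np


def attention_pool(frames, W, b, v):
    """Collapse frame-level features (T, D) into one utterance vector (D,).

    Generic additive attention pooling (an illustrative assumption, not this
    model's confirmed layer): score each frame, softmax the scores over time,
    then return the attention-weighted sum of the frames.
    """
    scores = np.tanh(frames @ W + b) @ v              # (T,) one score per frame
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over time
    return weights @ frames, weights                  # (D,), (T,)


# Hypothetical sizes: T frames, D-dim features, A-dim attention space
rng = np.random.default_rng(0)
T, D, A = 50, 8, 4
frames = rng.standard_normal((T, D))
W = rng.standard_normal((D, A))
b = np.zeros(A)
v = rng.standard_normal(A)

utterance, weights = attention_pool(frames, W, b, v)
print(utterance.shape, weights.shape)  # (8,) (50,)
```

Because the weights come from a softmax, they are non-negative and sum to 1, so the utterance vector is a weighted average of frames in which frames with higher attention scores contribute more.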

The system was trained on recordings sampled at 16 kHz, single-channel (mono), with 16-bit signed integer PCM encoding.
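To check that a WAV file matches this training format before inference, here is a minimal sketch using only Python's standard wave module. The function name read_pcm16_mono is hypothetical, and it simply rejects non-matching files rather than resampling or downmixing them.

```python
import array
import sys
import wave

# Format stated in the model description: 16 kHz, mono, 16-bit signed PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample (16-bit)


def read_pcm16_mono(path_or_file):
    """Hypothetical helper: verify a WAV file matches the training format
    and return its samples as floats in [-1.0, 1.0)."""
    with wave.open(path_or_file, "rb") as wf:
        problems = []
        if wf.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wf.getframerate()} Hz != {EXPECTED_RATE} Hz")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels != {EXPECTED_CHANNELS}")
        if wf.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"{8 * wf.getsampwidth()}-bit samples != 16-bit")
        if problems:
            raise ValueError("; ".join(problems))
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Scale the int16 range [-32768, 32767] to [-1.0, 1.0)
    return [s / 32768.0 for s in samples]
```

Files in other formats (for example 44.1 kHz stereo) raise a ValueError naming each mismatch; they would need to be resampled and downmixed first, which librosa.load(..., sr=16000) does by default.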

The model classifies a speech utterance by the language spoken. It covers 13 languages: Chinese, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, and Vietnamese.

Intended uses & Limitations

How to use


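import librosa
from huggingface_hub import from_pretrained_keras
# The featurizers package must be on your Python path; it is not
# installed with librosa or huggingface_hub.
from featurizers.speech_featurizers import TFSpeechFeaturizer

wav_path = "english.wav"  # e.g. test_audios/english.wav from the model repo
model = from_pretrained_keras("SpeechFlow/spoken_language_identification")
# Load as 16 kHz mono float samples (librosa resamples/downmixes if needed)
signal, _ = librosa.load(wav_path, sr=16000)
# predict_pb is defined by this model, not by the standard Keras API
output, prob = model.predict_pb(signal)
print(output)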