Urdu-ASR-SOTA by kingabzpro
metadata
title: Urdu ASR SOTA
emoji: 👨‍🎤
colorFrom: green
colorTo: blue
sdk: gradio
app_file: Gradio/app.py
pinned: true
license: apache-2.0

Urdu Automatic Speech Recognition State of the Art Solution

Automatic speech recognition for Urdu using Facebook's wav2vec2-xls-r-300m model and the Urdu subset of the mozilla-foundation/common_voice_8_0 dataset.

Model Fine-tuning

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Urdu subset of the mozilla-foundation/common_voice_8_0 dataset.

It achieves the following results on the evaluation set:

  • Loss: 0.9889
  • WER (word error rate): 0.5607
  • CER (character error rate): 0.2370
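WER and CER are both normalized edit distances: the minimum number of substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the length of the reference in words (WER) or characters (CER). The sketch below is a plain-Python illustration of those definitions, with hypothetical function names; it is not the code that produced the scores above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, deletions, insertions)."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for a single utterance."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for a single utterance."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return errors / total
```

Note that a corpus-level score sums errors over all utterances before dividing, which is not the same as averaging per-utterance rates.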

Quick Prediction

Install all dependencies from the requirements.txt file, then run the code below to predict the text:

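import torch  # the pipeline needs a PyTorch backend installed
from datasets import load_dataset, Audio
from transformers import pipeline

model = "Model"  # local directory holding the fine-tuned checkpoint

# Load the Common Voice 8.0 Urdu test split (tab-separated) from the local Data/ folder
data = load_dataset("Data", "ur", split="test", delimiter="\t")

def path_adjust(batch):
    # Prefix each relative clip file name with the clips directory
    batch["path"] = "Data/ur/clips/" + str(batch["path"])
    return batch

data = data.map(path_adjust)
# Decode the audio and resample it to 16 kHz, the rate wav2vec2 expects
sample_iter = iter(data.cast_column("path", Audio(sampling_rate=16_000)))
sample = next(sample_iter)

asr = pipeline("automatic-speech-recognition", model=model)
# Transcribe in 5-second chunks with a 1-second stride on each side
prediction = asr(sample["path"]["array"], chunk_length_s=5, stride_length_s=1)
print(prediction)
# => {'text': 'اب یہ ونگین لمحاتانکھار دلمیں میںفوث کریلیا اجائ'}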

Evaluation Commands

To evaluate on the test split of mozilla-foundation/common_voice_8_0, copy and paste the following command into a terminal:

python3 eval.py --model_id Model --dataset Data --config ur --split test --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs

Or run the shell script:

bash run_eval.sh
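The --chunk_length_s 5.0 and --stride_length_s 1.0 flags split long recordings into overlapping windows, so the model never processes more than five seconds at once. The sketch below only computes the window boundaries, assuming the stride is applied on both sides of every chunk (the approach used by chunked inference in the 🤗 Transformers ASR pipeline); chunk_windows is a hypothetical helper for illustration, not code from eval.py.

```python
def chunk_windows(duration_s, chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end, keep_start, keep_end) in seconds for each chunk.

    Neighbouring chunks overlap by 2 * stride_length_s. The stride-sized edges
    of each chunk are decoded for context but discarded when merging, except at
    the very start and end of the audio, so every second is kept exactly once.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be larger than 2 * stride_length_s")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, duration_s)
        is_first = start == 0.0
        is_last = end >= duration_s
        keep_start = start if is_first else start + stride_length_s
        keep_end = end if is_last else end - stride_length_s
        windows.append((start, end, keep_start, keep_end))
        if is_last:
            break
        start += step
    return windows
```

For a 10-second clip this yields windows 0–5 s, 3–8 s, and 6–10 s, whose kept regions (0–4 s, 4–7 s, 7–10 s) tile the audio without gaps.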

Language Model

Following the Hugging Face blog post "Boosting Wav2Vec2 with n-grams in 🤗 Transformers", the language model is built in three steps:

  • Get suitable Urdu text data for a language model
  • Build an n-gram with KenLM
  • Combine the n-gram with a fine-tuned Wav2Vec2 checkpoint

Install kenlm and pyctcdecode before running the notebook.

pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode
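Combining the n-gram with the CTC model amounts to shallow fusion: a hypothesis is scored as its acoustic log-probability plus alpha times its language-model log-probability plus beta times its word count (pyctcdecode exposes these weights as alpha and beta). The toy sketch below illustrates that scoring on an n-best list with a hypothetical add-one-smoothed bigram model; KenLM actually builds Kneser-Ney-smoothed n-grams, and pyctcdecode applies the LM inside beam search rather than rescoring finished hypotheses. All names and numbers here are assumptions for illustration.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigram contexts and bigrams, using <s> and </s> sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])  # every word that serves as a context
        bigrams.update(zip(words[:-1], words[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab


def lm_log_prob(sentence, unigrams, bigrams, vocab):
    """Add-one smoothed bigram log-probability (natural log)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
        for a, b in zip(words[:-1], words[1:])
    )


def rescore(candidates, lm, alpha=0.5, beta=1.0):
    """Return the text maximising acoustic + alpha * LM + beta * word count."""
    def fused(item):
        text, acoustic = item
        return acoustic + alpha * lm_log_prob(text, *lm) + beta * len(text.split())
    return max(candidates, key=fused)[0]


lm = train_bigram(["the cat sat", "the cat ran"])
candidates = [("the cat sat", -4.0), ("the cad sat", -3.8)]
print(rescore(candidates, lm))             # LM pulls the choice to "the cat sat"
print(rescore(candidates, lm, alpha=0.0))  # acoustic score alone picks "the cad sat"
```

The LM weight alpha trades off acoustic evidence against fluency, while beta counteracts the LM's bias toward shorter outputs.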

Eval Results

                Without LM   With LM
WER (%)         56.21        46.37
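The "Without LM" result comes from decoding the CTC output with no language model, which for a CTC model such as wav2vec2 is typically greedy decoding: take the most likely label in each frame, merge consecutive repeats, and drop blanks. The sketch below uses a hypothetical toy vocabulary in which index 0 is the blank and "|" marks word boundaries (wav2vec2 CTC tokenizers reuse the padding token as the blank and use "|" as the default word delimiter).

```python
def ctc_greedy_decode(frame_scores, labels, blank=0, word_delimiter="|"):
    """Greedy CTC decoding over per-frame label scores.

    frame_scores: one list of scores (e.g. logits) per audio frame.
    labels: the vocabulary, indexed like the scores; labels[blank] is the CTC blank.
    """
    # Best label index for every frame
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    tokens = []
    previous = None
    for idx in best:
        # Emit a label only when it changes from the previous frame and is not blank;
        # a blank between two identical labels lets a real repeat survive
        if idx != previous and idx != blank:
            tokens.append(labels[idx])
        previous = idx
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

A beam-search decoder with an n-gram LM replaces the per-frame argmax with a search over many label paths, which is where the WER gain in the "With LM" column comes from.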

Directory Structure

<root directory>
    |
    .- README.md
    |
    .- Data/
    |
    .- Model/
    |
    .- Images/
    |
    .- Sample/
    |
    .- Gradio/
    |
    .- Eval Results/
    |     |
    |     .- With LM/
    |     |
    |     .- Without LM/
    |     | ...
    |
    .- notebook.ipynb
    |
    .- run_eval.sh
    |
    .- eval.py

Gradio App

SOTA

  • [x] Add Language Model
  • [x] Webapp/API
  • [ ] Denoise Audio
  • [ ] Text Processing
  • [ ] Spelling Mistakes
  • [x] Hyperparameter optimization
  • [ ] Training on 300 Epochs & 64 Batch Size
  • [ ] Improved Language Model
  • [ ] Contribute to Urdu ASR Audio Dataset

Robust Speech Recognition Challenge 2022

This project was the result of the Hugging Face Robust Speech Recognition Challenge. I was one of the winners, with four state-of-the-art ASR models. Check out my SOTA checkpoints.


References