Enhanced Multilingual Code-Switched Speech Recognition for Low-Resource Languages Using Transformer-Based Models and Dynamic Switching Algorithms

Model description

This model is a fine-tuned version of the transformer-based wav2vec2-large-xls-r-300m model for recognizing code-switched speech in Hindi and Marathi. It uses reinforcement learning methods, namely Q-Learning, SARSA, and Deep Q-Networks (DQN), to determine optimal switch points in code-switched speech.
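
How these methods are wired into the recognizer is not described here, so the following is only a minimal sketch of the textbook tabular update rules for Q-Learning and SARSA. The state labels ("hi", "mr"), the action set, and the ALPHA and GAMMA values are hypothetical and are not taken from the model's training setup.

```python
from collections import defaultdict

# Hypothetical values chosen for illustration; not the model's actual settings.
ALPHA = 0.1                    # step size of the value update
GAMMA = 0.9                    # discount factor
ACTIONS = ("stay", "switch")   # hypothetical actions: keep or change language


def q_learning_update(q, state, action, reward, next_state):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """On-policy update: bootstrap from the action actually taken next."""
    td_target = reward + GAMMA * q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])


# Q-table with all values initialised to 0.0.
q_table = defaultdict(float)
q_learning_update(q_table, "hi", "switch", reward=1.0, next_state="mr")
```

The two rules differ only in the bootstrap term: Q-Learning takes the maximum over next actions, while SARSA uses the next action chosen by the current policy. DQN replaces the table with a neural network and is not sketched here.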

Intended uses & limitations

Intended uses

  • Automatic speech recognition for multilingual environments involving Hindi and Marathi.
  • Research in multilingual ASR and code-switching phenomena.
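
For the ASR use case, a transcription call might look like the sketch below. It assumes the checkpoint was published with a CTC head and a matching processor that load through the `transformers` classes `Wav2Vec2ForCTC` and `Wav2Vec2Processor`, and that the input is mono audio sampled at 16 kHz, the rate XLS-R models expect. The `transcribe` helper is a hypothetical name introduced for illustration.

```python
MODEL_ID = "Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1"
SAMPLE_RATE = 16_000  # XLS-R models expect 16 kHz mono audio


def transcribe(waveform, model_id=MODEL_ID):
    """Greedy CTC transcription of one utterance.

    `waveform` is a 1-D sequence of float samples at SAMPLE_RATE.
    Assumes the repository ships both model weights and a processor.
    """
    # Imported lazily so the module can be loaded without these packages.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; the processor collapses
    # repeated tokens and removes CTC blanks when decoding.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Audio recorded at another sample rate would need to be resampled to 16 kHz before being passed to `transcribe`.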

Limitations

  • The model may exhibit biases inherent in the training data.
  • Recognition accuracy may degrade on heavily accented or dialectal speech that is not covered in the training dataset.

Training parameters and experimental info

The model was fine-tuned using the following hyperparameters:

  • Attention Dropout: 0.1
  • Hidden Dropout: 0.1
  • Feature Projection Dropout: 0.1
  • Layerdrop: 0.1
  • Learning Rate: 3e-4
  • Mask Time Probability: 0.05
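
As a rough sketch, the values above map onto `transformers` configuration fields as follows. The base checkpoint name `facebook/wav2vec2-xls-r-300m` and the `build_config` helper are assumptions made for illustration; the actual training script is not shown in this card.

```python
# Dropout and masking values from the list above, keyed by the
# corresponding Wav2Vec2Config field names.
MODEL_OVERRIDES = {
    "attention_dropout": 0.1,
    "hidden_dropout": 0.1,
    "feat_proj_dropout": 0.1,
    "layerdrop": 0.1,
    "mask_time_prob": 0.05,
}

# The learning rate belongs to the optimizer rather than the model config,
# e.g. TrainingArguments(learning_rate=LEARNING_RATE, ...).
LEARNING_RATE = 3e-4


def build_config(base_checkpoint="facebook/wav2vec2-xls-r-300m"):
    """Load the base config and apply the overrides (assumed base checkpoint)."""
    # Imported lazily so the constants above can be used without transformers.
    from transformers import Wav2Vec2Config

    return Wav2Vec2Config.from_pretrained(base_checkpoint, **MODEL_OVERRIDES)
```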

Training dataset

The model was trained on the Common Voice dataset, which includes diverse speech samples in both Hindi and Marathi. The dataset was augmented with synthetically generated code-switched speech to improve the model's robustness in handling code-switching scenarios.

Evaluation results

The model achieved the following performance metrics on the test set:

  • Word Error Rate (WER): 0.2800
  • Character Error Rate (CER): 0.2400
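
For reference, WER and CER are both edit distances normalised by the reference length, computed over words and characters respectively. The sketch below shows that standard definition in plain Python; it is not the evaluation script used for the numbers above, and the example strings are placeholders.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("a b c d", "a x c d"))  # one substitution out of four words: 0.25
```

Under this definition, a WER of 0.2800 corresponds to 28 word-level errors per 100 reference words.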

Model size

  • Parameters: 316M
  • Tensor type: F32
  • Weights format: Safetensors

Dataset used to train Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1
