
base_model: facebook/wav2vec2-large-xls-r-300m
datasets:
  - common_voice
language:
  - hi
  - mr
library_name: transformers
license: mit
metrics:
  - wer
  - cer
tags:
  - code-switching
  - ASR
  - multilingual
model-index:
  - name: wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experiment
    results:
      - task:
          type: automatic-speech-recognition
        dataset:
          name: common_voice
          type: common_voice
        metrics:
          - type: wer
            value: 0.28
            name: Word Error Rate (WER)
          - type: cer
            value: 0.24
            name: Character Error Rate (CER)
        source:
          url: >-
            https://huggingface.co/Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1/
          name: Internal Evaluation

Enhanced Multilingual Code-Switched Speech Recognition for Low-Resource Languages Using Transformer-Based Models and Dynamic Switching Algorithms

Model description

This model is fine-tuned from the transformer-based wav2vec2-large-xls-r-300m checkpoint to transcribe code-switched speech in Hindi and Marathi. It uses reinforcement-learning methods, namely Q-Learning, SARSA, and Deep Q-Networks (DQN), to determine optimal switch points in code-switched speech.
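The card does not describe how the switch-point agents are formulated. As orientation only, the sketch below shows the standard tabular Q-Learning and SARSA update rules on a hypothetical two-action problem (`stay` in the current language or `switch` to the other). The state encoding, reward, learning rate `alpha`, and discount `gamma` are all assumptions, and DQN, which replaces the table with a neural network, is not shown.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

# Hypothetical action set: keep the current language or switch (illustrative only).
ACTIONS = ("stay", "switch")

QTable = DefaultDict[Tuple[Hashable, str], float]


def q_learning_update(q: QTable, state, action, reward, next_state,
                      alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Off-policy update: bootstrap from the best action available in next_state."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def sarsa_update(q: QTable, state, action, reward, next_state, next_action,
                 alpha: float = 0.1, gamma: float = 0.9) -> float:
    """On-policy update: bootstrap from the action the policy actually takes next."""
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: two fresh tables with identical values for state "s1".
q_off = defaultdict(float, {("s1", "switch"): 2.0, ("s1", "stay"): 0.5})
q_on = defaultdict(float, {("s1", "switch"): 2.0, ("s1", "stay"): 0.5})
print(q_learning_update(q_off, "s0", "stay", 1.0, "s1"))
print(sarsa_update(q_on, "s0", "stay", 1.0, "s1", next_action="stay"))
```

The two rules differ only in the bootstrap term: Q-Learning looks at the best next action regardless of what the policy does, while SARSA uses the next action actually chosen, so the same transition yields different value estimates.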

Intended uses & limitations

Intended uses

Automatic speech recognition for multilingual environments involving Hindi and Marathi.
Research in multilingual ASR and code-switching phenomena.

Limitations

The model may exhibit biases inherent in the training data.
The model may struggle to recognize heavily accented or dialectal speech that is not covered by the training dataset.

Training parameters and experimental setup

The model was fine-tuned using the following parameters:

Attention Dropout: 0.1
Hidden Dropout: 0.1
Feature Projection Dropout: 0.1
Layerdrop: 0.1
Learning Rate: 3e-4
Mask Time Probability: 0.05
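A sketch of how the values above could be applied, assuming the common transformers recipe of passing configuration overrides to `Wav2Vec2ForCTC.from_pretrained` and the learning rate to `TrainingArguments`. The helper `build_model_and_args` is hypothetical, and the tokenizer-dependent CTC head arguments (such as `vocab_size` and `pad_token_id`), batch size, and schedule are not stated in this card, so the caller must supply them.

```python
# Values copied from the list above; key names follow transformers' Wav2Vec2Config.
CONFIG_OVERRIDES = {
    "attention_dropout": 0.1,
    "hidden_dropout": 0.1,
    "feat_proj_dropout": 0.1,
    "layerdrop": 0.1,
    "mask_time_prob": 0.05,
}
LEARNING_RATE = 3e-4


def build_model_and_args(base_model: str, output_dir: str, **ctc_head_kwargs):
    """Hypothetical helper: load the base checkpoint with the overrides applied.

    ctc_head_kwargs (e.g. vocab_size, pad_token_id) depend on the tokenizer,
    which this card does not describe.
    """
    # Imported lazily so the constants above can be inspected without transformers.
    from transformers import TrainingArguments, Wav2Vec2ForCTC

    # Extra keyword arguments to from_pretrained are forwarded to the model config.
    model = Wav2Vec2ForCTC.from_pretrained(base_model, **CONFIG_OVERRIDES, **ctc_head_kwargs)
    args = TrainingArguments(output_dir=output_dir, learning_rate=LEARNING_RATE)
    return model, args
```

Pass the base checkpoint listed in `base_model` in the metadata as the first argument.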

Training dataset

The model was fine-tuned on the Common Voice dataset, which includes diverse speech samples in both Hindi and Marathi. The dataset was augmented with synthetically generated code-switched speech to make the model more robust to code-switching.
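The card does not say how the synthetic code-switched speech was generated. One simple possibility, shown purely as a hypothetical sketch, is to splice monolingual Hindi and Marathi utterances with a short silence between them and join their transcripts. The function name, the 16 kHz sampling rate, and the 0.1 s gap are assumptions.

```python
from typing import List, Sequence, Tuple

Utterance = Tuple[List[float], str]  # (waveform samples, transcript)


def splice_code_switched(segments: Sequence[Utterance],
                         sampling_rate: int = 16_000,
                         gap_seconds: float = 0.1) -> Utterance:
    """Hypothetical augmentation: concatenate monolingual utterances into one sample.

    Silence is inserted between segments so the join is not an abrupt
    discontinuity; transcripts are joined with single spaces.
    """
    if not segments:
        raise ValueError("need at least one segment")
    silence = [0.0] * int(round(gap_seconds * sampling_rate))
    audio: List[float] = []
    words: List[str] = []
    for i, (wave, text) in enumerate(segments):
        if i > 0:
            audio.extend(silence)
        audio.extend(wave)
        words.append(text.strip())
    return audio, " ".join(words)


# Hypothetical usage with placeholder (silent) waveforms.
hindi = ([0.0] * 16_000, "मैं बाज़ार जा रहा हूँ")
marathi = ([0.0] * 8_000, "आणि नंतर घरी येईन")
audio, text = splice_code_switched([hindi, marathi])
print(len(audio), text)
```

Utterance-level splicing is the crudest form of synthesis; it produces switch points only at segment boundaries, so it would not by itself cover intra-sentential switching.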

Evaluation results

The model achieved the following results on the test set in an internal evaluation:

Word Error Rate (WER): 0.2800
Character Error Rate (CER): 0.2400
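The card does not state which tool produced these scores. For reference, WER and CER are the Levenshtein edit distance between reference and hypothesis (substitutions + deletions + insertions) divided by the reference length, counted in words or characters respectively. The minimal sketch below makes two choices of its own: spaces count as characters in CER, and corpus scores are total errors over total reference length. Text normalization (punctuation, Unicode forms) will also shift the numbers.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (0 if tokens match)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(refs: List[Sequence], hyps: List[Sequence]) -> float:
    """Total edits divided by total reference length (not a mean of per-utterance rates)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / total


def wer(refs: List[str], hyps: List[str]) -> float:
    return corpus_error_rate([r.split() for r in refs], [h.split() for h in hyps])


def cer(refs: List[str], hyps: List[str]) -> float:
    # Spaces count as characters here; some toolkits strip them first.
    return corpus_error_rate([list(r) for r in refs], [list(h) for h in hyps])
```

For example, `wer(["the cat sat"], ["the cat sit"])` is one substitution over three reference words.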