base_model: facebook/wav2vec2-large-xls-r-300m
datasets:
  - common_voice
language:
  - hi
  - mr
library_name: transformers
license: mit
metrics:
  - wer
  - cer
tags:
  - code-switching
  - ASR
  - multilingual
model-index:
  - name: wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experiment
    results:
      - task:
          type: automatic-speech-recognition
        dataset:
          name: common_voice
          type: audio
        metrics:
          - type: wer
            value: 0.28
            name: Word Error Rate (WER)
          - type: cer
            value: 0.24
            name: Character Error Rate (CER)
        source:
          url: >-
            https://huggingface.co/Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1/
          name: Internal Evaluation

Enhanced Multilingual Code-Switched Speech Recognition for Low-Resource Languages Using Transformer-Based Models and Dynamic Switching Algorithms

Model description

This model is a fine-tuned version of facebook/wav2vec2-large-xls-r-300m, a transformer-based speech model, for recognizing code-switched speech in Hindi and Marathi. It uses reinforcement learning techniques, namely Q-Learning, SARSA, and Deep Q-Networks (DQN), to determine optimal switch points in code-switched speech.
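
For reference, the standard tabular update rules behind Q-Learning and SARSA are written out here. This is the textbook form only: how states $s_t$, actions $a_t$, and rewards $r_{t+1}$ are defined for switch-point detection is not specified in this card, and the step size $\alpha$ and discount factor $\gamma$ are generic symbols rather than values used in training.

```latex
% Q-Learning (off-policy): bootstraps from the best action in the next state
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]

% SARSA (on-policy): bootstraps from the action actually taken in the next state
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]
```

A DQN replaces the table $Q$ with a neural network trained toward the same Q-Learning target.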

Intended uses & limitations

Intended uses

  • Automatic speech recognition for multilingual environments involving Hindi and Marathi.
  • Research in multilingual ASR and code-switching phenomena.
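
A minimal inference sketch for the uses listed above, using the transformers `pipeline` API. The repository ID is taken from the source URL in the metadata. The sketch assumes the repository also contains tokenizer and feature-extractor files, which this card does not confirm; `transcribe` and `audio_path` are hypothetical names used for illustration.

```python
# Repository ID as it appears in the metadata source URL of this card.
MODEL_ID = (
    "Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1"
)


def transcribe(audio_path):
    """Return the transcription of one audio file (hypothetical helper).

    Assumes `transformers` and `torch` are installed and that ffmpeg is
    available so the pipeline can decode the file. XLS-R models expect
    16 kHz mono audio; the pipeline resamples decoded files to the rate
    declared by the feature extractor.
    """
    # Imported lazily so this module can be loaded without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` downloads the model on first use; for many files it is cheaper to build the pipeline once and reuse it.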

Limitations

  • The model may exhibit biases inherent in the training data.
  • Accuracy may degrade on heavily accented or dialectal speech that is not covered in the training dataset.

Training parameters and experimental info

The model was fine-tuned using the following parameters:

  • Attention Dropout: 0.1
  • Hidden Dropout: 0.1
  • Feature Projection Dropout: 0.1
  • Layerdrop: 0.1
  • Learning Rate: 3e-4
  • Mask Time Probability: 0.05

Training dataset

The model was trained on the Common Voice dataset, which includes diverse speech samples in both Hindi and Marathi. The dataset was augmented with synthetically generated code-switched speech to improve the model's robustness in handling code-switching scenarios.

Evaluation results

The model achieved the following performance metrics on the test set:

  • Word Error Rate (WER): 0.2800
  • Character Error Rate (CER): 0.2400
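
For readers unfamiliar with these metrics, the sketch below shows how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length in words or characters. It is an illustrative pure-Python version, not the evaluation script used for this model, and any text normalization applied before scoring is not described in this card.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Spaces are counted as characters here; some toolkits strip them first.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a WER of 0.28 corresponds to 28 word-level errors per 100 reference words.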