---
license: mit
datasets:
  - mozilla-foundation/common_voice_17_0
language:
  - en
  - ta
metrics:
  - wer
base_model:
  - openai/whisper-small
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
  - language-identification
  - speech-to-text
---

# Whisper-small-ta

This model is fine-tuned for Tamil speech-to-text transcription.

## Model Overview

This model is fine-tuned from openai/whisper-small on the Mozilla Common Voice 17.0 dataset for transcription and language identification in Tamil. It is designed to accurately transcribe spoken audio into text and to identify whether the spoken language is Tamil.

Key Features:

- Languages: Tamil
- Base Model: Whisper-small from OpenAI
- Dataset: Mozilla Common Voice 17.0

## Intended Use

The model is designed for automatic speech recognition (ASR) in Tamil, making it suitable for transcription and language identification in real-time applications.
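
For quick experimentation, the Hugging Face `pipeline` API wraps model loading, audio preprocessing, and decoding in a single call. A minimal sketch (the audio file path is a placeholder):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline
asr = pipeline(
    "automatic-speech-recognition",
    model="Lingalingeswaran/whisper-small-ta",
)

# Transcribe a local audio file (placeholder path)
result = asr("tamil_sample.wav")
print(result["text"])
```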

## Training Details

This model was fine-tuned on a subset of the Mozilla Common Voice 17.0 dataset containing 53,468 samples.

Fine-tuning Process:

- Fine-tuning was performed on Whisper-small, a compact variant of OpenAI's Whisper chosen for its reduced latency and practicality for low-resource languages; a representative setup is sketched below.
- The model was trained for 2 epochs in a Google Colab Pro environment.
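
The exact hyperparameters are not published in this card; the following is a hypothetical sketch of how such a run could be configured with `Seq2SeqTrainingArguments`, using the 2-epoch setting stated above (batch size and learning rate are illustrative assumptions):

```python
from datasets import load_dataset
from transformers import Seq2SeqTrainingArguments

# Tamil split of Common Voice 17.0 (a gated dataset; requires accepting its terms)
cv_ta = load_dataset("mozilla-foundation/common_voice_17_0", "ta", split="train")

# Illustrative settings only; batch size and learning rate are assumptions,
# not the values used to produce this checkpoint.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ta",
    num_train_epochs=2,              # matches the 2 epochs stated above
    per_device_train_batch_size=16,  # assumed
    learning_rate=1e-5,              # assumed
    fp16=True,                       # typical on Colab GPUs
    predict_with_generate=True,
    report_to="none",
)
```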

## Performance

The model achieved a Word Error Rate (WER) of 34% on a validation set containing 8 hours of audio. Further training is expected to improve this result.
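
WER can be reproduced on any labelled evaluation set with the `evaluate` library; a minimal sketch (the reference and predicted transcripts below are placeholders):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder ground-truth transcripts and model outputs
references = ["ground-truth Tamil transcript"]
predictions = ["model transcription"]

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2%}")
```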

## Usage

You can use this model with the following code:

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch
import librosa

model = WhisperForConditionalGeneration.from_pretrained("Lingalingeswaran/whisper-small-ta")
processor = WhisperProcessor.from_pretrained("Lingalingeswaran/whisper-small-ta")

# Load the audio file and resample it to 16 kHz, the rate Whisper expects
audio, sr = librosa.load("path_to_audio_file", sr=16000)

# Convert the raw waveform into log-Mel spectrogram input features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
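
Since this checkpoint targets Tamil, it can help to pin the decoder to Tamil transcription explicitly. Recent `transformers` versions accept `language` and `task` arguments in `generate`; an optional variant of the call above:

```python
# Optionally force Tamil transcription (supported in recent transformers versions)
predicted_ids = model.generate(
    inputs.input_features,
    language="ta",
    task="transcribe",
)
```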