metadata

license: mit
inference: true
language:
  - en
metrics:
  - cer
  - wer
base_model:
  - facebook/deit-base-patch16-224
  - ai4bharat/IndicBART
pipeline_tag: image-to-text
tags:
  - text-generation
  - scene-text-recognition
  - text-recognition
  - computer-vision
  - language-model

trocr-indic

This model uses the TrOCR approach to recognize Indic text in cropped images.

Model Details

The model follows the TrOCR approach to training OCR for scene text. Since generalized models are scarce for the majority of Indian languages, this model is intended to fill that gap.

Courtesy: TrOCR - original paper

The model is trained for the following languages:

Assamese
Bengali
Gujarati
Hindi
Kannada
Malayalam
Marathi
Odia
Punjabi
Telugu
Tamil

Model Description

IMPORTANT: Although the model is trained on these languages, it is trained only on the Devanagari script because of limitations of IndicBART.

The output is in the following format:

<LANGUAGE TOKEN> <TEXT TOKENS> <EOS TOKEN>
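
As a rough illustration of the format above, a decoded prediction can be split into its language tag and text. This is a minimal sketch, not the model's own post-processing: it assumes the language token looks like `<2hi>` and the EOS token is `</s>`, following IndicBART's tokenizer conventions, which this card does not spell out. The helper name `parse_prediction` is hypothetical.

```python
import re
from typing import Tuple

# Assumption: language tokens have the form "<2xx>" (e.g. "<2hi>") and the
# end-of-sequence token is "</s>", as in IndicBART. Adjust if the tokenizer differs.
_OUTPUT_PATTERN = re.compile(
    r"^\s*<2(?P<lang>[a-z]{2,3})>\s*(?P<text>.*?)\s*(?:</s>\s*)?$",
    re.DOTALL,
)


def parse_prediction(decoded: str) -> Tuple[str, str]:
    """Split '<LANGUAGE TOKEN> <TEXT TOKENS> <EOS TOKEN>' into (language, text).

    The EOS token is optional so that strings decoded with special tokens
    partially stripped are still accepted.
    """
    match = _OUTPUT_PATTERN.match(decoded)
    if match is None:
        raise ValueError(f"Prediction does not start with a language token: {decoded!r}")
    return match.group("lang"), match.group("text")


if __name__ == "__main__":
    lang, text = parse_prediction("<2hi> नमस्ते </s>")
    print(lang, text)
```

If the decoded string is produced with special tokens removed, the language tag will be missing and the helper raises an error, so decode with special tokens kept when the language is needed.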

The following flowchart gives a better picture of the training and inference approach for this model.

Datasets used: IndicSTR12
Developed by: Aarya Devarla
Model type: Visio-Lingual Model / Vision-Language Model
License: mit
Finetuned from model: DeiT, IndicBART

Results

Metric	Assamese	Bengali	Gujarati	Hindi	Kannada	Malayalam	Marathi	Odia	Punjabi	Tamil	Telugu
CER	0.069	0.133	0.058	0.075	0.212	0.154	0.082	0.120	0.097	0.122	0.220
WER	0.205	0.395	0.192	0.283	0.576	0.519	0.312	0.375	0.304	0.409	0.612

Well, the model isn't perfect. But it's a start.
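
For readers who want to compute comparable numbers on their own data, the sketch below shows CER and WER under their usual definitions: edit distance between prediction and reference, divided by the reference length in characters or words. The evaluation script behind the table above is not given here, so treat this as an assumption about the metric rather than the exact procedure used. The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, counted over Unicode code points."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, with words split on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))
    print(wer("the cat sat", "the cat sit"))
```

Note that this counts Unicode code points, so a consonant with a combining vowel sign counts as two characters. Counting grapheme clusters instead would give different CER values for Indic scripts.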

Limitations

The main limitation comes from IndicBART, which is primarily trained on Indic text.

Recommendations

Since the TrOCR approach is modular, one can swap out the IndicBART model and train with a new model. When doing so, keep the preprocessing and output format in mind.