trocr-indic
This model uses the TrOCR approach to predict Indic text from cropped scene-text images.
Model Details
The model follows the TrOCR approach of training OCR for scene text. Since there is a scarcity of generalized models for the majority of Indian languages, this model serves as a replacement.
Courtesy: TrOCR - original paper
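A minimal inference sketch with the Hugging Face transformers library is shown below. The `AutoImageProcessor`/`AutoTokenizer` setup and exact loading details are assumptions based on the DeiT + IndicBART pairing described in this card, so verify them against the files in this repository.

```python
from PIL import Image
from transformers import VisionEncoderDecoderModel, AutoImageProcessor, AutoTokenizer

# Assumed setup: a DeiT/ViT-style image processor for the encoder and the
# IndicBART tokenizer for the decoder; check this repo's files for the exact classes.
model = VisionEncoderDecoderModel.from_pretrained("QuickHawk/trocr-indic")
image_processor = AutoImageProcessor.from_pretrained("QuickHawk/trocr-indic")
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/IndicBART", use_fast=False)

# A cropped word/line image, e.g. from a scene-text detector
image = Image.open("cropped_word.png").convert("RGB")
pixel_values = image_processor(image, return_tensors="pt").pixel_values

# Greedy decoding; the output starts with a language token (see Model Description)
generated_ids = model.generate(pixel_values)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=False)[0])
```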
The model is trained for the following languages:
- Assamese
- Bengali
- Gujarati
- Hindi
- Kannada
- Malayalam
- Marathi
- Odia
- Punjabi
- Telugu
- Tamil
Model Description
IMPORTANT: Although the model is trained on these languages, due to limitations of IndicBART it is trained only on the Devanagari script (text from all languages is represented in Devanagari).
The output is in the following format:
`<LANGUAGE TOKEN> <TEXT TOKENS> <EOS TOKEN>`
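A small post-processing sketch: stripping the language token and, if needed, transliterating the Devanagari output back to the native script. The `<2xx>` tag format follows IndicBART's convention and the transliteration call uses the indic-nlp-library; both are assumptions rather than details stated in this card.

```python
import re
from indicnlp.transliterate.unicode_transliterate import UnicodeIndicTransliterator

raw = "<2ta> देवनागरी में पाठ </s>"  # hypothetical decoded output

# IndicBART-style language tags look like <2as>, <2bn>, ..., <2ta>;
# pull out the tag and the text, dropping the EOS token.
match = re.match(r"<2(\w+)>\s*(.*?)\s*(?:</s>)?$", raw)
lang, devanagari_text = match.group(1), match.group(2)

# Since decoding happens in Devanagari, map back to the target script
# ("hi" is the Devanagari-script source code in indic-nlp-library).
native_text = UnicodeIndicTransliterator.transliterate(devanagari_text, "hi", lang)
print(lang, native_text)
```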
A flowchart accompanying this card illustrates the training and inference approach for this model.
- Datasets used: IndicSTR12
- Developed by: Aarya Devarla
- Model type: Vision-Language Model
- License: MIT
- Finetuned from model: DeiT (encoder), ai4bharat/IndicBART (decoder)
Results
| Metric | Assamese | Bengali | Gujarati | Hindi | Kannada | Malayalam | Marathi | Odia | Punjabi | Tamil | Telugu |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CER | 0.069 | 0.133 | 0.058 | 0.075 | 0.212 | 0.154 | 0.082 | 0.120 | 0.097 | 0.122 | 0.220 |
| WER | 0.205 | 0.395 | 0.192 | 0.283 | 0.576 | 0.519 | 0.312 | 0.375 | 0.304 | 0.409 | 0.612 |
Well, the model isn't perfect. But it's a start.
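For reference, CER is the Character Error Rate and WER the Word Error Rate, both edit-distance ratios against the reference transcription. A minimal sketch of computing them with the `jiwer` package is below; the actual evaluation script used for the table above is not part of this card.

```python
import jiwer

# Hypothetical predictions and ground-truth transcriptions
predictions = ["नमस्ते दुनिया", "भारत"]
references = ["नमस्ते दुनिया", "भरत"]

# WER: word-level edit distance divided by the number of reference words
# CER: character-level edit distance divided by the number of reference characters
print("WER:", jiwer.wer(references, predictions))
print("CER:", jiwer.cer(references, predictions))
```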
Limitations
The main limitation comes from IndicBART, which is primarily trained on Indic text represented in the Devanagari script; as noted above, this restricts the model's output to Devanagari.
Recommendations
Since TrOCR is modular in its approach, one can simply swap out the IndicBART decoder and train with a new model. Keep in mind that the preprocessing and output format must be adapted to the new decoder.
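As a sketch of such a swap with the transformers `VisionEncoderDecoderModel` API: the checkpoint names below are only examples, and the special-token wiring must match whatever decoder you choose.

```python
from transformers import VisionEncoderDecoderModel, AutoTokenizer, AutoImageProcessor

# Example (hypothetical) encoder/decoder pair: any ViT/DeiT-style vision encoder
# plus a seq2seq decoder whose tokenizer covers the target scripts.
encoder_ckpt = "facebook/deit-base-distilled-patch16-384"
decoder_ckpt = "ai4bharat/IndicBARTSS"

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(encoder_ckpt, decoder_ckpt)
image_processor = AutoImageProcessor.from_pretrained(encoder_ckpt)
tokenizer = AutoTokenizer.from_pretrained(decoder_ckpt, use_fast=False)

# Keep preprocessing and outputs consistent with the new decoder: its special
# tokens (pad/EOS and, for IndicBART-style models, the <2xx> language tags)
# must be wired into the generation config before training.
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id
```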