openthaigpt/thai-trocr


suchut committed on Sep 30

Commit ca95332 • 1 Parent(s): ffdd4a7

Create README.md

Files changed (1)

README.md +80 -0

README.md ADDED

	@@ -0,0 +1,80 @@

+---
+language:
+- th
+- en
+metrics:
+- cer
+tags:
+- trocr
+- image-to-text
+pipeline_tag: image-to-text
+library_name: transformers
+license: apache-2.0
+---
+# Thai-TrOCR Model
+## Introduction
+ThaiTrOCR is a fine-tuned version of the [TrOCR base handwritten model](https://huggingface.co/microsoft/trocr-base-handwritten) for Optical Character Recognition (OCR) in Thai and English. The model reads handwritten text-line images in both languages. It follows the TrOCR encoder-decoder design, pairing a Vision Transformer encoder with an Electra-based text decoder. ThaiTrOCR is designed to be compact so that it can be deployed in resource-constrained environments while keeping the character error rate low (see the comparison table below).
+- **Encoder**: TrOCR Base Handwritten
+- **Decoder**: Electra Small (Trained with Thai corpus)
+## Training Datasets
+- pythainlp/thai-wiki-dataset-v3
+- pythainlp/thaigov-corpus
+- Salesforce/wikitext
+## How to Use
+Here’s how to use this model in PyTorch:
+```python
+from transformers import TrOCRProcessor, VisionEncoderDecoderModel
+from PIL import Image
+import requests
+# Load processor and model
+processor = TrOCRProcessor.from_pretrained('openthaigpt/thai-trocr')
+model = VisionEncoderDecoderModel.from_pretrained('openthaigpt/thai-trocr')
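+# Load a single text-line image; replace the placeholder with a real image URL
+url = 'your_image_url_here'
+response = requests.get(url, stream=True, timeout=30)
+response.raise_for_status()  # fail early on HTTP errors instead of passing bad data to PIL
+image = Image.open(response.raw).convert("RGB")
+# Preprocess the image, generate token IDs, and decode them to text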
+pixel_values = processor(images=image, return_tensors="pt").pixel_values
+generated_ids = model.generate(pixel_values)
+generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(generated_text)
+```
+## Model Performance Comparison
+The table below reports the character error rate (CER) of ThaiTrOCR, EasyOCR, and Tesseract for each document type, followed by the adjusted mean score:
+| Document Type         | ThaiTrOCR | EasyOCR | Tesseract |
+|:----------------------|---------:|--------:|---------:|
+| Handwritten           | **0.190034** | 0.410738 | 1.032375 |
+| PDF Document          | **0.057597** | 0.085937 | 0.761595 |
+| PDF Document (EN-TH)  | **0.053968** | 0.308075 | 1.061107 |
+| Real Document         | **0.147440** | 0.293482 | 0.915707 |
+| Scene Text            | **0.134182** | 0.390583 | 2.408704 |
+| **Adjusted Mean**     | **0.123600** | 0.298474 | 1.269101 |
+**Notes**
+- All scores are character error rates (CER); lower is better. CER can exceed 1 when the output contains many extra characters.
+- Tesseract supports only one language at a time; this benchmark uses only Thai.
+- Benchmarking was performed on a Google Colab CPU runtime.
+- The evaluation dataset is openthaigpt/thai-ocr-evaluation.
+## Sponsors
+<img src="https://cdn-uploads.huggingface.co/production/uploads/66f6b837fbc158f2846a9108/WpQSD00FCtYjYlQXwMrDM.png" alt="Sponsors" width="500">
+## Authors
+- Suchut Sapsathien ([email protected])
+- Jillaphat Jaroenkantasima ([email protected])
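
The CER scores in the Model Performance Comparison table of the README above can be illustrated with a small sketch. The README does not say how CER was computed for the benchmark, so this assumes the standard definition: Levenshtein edit distance divided by the reference length, counted over Unicode code points with no text normalization. The names `edit_distance` and `cer` are illustrative and do not come from any library.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference chars into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (char missing from the OCR output)
                curr[j - 1] + 1,     # insertion (extra char in the OCR output)
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substitution in a five-character reference.
print(cer("hello", "hallo"))   # 0.2
# Four inserted characters against a two-character reference: CER above 1.
print(cer("ab", "abcdef"))     # 2.0
```

Because the denominator is the reference length only, an engine that emits many spurious characters can score above 1, which is consistent with several of the Tesseract values in the table. Thai vowel and tone marks are separate code points, so under this assumption each one counts as its own character.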