suchut committed on
Commit
ca95332
1 Parent(s): ffdd4a7

Create README.md

Files changed (1)
  1. README.md +80 -0
README.md ADDED
@@ -0,0 +1,80 @@
1
+ ---
2
+ language:
3
+ - th
4
+ - en
5
+ metrics:
6
+ - cer
7
+ tags:
8
+ - trocr
9
+ - image-to-text
10
+ pipeline_tag: image-to-text
11
+ library_name: transformers
12
+ license: apache-2.0
13
+ ---
14
+ # Thai-TrOCR Model
15
+
16
+ ## Introduction
17
+
18
+ ThaiTrOCR is a fine-tuned version of the [TrOCR base handwritten model](https://huggingface.co/microsoft/trocr-base-handwritten) for Optical Character Recognition (OCR) in Thai and English. It reads handwritten text-line images in both languages and follows the TrOCR encoder-decoder design, pairing a Vision Transformer encoder with an Electra-based text decoder. The model is designed to be compact and lightweight so that it can be deployed in resource-constrained environments while keeping a low character error rate (see Model Performance Comparison).
19
+
20
+ - **Encoder**: TrOCR Base Handwritten
21
+ - **Decoder**: Electra Small (trained on a Thai corpus)
22
+
23
+ ## Training Datasets
24
+
25
+ - pythainlp/thai-wiki-dataset-v3
26
+ - pythainlp/thaigov-corpus
27
+ - Salesforce/wikitext
28
+
29
+ ## How to Use
30
+
31
+ The example below loads the model with Transformers (PyTorch) and transcribes a single text-line image:
32
+
33
+ ```python
34
+ from transformers import TrOCRProcessor, VisionEncoderDecoderModel
35
+ from PIL import Image
36
+ import requests
37
+
38
+ # Load processor and model
39
+ processor = TrOCRProcessor.from_pretrained('openthaigpt/thai-trocr')
40
+ model = VisionEncoderDecoderModel.from_pretrained('openthaigpt/thai-trocr')
41
+
42
+ # Load an image
43
+ url = 'your_image_url_here'
44
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
45
+
46
+ # Process and generate text
47
+ pixel_values = processor(images=image, return_tensors="pt").pixel_values
48
+ generated_ids = model.generate(pixel_values)
49
+ generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
50
+ print(generated_text)
51
+ ```
52
+
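For more than one image, or for images stored on disk rather than behind a URL, the same calls can be wrapped in small helpers. The following is a minimal sketch, not part of the original model card: the function names `load_model`, `load_rgb`, and `transcribe_images` and the `batch_size` default are illustrative assumptions, and it relies on the processor resizing every image to the same shape so that a batch can be stacked.

```python
from pathlib import Path


def load_model(name="openthaigpt/thai-trocr"):
    """Load the processor and model once so they can be reused for every batch."""
    # Imported here so the helpers below can be used or tested without Transformers.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(name)
    model = VisionEncoderDecoderModel.from_pretrained(name)
    return processor, model


def load_rgb(path):
    """Open a local image file and convert it to RGB, as the model card does."""
    from PIL import Image

    with Image.open(Path(path)) as img:
        return img.convert("RGB")


def transcribe_images(images, processor, model, batch_size=8):
    """Return one decoded string per input image, in input order."""
    texts = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        # Same three steps as the single-image example, applied to a batch.
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        texts.extend(
            processor.batch_decode(generated_ids, skip_special_tokens=True)
        )
    return texts
```

A typical call would be `processor, model = load_model()` followed by `transcribe_images([load_rgb(p) for p in paths], processor, model)`, where `paths` is a list of text-line image files. Each image should contain a single line of text, since the model is described as working on text-line images.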
53
+ ## Model Performance Comparison
54
+
55
+ The table below reports the character error rate (CER) of each OCR engine by document type; the final row gives the adjusted mean across document types:
56
+
57
+ | Document Type | ThaiTrOCR | EasyOCR | Tesseract |
58
+ |:----------------------|---------:|--------:|---------:|
59
+ | Handwritten | **0.190034** | 0.410738 | 1.032375 |
60
+ | PDF Document | **0.057597** | 0.085937 | 0.761595 |
61
+ | PDF Document (EN-TH) | **0.053968** | 0.308075 | 1.061107 |
62
+ | Real Document | **0.147440** | 0.293482 | 0.915707 |
63
+ | Scene Text | **0.134182** | 0.390583 | 2.408704 |
64
+ | **Adjusted Mean** | **0.123600** | 0.298474 | 1.269101 |
65
+
66
+ **Notes**
67
+
68
+ - All scores are character error rates (CER); lower is better.
69
+ - Tesseract was configured for a single language in this benchmark: Thai only.
70
+ - Benchmarking was performed on a Google Colab CPU runtime.
71
+ - The evaluation data comes from the openthaigpt/thai-ocr-evaluation dataset.
72
+
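To make the metric concrete, here is a minimal sketch of CER under its usual definition: the character-level edit distance between the reference and the prediction, divided by the reference length. This is an assumption about the general metric, not the benchmark's actual evaluation script; the text normalization used there and the way the adjusted mean is computed are not described in this card. The sketch counts Unicode code points, so Thai vowel and tone marks each count as one character.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete ref_char
                    current[j - 1] + 1,                   # insert hyp_char
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate of a prediction against a non-empty reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(cer("kitten", "sitting"))  # 0.5 (3 edits over 6 reference characters)
print(cer("ab", "abcde"))        # 1.5 (3 insertions over 2 reference characters)
```

The second call shows why a score can exceed 1, as some Tesseract rows in the table do: when the prediction contains many extra characters, the number of edits is larger than the reference length.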
73
+ ## Sponsors
74
+
75
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/66f6b837fbc158f2846a9108/WpQSD00FCtYjYlQXwMrDM.png" alt="Sponsors" width="500">
76
+
77
+ ## Authors
78
+
79
+ - Suchut Sapsathien ([email protected])
80
+ - Jillaphat Jaroenkantasima ([email protected])