Update README.md
README.md
CHANGED
@@ -52,7 +52,8 @@ print(generated_text)

## Model Performance Comparison

-

| Document Type | ThaiTrOCR | EasyOCR | Tesseract |
|:----------------------|---------:|--------:|---------:|
@@ -63,12 +64,12 @@ The table below summarizes the performance metrics of various models across diff
| Scene Text | **0.134182** | 0.390583 | 2.408704 |
| **Adjusted Mean** | **0.123600** | 0.298474 | 1.269101 |

-

-
-
-
-
-

## Sponsors


## Model Performance Comparison

+This section compares the open-source ThaiTrOCR model with two widely used OCR systems, EasyOCR and Tesseract. The table below reports the average Character Error Rate (CER) of each model by document type.
+

| Document Type | ThaiTrOCR | EasyOCR | Tesseract |
|:----------------------|---------:|--------:|---------:|

| Scene Text | **0.134182** | 0.390583 | 2.408704 |
| **Adjusted Mean** | **0.123600** | 0.298474 | 1.269101 |

+### Key Insights

+* Character Error Rate (CER): This metric is the character-level edit distance (substitutions, deletions, and insertions) between the predicted text and the reference text, divided by the number of characters in the reference. Lower is better, and the value can exceed 1.0 when a model emits many spurious characters, as in Tesseract's scene-text score of 2.408704. As shown in the table, ThaiTrOCR has the lowest CER across all document types, with an adjusted mean of 0.123600 against 0.298474 for EasyOCR and 1.269101 for Tesseract.
+* Model Performance: ThaiTrOCR is particularly effective on PDF documents (both Thai-only and bilingual English-Thai texts) and improves substantially on the competing models for scene text and handwritten content.
+* Tesseract Limitation: In this benchmark, Tesseract was run with only the Thai language setting, so it processed a single language at a time. This configuration may have contributed to its higher CER values.
+* Evaluation Dataset: The evaluation data comes from the [openthaigpt/thai-ocr-evaluation](https://huggingface.co/datasets/openthaigpt/thai-ocr-evaluation) dataset.
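
As a rough illustration of how a CER figure like those in the table can be computed, the sketch below assumes the standard definition: character-level edit distance divided by reference length. The helper names `levenshtein` and `cer` are hypothetical, and the benchmark's actual scoring script and text normalization are not shown here, so this is not necessarily how the reported numbers were produced.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning reference into hypothesis."""
    # previous[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate, counted over Unicode code points (an assumption)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("abcd", "abxd"))   # one substitution over four characters: 0.25
print(cer("ab", "abcdef"))   # four insertions over two characters: 2.0
```

The second call shows why CER is not bounded by 1.0: a prediction much longer than the reference needs more edits than the reference has characters. Note that Python counts Thai vowel and tone marks as separate code points, so a different tokenization or normalization would give different values.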

## Sponsors
