kobkrit committed on
Commit 1c1be54 · verified · 1 parent: ca95332

Update README.md

Files changed (1):

README.md CHANGED (+7 -6)
@@ -52,7 +52,8 @@ print(generated_text)

  ## Model Performance Comparison

- The table below summarizes the performance metrics of various models across different document types, based on the adjusted mean score:
+ This section details the performance comparison between the open-source ThaiTrOCR model and other widely-used OCR systems, namely EasyOCR and Tesseract. The table below highlights their respective performance across various document types based on the average Character Error Rate (CER).
+

  | Document Type | ThaiTrOCR | EasyOCR | Tesseract |
  |:----------------------|---------:|--------:|---------:|
@@ -63,12 +64,12 @@ The table below summarizes the performance metrics of various models across diff
  | Scene Text | **0.134182** | 0.390583 | 2.408704 |
  | **Adjusted Mean** | **0.123600** | 0.298474 | 1.269101 |

- **Notes**
+ # Key Insights

- - The CER metric indicates that lower scores reflect better performance.
- - Tesseract supports only one language at a time; this benchmark uses only Thai.
- - Benchmarking was performed on a Google Colab CPU task.
- - The evaluation dataset is sourced from the openthaigpt/thai-ocr-evaluation.
+ * Character Error Rate (CER): This metric evaluates the percentage of characters that were incorrectly predicted by the model. A lower CER indicates better performance. As shown in the table, ThaiTrOCR consistently outperforms EasyOCR and Tesseract across all document types, with a significantly lower average CER, making it the most accurate model in the comparison.
+ * Model Performance: The ThaiTrOCR model is particularly effective with PDF documents (both Thai-only and bilingual English-Thai texts), and shows substantial improvement over competing models in reading scene text and handwritten content.
+ * Tesseract Limitation: It’s important to note that Tesseract only supports single-language input at a time in this comparison. For the purposes of this benchmark, it was tested using only the Thai language setting, which might have contributed to its higher CER values.
+ * The evaluation dataset is sourced from the [openthaigpt/thai-ocr-evaluation](https://huggingface.co/datasets/openthaigpt/thai-ocr-evaluation).

  ## Sponsors

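The Character Error Rate referenced in both versions of the README is commonly computed as the character-level Levenshtein edit distance (substitutions + deletions + insertions) between the reference text and the model's prediction, divided by the number of reference characters. The sketch below assumes that definition; the helper names `levenshtein` and `cer` are hypothetical, and this commit does not show the benchmark's actual scoring code, any text normalization it applies, or how the "Adjusted Mean" row is formed. Because insertions count as errors, CER is not capped at 1.0, which is one way values above 1.0 (such as Tesseract's 2.408704 on scene text) can arise.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn `ref` into `hyp` (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r -> h (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length.
    Assumption: 'characters' are Python string elements (Unicode code points)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("สวัสดี", "สวัสดี"))  # 0.0: perfect prediction
    print(cer("สวัสดี", "สวสดี"))   # 0.16666666666666666: one dropped vowel mark out of 6 code points
    print(cer("ab", "xxxxxx"))      # 3.0: extra inserted characters push CER above 1.0
```

Since this sketch counts Unicode code points, Thai vowel and tone marks such as `ั` are scored as separate characters; a scorer that compares grapheme clusters or normalizes text first would report different numbers.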