Commit 8b2259d by ChatDOC (verified) · 1 parent: cd448da

Update README.md

Files changed (1)
  1. README.md +31 -0
README.md CHANGED
@@ -22,6 +22,37 @@ This is a preview release of the OCRFlux-3B model that's fine tuned from Qwen2.5
  Quick links:
  - 🛠️ [Code](https://github.com/chatdoc-com/OCRFlux)
 
+ OCRFlux is a toolkit, based on a multimodal large language model, for converting PDFs and images into clean, readable, plain Markdown text. It aims to push the current state of the art to a significantly higher level.
+
+ Try the online demo: https://ocrflux.pdfparser.io/
+
+ # Functions
+
+ ## On each page
+
+ - Convert pages into text with a natural reading order, even in the presence of multi-column layouts, figures, and insets
+ - Support complicated tables and equations
+ - Automatically remove headers and footers
+
+ ## Cross-page table/paragraph merging
+
+ - Cross-page table merging
+ - Cross-page paragraph merging
+
+ ## Key features
+
+ - Superior parsing quality on each page
+
+   On our released benchmark OCRFlux-bench-single, it achieves an Edit Distance Similarity (EDS) of 0.967, which is 0.095 higher than olmOCR-7B-0225-preview (0.872), 0.109 higher than Nanonets-OCR-s (0.858), and 0.187 higher than MonkeyOCR (0.780).
+
+ - Native support for cross-page table/paragraph merging (to the best of our knowledge, this is the first open-source project to support this feature).
+
+ - Based on a 3B-parameter VLM, so it can run even on an RTX 3090 GPU.
+
+
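The EDS figures above compare predicted and reference Markdown as strings. As a rough illustration only, the sketch below assumes EDS is one minus the Levenshtein distance divided by the length of the longer string. That normalization and the function names `edit_distance` and `edit_distance_similarity` are assumptions made for this example, not taken from the OCRFlux benchmark code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    # Keep the shorter string as `b` so the rows stay as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def edit_distance_similarity(prediction: str, reference: str) -> float:
    """Assumed EDS: 1 - distance / length of the longer string (1.0 if both are empty)."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(prediction, reference) / longest
```

Under this assumed definition, `edit_distance_similarity("abcd", "abcf")` gives 0.75: one substitution across four characters. Identical strings score 1.0.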
+ ## News
+ - Jun 17, 2025 - v0.1.0 - Initial public launch and demo.
+
+
  ## Usage
 
  The best way to use this model is via the [OCRFlux toolkit](https://github.com/chatdoc-com/OCRFlux).