Update README.md
README.md

Try the online demo: https://ocrflux.pdfparser.io/

## Key features

- Superior parsing quality on each page
- Native support for cross-page table/paragraph merging (to the best of our knowledge, the first open-source project to support this)
- Based on a 3B-parameter VLM, so it can run even on an RTX 3090 GPU
## Usage

The best way to use this model is via the [OCRFlux toolkit](https://github.com/chatdoc-com/OCRFlux). The toolkit comes with an efficient inference setup via vLLM that can handle millions of documents at scale.

### API for directly calling OCRFlux (New)

You can use the inference API to call OCRFlux directly in your own code, without running an online vLLM server, as follows:

```python
from vllm import LLM
from ocrflux.inference import parse

file_path = 'test.pdf'
# file_path = 'test.png'
llm = LLM(model="model_dir/OCRFlux-3B", gpu_memory_utilization=0.8, max_model_len=8192)
result = parse(llm, file_path)
document_markdown = result['document_text']
with open('test.md', 'w') as f:
    f.write(document_markdown)
```
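
To push many files through a single `LLM` instance, you can simply loop over `parse`. Below is a minimal sketch under one stated assumption: it treats a `None` return value as a parse failure, which is our guess at the failure signal rather than documented behavior.

```python
# Batch sketch over the parse() API shown above.
# Assumption: parse() returns None when a document cannot be parsed.
from pathlib import Path

from vllm import LLM
from ocrflux.inference import parse

llm = LLM(model="model_dir/OCRFlux-3B", gpu_memory_utilization=0.8, max_model_len=8192)

for pdf_path in sorted(Path('test_pdf_dir').glob('*.pdf')):
    result = parse(llm, str(pdf_path))
    if result is None:  # assumed failure signal
        print(f'failed to parse {pdf_path}')
        continue
    # Write one Markdown file next to each input PDF.
    pdf_path.with_suffix('.md').write_text(result['document_text'])
```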

### Docker Usage

Requirements:

- Docker with GPU support [(NVIDIA Toolkit)](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- Pre-downloaded model: [OCRFlux-3B](https://huggingface.co/ChatDOC/OCRFlux-3B)

To run OCRFlux in a Docker container, you can use the following example command:

```bash
docker run -it --gpus all \
  -v /path/to/localworkspace:/localworkspace \
  -v /path/to/test_pdf_dir:/test_pdf_dir/ \
  -v /path/to/OCRFlux-3B:/OCRFlux-3B \
  chatdoc/ocrflux:latest /localworkspace --data /test_pdf_dir/* --model /OCRFlux-3B/
```
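
The three `-v` mounts map your output workspace, the input PDF directory, and the model weights into the container; replace the `/path/to/...` host paths with your own directories.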

#### Viewing Results

Generate the final Markdown files by running the following command. The generated Markdown files will be in the `./localworkspace/markdowns/DOCUMENT_NAME` directory.

```bash
python -m ocrflux.jsonl_to_markdown ./localworkspace
```

### Full documentation for the pipeline

```bash
python -m ocrflux.pipeline --help
usage: pipeline.py [-h] [--task {pdf2markdown,merge_pages,merge_tables}] [--data [DATA ...]] [--pages_per_group PAGES_PER_GROUP] [--max_page_retries MAX_PAGE_RETRIES]
                   [--max_page_error_rate MAX_PAGE_ERROR_RATE] [--workers WORKERS] [--model MODEL] [--model_max_context MODEL_MAX_CONTEXT] [--model_chat_template MODEL_CHAT_TEMPLATE]
                   [--target_longest_image_dim TARGET_LONGEST_IMAGE_DIM] [--skip_cross_page_merge] [--port PORT]
                   workspace

Manager for running millions of PDFs through a batch inference pipeline

positional arguments:
  workspace             The filesystem path where work will be stored, can be a local folder

options:
  -h, --help            show this help message and exit
  --data [DATA ...]     List of paths to files to process
  --pages_per_group PAGES_PER_GROUP
                        Aiming for this many pdf pages per work item group
  --max_page_retries MAX_PAGE_RETRIES
                        Max number of times we will retry rendering a page
  --max_page_error_rate MAX_PAGE_ERROR_RATE
                        Rate of allowable failed pages in a document, 1/250 by default
  --workers WORKERS     Number of workers to run at a time
  --model MODEL         The path to the model
  --model_max_context MODEL_MAX_CONTEXT
                        Maximum context length that the model was fine tuned under
  --model_chat_template MODEL_CHAT_TEMPLATE
                        Chat template to pass to vllm server
  --target_longest_image_dim TARGET_LONGEST_IMAGE_DIM
                        Dimension on longest side to use for rendering the pdf pages
  --skip_cross_page_merge
                        Whether to skip cross-page merging
  --port PORT           Port to use for the VLLM server
```
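
As a concrete usage example, the invocation below mirrors the arguments from the Docker command above, run on a host install; the workspace, data, and model paths are placeholders for your own setup:

```bash
# Illustrative invocation; the paths here are placeholders.
python -m ocrflux.pipeline ./localworkspace \
  --data /test_pdf_dir/*.pdf \
  --model /path/to/OCRFlux-3B
```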

## Code overview

There are some nice reusable pieces of the code that may be useful for your own projects:

- Processing millions of PDFs through our released model using vLLM - [pipeline.py](https://github.com/chatdoc-com/OCRFlux/blob/main/ocrflux/pipeline.py)
- Generating final Markdowns from jsonl files - [jsonl_to_markdown.py](https://github.com/chatdoc-com/OCRFlux/blob/main/ocrflux/jsonl_to_markdown.py)
- Evaluating the model on the single-page parsing task - [eval_page_to_markdown.py](https://github.com/chatdoc-com/OCRFlux/blob/main/eval/eval_page_to_markdown.py)
- Evaluating the model on the table parsing task - [eval_table_to_html.py](https://github.com/chatdoc-com/OCRFlux/blob/main/eval/eval_table_to_html.py)
- Evaluating the model on the paragraph/table merge detection task - [eval_element_merge_detect.py](https://github.com/chatdoc-com/OCRFlux/blob/main/eval/eval_element_merge_detect.py)
- Evaluating the model on the table merging task - [eval_html_table_merge.py](https://github.com/chatdoc-com/OCRFlux/blob/main/eval/eval_html_table_merge.py)

### Benchmark for single-page parsing

We ship two comprehensive benchmarks to help measure the performance of our OCR system in single-page parsing: