title: alps
app_file: app.py
sdk: gradio
sdk_version: 4.44.0

Alps

Pipeline for OCRing PDFs and tables

This repository contains different OCR methods using various libraries/models.

Running Gradio:

Run python app.py in a terminal.

Installation:

Build the Docker image and run the container.

Clone this repository and install the required dependencies:

pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117

apt install weasyprint

Note: You need a GPU to run this code.

Example Usage

Run python main.py inside the repository directory. Pass the path to the test file with --test_file; the file must be placed inside the repository, and the path must be relative to the repository root (alps). Pass the folder for intermediate outputs with --debug_folder (for example, cell bounding boxes drawn on the table, or table detection results drawn on the PDF), and specify which component to run as the positional argument.

Outputs are printed in the terminal.

usage: main.py [-h] [--test_file TEST_FILE] [--debug_folder DEBUG_FOLDER] [--englishFlag ENGLISHFLAG] [--denoise DENOISE] ocr
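The usage line above can be reproduced with a small argparse sketch. This is not the repository's actual main.py: the `str2bool` helper, the defaults, and the help strings are assumptions made for illustration. The helper is there because a plain `type=bool` would treat the string "False" as true.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'True'/'False'-style strings into real booleans (hypothetical helper)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Argument names follow the usage line; defaults are assumed.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("ocr", help="component to run, e.g. ocr1 or pdftable1")
    parser.add_argument("--test_file", help="input path, relative to the repository")
    parser.add_argument("--debug_folder", help="folder for intermediate outputs")
    parser.add_argument("--englishFlag", type=str2bool, default=False)
    parser.add_argument("--denoise", type=str2bool, default=False)
    return parser


# Parse the same arguments as the second ocr1 example in this README.
args = build_parser().parse_args(
    ["ocr1", "--test_file", "TestingFiles/OCRTest3English.pdf",
     "--debug_folder", "./res/ocrdebug1/", "--englishFlag", "True"]
)
print(args.ocr, args.test_file, args.englishFlag)
```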

Description of the components:

ocr1

Input: Path to a PDF file

Output: A dictionary mapping each page to a list of line_annotations. Each LineAnnotation contains the bbox of the line and a list of its child wordAnnotation objects. Each wordAnnotation contains a bbox and the text inside it.

What it does: Runs the Ragflow text line detector, then OCR with DocTR

Example:

python main.py ocr1 --test_file TestingFiles/OCRTest1German.pdf --debug_folder ./res/ocrdebug1/ 
python main.py ocr1 --test_file TestingFiles/OCRTest3English.pdf --debug_folder ./res/ocrdebug1/ --englishFlag True
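To make the shape of the ocr1 output easier to picture, here is a minimal sketch of the nested structure described above. The class definitions, field names (`bbox`, `text`, `words`), integer page keys, and the `page_to_text` helper are all hypothetical; the real classes in this repository may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# [xmin, ymin, xmax, ymax], origin at the upper left corner of the image
BBox = Tuple[float, float, float, float]


@dataclass
class WordAnnotation:
    """One recognized word: its box and its text (assumed fields)."""
    bbox: BBox
    text: str


@dataclass
class LineAnnotation:
    """One detected text line and the words inside it (assumed fields)."""
    bbox: BBox
    words: List[WordAnnotation] = field(default_factory=list)


def page_to_text(lines: List[LineAnnotation]) -> str:
    """Join the words of each line with spaces and the lines with newlines."""
    return "\n".join(" ".join(w.text for w in line.words) for line in lines)


# Hand-written example data standing in for a real OCR result.
result: Dict[int, List[LineAnnotation]] = {
    0: [
        LineAnnotation(
            bbox=(10, 10, 200, 30),
            words=[
                WordAnnotation((10, 10, 90, 30), "Hello"),
                WordAnnotation((100, 10, 200, 30), "world"),
            ],
        )
    ]
}
print(page_to_text(result[0]))
```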

table1

Input: File path to an image of a cropped table

Output: Parsed table in HTML form

What it does: Uses Unitable + DocTR

python main.py table1 --test_file cropped_table.png --debug_folder ./res/table1/ 

table2

Input: File path to an image of a cropped table

Output: Parsed table in HTML form

What it does: Uses Unitable

python main.py table2 --test_file cropped_table.png  --debug_folder ./res/table2/ 

pdftable1

Input: PDF file path

Output: Parsed table in HTML form

What it does: Uses Unitable + DocTR

python main.py pdftable1 --test_file TestingFiles/OCRTest5English.pdf  --debug_folder ./res/table_debug1/ 

pdftable2

Input: PDF file path

Output: Parsed table in HTML form

What it does: Detects tables and parses them; runs full Unitable table detection

python main.py pdftable2 --test_file TestingFiles/OCRTest5English.pdf --debug_folder ./res/table_debug2/

pdftable3

Input: PDF file path

Output: Parsed table in HTML form

What it does: Detects tables with YOLO, then parses them with Unitable + DocTR

python main.py pdftable3 --test_file TestingFiles/TableOCRTestEnglish.pdf --debug_folder ./res/poor_relief2

pdftable4

Input: PDF file path

Output: Parsed table in HTML form

What it does: Detects tables with YOLO, then runs full DocTR table detection

python main.py pdftable4 --test_file TestingFiles/TableOCRTestEasier.pdf --debug_folder ./res/table_debug3/
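The table components return the parsed table as HTML. As a sketch of how that output could be consumed downstream, the snippet below turns a simple HTML table into a list of rows using only the standard library. It assumes plain `<tr>`, `<th>`, and `<td>` markup and ignores rowspan and colspan; the `TableRows` class and `html_table_to_rows` function are hypothetical names, not part of this repository.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of each cell, grouped by table row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text buffer for the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate a cell that appears without a <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def html_table_to_rows(html: str) -> List[List[str]]:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hand-written HTML standing in for the output of a table component.
sample = "<table><tr><th>Name</th><th>Year</th></tr><tr><td>Alps</td><td>2024</td></tr></table>"
for row in html_table_to_rows(sample):
    print(row)
```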

bbox

Bounding boxes are ordered as [xmin, ymin, xmax, ymax], because the coordinates start from (0,0) at the upper left corner of the image.

(xmin, ymin) is the upper left corner of the box; (xmax, ymax) is the lower right corner.
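As a small illustration of this convention, the sketch below computes the size of a box and crops it out of an image stored as a row-major list of rows. The helper names are hypothetical, and treating xmax and ymax as exclusive bounds is an assumption that may not match how this repository handles box edges.

```python
from typing import List, Sequence, Tuple


def bbox_size(bbox: Sequence[int]) -> Tuple[int, int]:
    """Return (width, height) of a [xmin, ymin, xmax, ymax] box."""
    xmin, ymin, xmax, ymax = bbox
    return xmax - xmin, ymax - ymin


def crop(image_rows: List[List[int]], bbox: Sequence[int]) -> List[List[int]]:
    """Crop a row-major image; y selects rows and x selects columns."""
    xmin, ymin, xmax, ymax = bbox
    return [row[xmin:xmax] for row in image_rows[ymin:ymax]]


# A 4x4 "image" whose pixel values are their row-major indices.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
box = [1, 0, 3, 2]  # starts one pixel right of the upper left corner

print(bbox_size(box))
print(crop(image, box))
```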
