---
title: camelot-pg
app_file: src/app/run.py
sdk: gradio
sdk_version: 4.32.2
---
# PDF Table Parser
This script extracts tables from PDF files and saves them as CSV files. It provides a command-line interface (CLI) for batch processing and an optional web UI for interactive processing.
## Features
- Multi-page PDF support
- Progress display per row, per page, and per file
- CSV output encoded as UTF-8 with BOM
- Customizable edge and row tolerances for table detection
- Optional web UI for interactive processing using Gradio
## Installation
1. Clone the repository or download the script.
2. Install the required dependencies:
```bash
pip install rich camelot-py polars gradio gradio_pdf
```
## Usage
### Command-Line Interface (CLI)
To run the script via CLI, use the following command:
```bash
python src/app/parser.py input1.pdf input2.pdf output1.csv output2.csv
```
#### Arguments:
- `input_files`: List of input PDF files
- `output_files`: List of output CSV files (must match the number of input files)
#### Optional Arguments:
- `--delimiter`: Output file delimiter (default: `,`)
- `--edge_tol`: Tolerance parameter used to specify the distance between text and table edges (default: `50`)
- `--row_tol`: Tolerance parameter used to specify the distance between table rows (default: `10`)
- `--webui`: Launch the web UI
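For orientation, here is a rough `argparse` sketch of a command line with the arguments listed above. It is not the script's actual parser: the single `files` positional, the `build_parser` and `pair_files` names, and the rule of splitting the list in half are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Extract tables from PDF files into CSV files."
    )
    # Assumption: inputs and outputs arrive as one list, inputs first.
    parser.add_argument(
        "files",
        nargs="*",
        help="input PDF files followed by the same number of output CSV files",
    )
    parser.add_argument("--delimiter", default=",", help="output file delimiter")
    parser.add_argument("--edge_tol", type=int, default=50)
    parser.add_argument("--row_tol", type=int, default=10)
    parser.add_argument("--webui", action="store_true", help="launch the web UI")
    return parser


def pair_files(files):
    """Split the list in half and pair each input PDF with its output CSV."""
    if len(files) % 2 != 0:
        raise ValueError("the number of output files must match the number of input files")
    half = len(files) // 2
    return list(zip(files[:half], files[half:]))
```

With this sketch, `input1.pdf input2.pdf output1.csv output2.csv` would be paired as `input1.pdf` with `output1.csv` and `input2.pdf` with `output2.csv`.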
### Web UI
To run the script with the web UI, use the following command:
```bash
python src/app/run.py
```
This will launch a Gradio-based web application where you can upload PDFs and view the extracted tables interactively.
## Example
### CLI Example
```bash
python src/app/parser.py data/demo.pdf data/output.csv --delimiter ";" --edge_tol 60 --row_tol 40
```
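`edge_tol` and `row_tol` are options of Camelot's stream parser, so the command above plausibly maps to a call like the one sketched here. This assumes the script uses `flavor="stream"`, reads all pages, and concatenates every detected table into one CSV; none of that is confirmed by this README. The names `build_read_kwargs` and `extract_to_csv` are hypothetical, and the sketch uses pandas (which Camelot returns) rather than polars.

```python
def build_read_kwargs(edge_tol=50, row_tol=10):
    """Collect the keyword arguments passed to camelot.read_pdf (assumed mapping)."""
    return {
        "pages": "all",      # assumption: process every page
        "flavor": "stream",  # assumption: edge_tol/row_tol are stream-parser options
        "edge_tol": edge_tol,
        "row_tol": row_tol,
    }


def extract_to_csv(pdf_path, csv_path, delimiter=",", edge_tol=50, row_tol=10):
    """Extract all tables from one PDF and write them to a single CSV file.

    Returns the number of tables found. Hypothetical helper, not this project's code.
    """
    # Imported here so the module can be loaded without the third-party packages.
    import camelot
    import pandas as pd

    tables = camelot.read_pdf(pdf_path, **build_read_kwargs(edge_tol, row_tol))
    frames = [table.df for table in tables]  # each table exposes a pandas DataFrame
    if not frames:
        return 0
    combined = pd.concat(frames, ignore_index=True)
    # utf-8-sig writes the UTF-8 BOM mentioned under Features.
    combined.to_csv(csv_path, sep=delimiter, index=False, header=False, encoding="utf-8-sig")
    return len(frames)
```

Under these assumptions, the CLI example would correspond to `extract_to_csv("data/demo.pdf", "data/output.csv", delimiter=";", edge_tol=60, row_tol=40)`.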
## License
This project is licensed under the MIT License.