metadata
title: MinerU PDF Processor
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
MinerU PDF API
A simple API for extracting text and tables from PDF documents using MinerU's magic-pdf library.
Features
- Extract text from PDF documents
- Identify and extract tables from PDFs
- Works with both regular and scanned PDFs
- Simple JSON response format
API Endpoints
Health Check
GET /health
Returns the current status of the service.
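A minimal sketch of checking the health endpoint from Python, using only the standard library. The shape of the response body is not documented here, so the helper only reports whether the request succeeded with HTTP 200. The names `health_url` and `is_healthy` are illustrative and not part of this project.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL from a base URL (a trailing slash is tolerated)."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For a local instance, `is_healthy("http://localhost:7860")` should return `True` once the service is up, assuming the default port from this README.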
Extract PDF Content
POST /extract
Upload a PDF file to extract its text and tables.
Request
- file: The PDF file to process (multipart/form-data)
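As a sketch of what such an upload might look like from a client, the following builds the multipart/form-data body by hand with the standard library and posts it to `/extract`. The field name `file` comes from the request description above. The helper names, the `application/pdf` part type, and the 120-second timeout are assumptions made for illustration.

```python
import json
import os
import urllib.request
import uuid
from typing import Tuple


def encode_pdf_upload(filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one PDF as multipart/form-data under the field name "file".

    Returns the request body and the matching Content-Type header value.
    The filename is assumed to contain no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_pdf(base_url: str, pdf_path: str, timeout: float = 120.0) -> dict:
    """POST a local PDF to /extract and return the decoded JSON response."""
    with open(pdf_path, "rb") as fh:
        data = fh.read()
    body, content_type = encode_pdf_upload(os.path.basename(pdf_path), data)
    request = urllib.request.Request(
        base_url.rstrip("/") + "/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `extract_pdf("http://localhost:7860", "report.pdf")` would then return the parsed JSON object, where `report.pdf` is a placeholder path. Scanned PDFs may take noticeably longer to process, which is why the timeout here is generous.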
Response
JSON object containing:
- filename: Original filename
- pages: Array of pages with text and tables
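The response can be walked page by page once decoded. The sketch below assumes each entry in `pages` is an object with a `text` string and a `tables` list. Only the top-level `filename` and `pages` keys are stated above, so the per-page key names are a guess and should be checked against a real response. `summarize_extraction` is a hypothetical helper.

```python
def summarize_extraction(result: dict) -> list:
    """Return one summary line for the file and one per page.

    Assumes each page is a dict with "text" (str) and "tables" (list);
    missing or null values are treated as empty.
    """
    lines = [f"file: {result.get('filename', '<unknown>')}"]
    for index, page in enumerate(result.get("pages", []), start=1):
        text = page.get("text") or ""
        tables = page.get("tables") or []
        lines.append(
            f"page {index}: {len(text)} characters, {len(tables)} table(s)"
        )
    return lines
```

Printing `"\n".join(summarize_extraction(result))` gives a quick overview of how much text and how many tables were found on each page.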
Deployment
This application is deployed as a Hugging Face Space using Docker.
Local Development
To run this application locally:
Install the requirements:
pip install -r requirements.txt
Run the application:
python app.py
Access the API at http://localhost:7860
Docker
You can also build and run with Docker:
docker build -t mineru-pdf-api .
docker run -p 7860:7860 mineru-pdf-api
About
This API is built on MinerU and its magic-pdf library, a PDF extraction tool.
API Documentation
Once deployed, you can access the auto-generated Swagger documentation at:
https://marcosremar2-docker-mineru.hf.space/docs
For ReDoc documentation:
https://marcosremar2-docker-mineru.hf.space/redoc