metadata
title: MinerU PDF Processor
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860

MinerU PDF API

A simple API for extracting text and tables from PDF documents using MinerU's magic-pdf library.

Features

  • Extract text from PDF documents
  • Identify and extract tables from PDFs
  • Works with both regular and scanned PDFs
  • Simple JSON response format

API Endpoints

Health Check

GET /health

Returns the current status of the service.
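
As a minimal sketch of calling this endpoint from Python, the snippet below uses only the standard library. The base URL matches the local port given in this README; the helper names `health_url` and `check_health` are hypothetical, and because the shape of the response body is not documented here, the code returns whatever the service sends back without assuming any fields.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Join the base URL and the /health path without doubling slashes."""
    return base_url.rstrip("/") + "/health"


def check_health(base_url: str = "http://localhost:7860", timeout: float = 5.0):
    """Call GET /health and return (HTTP status, body).

    The body is parsed as JSON when possible; otherwise the raw text is
    returned. The response fields are not documented in this README, so
    none are assumed here.
    """
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
        try:
            return resp.status, json.loads(body)
        except json.JSONDecodeError:
            return resp.status, body
```

Calling `check_health()` needs a running service. `urllib` raises `urllib.error.HTTPError` for non-2xx responses and `urllib.error.URLError` if the connection fails.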

Extract PDF Content

POST /extract

Upload a PDF file to extract its text and tables.

Request

  • file: The PDF file to process (multipart/form-data)
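
The upload can be sketched in Python with the standard library alone. The form field name `file` and the `/extract` path come from this README; the helper names `build_multipart` and `extract_pdf`, the `application/pdf` part content type, and the timeout value are assumptions made for illustration.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf"):
    """Encode one file as a multipart/form-data body.

    Returns (body bytes, value for the Content-Type request header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_pdf(path: str, base_url: str = "http://localhost:7860",
                timeout: float = 300.0) -> dict:
    """POST a PDF to /extract and return the decoded JSON response."""
    with open(path, "rb") as fh:
        data = fh.read()
    body, ctype = build_multipart("file", os.path.basename(path), data)
    request = urllib.request.Request(
        base_url.rstrip("/") + "/extract",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The generous timeout is a guess: extraction from large or scanned PDFs may take a while, so adjust it to your documents.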

Response

JSON object containing:

  • filename: Original filename
  • pages: Array of pages with text and tables
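
One way to consume this response is sketched below. Only the top-level `filename` and `pages` keys are documented above; the per-page key names `text` and `tables` are assumptions, as is the hypothetical `summarize` helper, so check the Swagger documentation for the actual schema. The `sample` payload is invented for illustration and is not real service output.

```python
def summarize(result: dict) -> list:
    """Return one (page number, text length, table count) tuple per page.

    Assumes each page is a dict with "text" and "tables" keys; missing or
    empty values are treated as empty.
    """
    rows = []
    for number, page in enumerate(result.get("pages", []), start=1):
        text = page.get("text") or ""
        tables = page.get("tables") or []
        rows.append((number, len(text), len(tables)))
    return rows


# Hypothetical payload, shaped after the description above.
sample = {
    "filename": "report.pdf",
    "pages": [
        {"text": "Hello", "tables": []},
        {"text": "", "tables": [[["a", "b"]]]},
    ],
}

for number, chars, table_count in summarize(sample):
    print(f"page {number}: {chars} characters, {table_count} table(s)")
```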

Deployment

This application is deployed as a Hugging Face Space using Docker.

Local Development

To run this application locally:

  1. Install the requirements:

    pip install -r requirements.txt
    
  2. Run the application:

    python app.py
    
  3. Access the API at http://localhost:7860

Docker

You can also build and run with Docker:

    docker build -t mineru-pdf-api .
    docker run -p 7860:7860 mineru-pdf-api

About

This API is built on MinerU and its magic-pdf library, which performs the PDF extraction.

API Documentation

Once deployed, you can access the auto-generated Swagger documentation at:

https://marcosremar2-docker-mineru.hf.space/docs

For ReDoc documentation:

https://marcosremar2-docker-mineru.hf.space/redoc