Spaces:

Moha782
/

gen-ai-project

Sleeping

Moha782 commited on Jun 26, 2024

Commit

e76edd3

verified ·

1 Parent(s): 51f29de

Create extract_text.py

Files changed (1) hide show

extract_text.py ADDED Viewed

+# extract_text.py
+import fitz  # PyMuPDF
+import json
+def extract_text_from_pdf(pdf_path):
+    doc = fitz.open(pdf_path)
+    text = []
+    for page in doc:
+        text.append(page.get_text())
+    return text
+if __name__ == "__main__":
+    pdf_text = extract_text_from_pdf("apexcustoms.pdf")
+    # Save the extracted text to a JSON file
+    with open("apexcustoms.json", "w") as f:
+        json.dump(pdf_text, f)