Spaces: YchKhan/Ptt_Endpoints
YchKhan committed on Apr 4

Commit 01c9f58 (verified) · 1 Parent(s): 00667d4

Replace runs of \n with a single space in web extraction

Files changed (1)

app.py CHANGED

@@ -182,7 +182,7 @@ def analyze_pdf_novelty(patent_background, pdf_url):
                 return {"error": "PDF has no pages"}
             first_page = pdf_document.load_page(0)
-            text = first_page.get_text()
+            text = re.sub(r'\n+', ' ', first_page.get_text())
             # Return the extracted text for frontend analysis with OpenAI
             # We're not doing the analysis here as it will be done in the frontend
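
The effect of the changed line can be checked in isolation. The following is a minimal sketch, not code from app.py: the helper name `collapse_newlines` is hypothetical, and the sample string only stands in for what `first_page.get_text()` might return. It also assumes `re` is imported at the top of app.py, which this hunk does not show.

```python
import re


def collapse_newlines(text: str) -> str:
    """Replace each run of one or more newlines with a single space."""
    return re.sub(r'\n+', ' ', text)


# Illustrative sample standing in for extracted page text (not real data).
sample = "A method for\nanalyzing patents.\n\nBackground\n"
flattened = collapse_newlines(sample)
print(repr(flattened))
```

Two side effects may be worth keeping in mind: a trailing newline becomes a trailing space, so a `.strip()` may be wanted before passing the text on, and words hyphenated across a line break stay split (for example `exam-\nple` becomes `exam- ple`).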