SERPent


Lucas ARRIESSE committed about 7 hours ago

Commit a9a935f

1 Parent(s): e14c7a4

Update project doc

Files changed (2)

app.py CHANGED

@@ -36,7 +36,8 @@ async def api_lifespan(app: FastAPI):
     await pw_browser.close()
     await playwright.stop()
-app = FastAPI(lifespan=api_lifespan, docs_url="/")
 # Router for scrapping related endpoints
 scrap_router = APIRouter(prefix="/scrap", tags=["scrapping"])
@@ -138,7 +139,6 @@ async def search_duck(params: SerpQuery) -> SerpResults:
 @serp_router.post("/search")
-@app.post("/search")
 async def search(params: SerpQuery):
     """Attempts to search the specified queries using ALL backends"""
     results = []

     await pw_browser.close()
     await playwright.stop()
+app = FastAPI(lifespan=api_lifespan, docs_url="/",
+              title="SERPent", description=open("docs/docs.md").read())
 # Router for scrapping related endpoints
 scrap_router = APIRouter(prefix="/scrap", tags=["scrapping"])
 @serp_router.post("/search")
 async def search(params: SerpQuery):
     """Attempts to search the specified queries using ALL backends"""
     results = []
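
Note on the `description=open("docs/docs.md").read()` change: the bare `open(...).read()` never explicitly closes the file and raises at import time if the file is missing. A minimal sketch of one possible alternative follows. The helper name `load_description` and the fallback text are hypothetical and not part of this commit.

```python
from pathlib import Path

# Hypothetical fallback text, not part of the commit.
FALLBACK_DESCRIPTION = "SERPent: SERP results and website sources scraping API."


def load_description(path: str = "docs/docs.md") -> str:
    """Return the Markdown description, or a fallback if the file cannot be read."""
    try:
        # Path.read_text opens, reads, and closes the file in one call.
        return Path(path).read_text(encoding="utf-8")
    except OSError:
        # Missing or unreadable file: keep the app importable.
        return FALLBACK_DESCRIPTION


# Assumed usage in app.py (names taken from the diff above):
# app = FastAPI(lifespan=api_lifespan, docs_url="/",
#               title="SERPent", description=load_description())
```

The path is still resolved against the current working directory, as in the commit. Resolving it relative to `app.py` would be a separate design choice.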

docs/docs.md ADDED

+# `SERPent`
+## SERP results scraping
+SERPent exposes a unified API to query SERPs (Search Engine Results Pages) for a few common search engines, namely:
+- DuckDuckGo
+- Brave
+- Bing
+- Google Patents
+- Google
+The application uses the `playwright` library to control a headless web browser that simulates normal user activity, in order to get past the anti-bot measures often present on those sites. See the `/serp/` endpoints for search results scraping.
+## Website sources scraping
+SERPent also exposes a few endpoints to scrape the contents of certain sources (patents, scholar). See the `/scrap/` endpoints for the supported website sources.