requests
gradio
beautifulsoup4
urllib3
trafilatura
huggingface_hub
sentence-transformers
torch
python-dotenv
lxml
lxml_html_clean
tenacity
scrapy
newspaper3k
PyPDF2
html2text
duckduckgo_search
groq
faiss-cpu
mistralai