title: F1-AI
emoji: 🏎️
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: 1.27.2
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
F1-AI: Formula 1 RAG Application
F1-AI is a Retrieval-Augmented Generation (RAG) application specifically designed for Formula 1 information. It features an intelligent web scraper that automatically discovers and extracts Formula 1-related content from the web, stores it in a vector database, and enables natural language querying of the stored information.
Features
- Web scraping of Formula 1 content with automatic content extraction
- Vector database storage using Pinecone for efficient similarity search
- OpenRouter integration, using the Mistral-7B-Instruct model to generate answers
- HuggingFace embeddings (all-MiniLM-L6-v2) for semantic search over the stored content
- RAG-powered question answering with contextual understanding and source citations
- Command-line interface for automation and scripting
- User-friendly Streamlit web interface with chat history
- Asynchronous data ingestion and processing for improved performance
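As a rough illustration of the asynchronous ingestion listed above, the sketch below chunks several pages concurrently with a cap on parallel work. It is not the project's actual code: the names chunk_text, ingest_page, and ingest_all, the 200-character chunk size, and the concurrency limit are all assumptions made for the example. Only the cap of 50 chunks per URL mirrors the documented --max-chunks default.

```python
import asyncio
from typing import Dict, List, Tuple


def chunk_text(text: str, size: int = 200, max_chunks: int = 50) -> List[str]:
    """Split text into fixed-size character chunks, keeping at most max_chunks."""
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    return chunks[:max_chunks]


async def ingest_page(url: str, text: str, sem: asyncio.Semaphore) -> Tuple[str, int]:
    """Hypothetical ingestion of one page; returns the URL and its chunk count."""
    async with sem:  # limit how many pages are processed at once
        await asyncio.sleep(0)  # placeholder for real embedding and upload I/O
        return url, len(chunk_text(text))


async def ingest_all(pages: Dict[str, str], concurrency: int = 4) -> Dict[str, int]:
    """Ingest all pages concurrently and map each URL to its chunk count."""
    sem = asyncio.Semaphore(concurrency)
    results = await asyncio.gather(
        *(ingest_page(url, text, sem) for url, text in pages.items())
    )
    return dict(results)


counts = asyncio.run(
    ingest_all({
        "https://example.com/a": "x" * 450,
        "https://example.com/b": "y" * 100,
    })
)
print(counts)
```

The semaphore keeps the number of in-flight pages bounded, which matters when each page triggers embedding and vector database calls.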
Architecture
F1-AI is built from the following components:
- LangChain: Orchestrates the RAG pipeline and manages interactions between components
- Pinecone: Vector database for storing and retrieving embeddings
- OpenRouter: Primary LLM provider with Mistral-7B-Instruct model
- HuggingFace: Provides all-MiniLM-L6-v2 embeddings model
- Playwright: Handles web scraping with JavaScript support
- BeautifulSoup4: Processes HTML content and extracts relevant information
- Streamlit: Provides an interactive web interface with chat functionality
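To make the retrieval step of the pipeline concrete, here is a minimal, dependency-free sketch of similarity search followed by prompt assembly. It does not use the real LangChain or Pinecone APIs: the three-dimensional vectors stand in for all-MiniLM-L6-v2 embeddings, the in-memory dict stands in for the vector database, and the names cosine, top_k, and build_prompt are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored documents most similar to the query vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(question: str, sources: List[str]) -> str:
    """Assemble an LLM prompt that lists the retrieved sources before the question."""
    context = "\n".join(f"[{i + 1}] {src}" for i, src in enumerate(sources))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Toy vectors; real embeddings would have hundreds of dimensions.
store = {
    "verstappen": [1.0, 0.0, 0.0],
    "monaco": [0.0, 1.0, 0.0],
    "tyres": [0.7, 0.7, 0.0],
}
hits = top_k([1.0, 0.1, 0.0], store, k=2)
prompt = build_prompt("Who won the title?", [doc_id for doc_id, _ in hits])
print(prompt)
```

In the real application the store lookup would be a Pinecone query and the prompt would be sent to the model through OpenRouter; the numbered sources are what makes citations possible in the answer.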
Prerequisites
- Python 3.8 or higher
- OpenRouter API key (set as OPENROUTER_API_KEY environment variable)
- Pinecone API key (set as PINECONE_API_KEY environment variable)
- 8GB RAM minimum (16GB recommended)
- Internet connection for web scraping
Installation
Clone the repository:
git clone <repository-url>
cd f1-ai
Install the required dependencies:
pip install -r requirements.txt
Install Playwright browsers:
playwright install chromium
Set up environment variables: Create a .env file with:
OPENROUTER_API_KEY=your_api_key_here # Required for LLM functionality
PINECONE_API_KEY=your_api_key_here # Required for vector storage
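Before running anything, it can help to confirm that both keys are visible to Python. The check below is a small sketch, not part of the project; missing_keys is a hypothetical helper. Note that a .env file is not read into the environment automatically, so this assumes the variables were exported in the shell or loaded by a tool such as python-dotenv.

```python
import os
from typing import List, Mapping

# The two variables named in the prerequisites.
REQUIRED_KEYS = ("OPENROUTER_API_KEY", "PINECONE_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```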
Usage
Command Line Interface
Scrape and ingest F1 content:
python f1_scraper.py --start-urls https://www.formula1.com/ --max-pages 100 --depth 2 --ingest
Options:
- --start-urls: Space-separated list of URLs to start crawling from
- --max-pages: Maximum number of pages to crawl (default: 100)
- --depth: Maximum crawl depth (default: 2)
- --ingest: Flag to ingest discovered content into RAG system
- --max-chunks: Maximum chunks per URL for ingestion (default: 50)
- --llm-provider: Choose LLM provider (openrouter)
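For readers who want to see how these options fit together, the following argparse sketch reproduces the documented flags and defaults. It is a reconstruction from the option list, not the contents of f1_scraper.py; in particular, making --start-urls required and defaulting --llm-provider to openrouter are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented f1_scraper.py options."""
    parser = argparse.ArgumentParser(prog="f1_scraper.py")
    # nargs="+" accepts one or more space-separated URLs.
    parser.add_argument("--start-urls", nargs="+", required=True,
                        help="URLs to start crawling from")
    parser.add_argument("--max-pages", type=int, default=100,
                        help="Maximum number of pages to crawl")
    parser.add_argument("--depth", type=int, default=2,
                        help="Maximum crawl depth")
    parser.add_argument("--ingest", action="store_true",
                        help="Ingest discovered content into the RAG system")
    parser.add_argument("--max-chunks", type=int, default=50,
                        help="Maximum chunks per URL for ingestion")
    parser.add_argument("--llm-provider", choices=["openrouter"],
                        default="openrouter", help="LLM provider to use")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args([
    "--start-urls", "https://www.formula1.com/",
    "--max-pages", "100", "--depth", "2", "--ingest",
])
print(args)
```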
Ask questions about Formula 1:
python f1_ai.py ask "Who won the 2023 F1 World Championship?"
Streamlit Interface
Run the Streamlit app:
streamlit run app.py
This will open a web interface where you can:
- Ask questions about Formula 1
- View responses in a chat-like interface
- See source citations for answers
- Track conversation history
- Get real-time updates on response generation
Project Structure
f1_scraper.py: Intelligent web crawler implementation
- Automatically discovers F1-related content using keyword scoring
- Handles content relevance detection with priority paths
- Manages crawling depth and limits
- Implements domain-specific filtering
f1_ai.py: Core RAG application implementation
- Handles data ingestion and chunking
- Manages vector database operations
- Implements question-answering logic with source tracking
- Provides robust error handling
llm_manager.py: LLM provider management
- Integrates with OpenRouter for advanced LLM capabilities
- Manages HuggingFace embeddings generation
- Implements rate limiting and error recovery
- Handles async API interactions
app.py: Streamlit web interface
- Provides chat-based UI with message history
- Manages conversation state
- Handles async operations with progress tracking
- Implements error handling and user feedback
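The crawler's keyword scoring, priority paths, and domain filtering could look roughly like the sketch below. This is an illustration only: the keyword weights, the priority path list, the allowed domain set, and the function name relevance_score are all assumptions and are not taken from f1_scraper.py.

```python
from urllib.parse import urlparse

# Hypothetical weights: more specific phrases count for more.
F1_KEYWORDS = {"formula 1": 3, "grand prix": 2, "qualifying": 1, "pit stop": 1}
# Hypothetical URL path prefixes that get a fixed bonus.
PRIORITY_PATHS = ("/results", "/drivers", "/teams")
# Hypothetical domain allow-list for domain-specific filtering.
ALLOWED_DOMAINS = {"www.formula1.com"}
PRIORITY_BONUS = 5


def relevance_score(url: str, text: str) -> int:
    """Score a page by weighted keyword counts plus a bonus for priority paths."""
    parsed = urlparse(url)
    if parsed.netloc not in ALLOWED_DOMAINS:
        return 0  # pages outside the allowed domains are never crawled
    lowered = text.lower()
    score = sum(weight * lowered.count(kw) for kw, weight in F1_KEYWORDS.items())
    if parsed.path.startswith(PRIORITY_PATHS):
        score += PRIORITY_BONUS
    return score


print(relevance_score("https://www.formula1.com/results/2023",
                      "Formula 1 Grand Prix qualifying"))
```

A crawler built this way would typically keep a queue ordered by score and stop once the page or depth limit is reached.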
Contributing
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Submit a Pull Request