---
title: F1-AI
emoji: 🏎️
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: "1.27.2"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# F1-AI: Formula 1 RAG Application

F1-AI is a Retrieval-Augmented Generation (RAG) application for Formula 1 information. Its web scraper automatically discovers and extracts Formula 1-related content from the web, stores it in a vector database, and lets you query the stored information in natural language.

## Features

- Web scraping of Formula 1 content with automatic content extraction
- Vector database storage using Pinecone for efficient similarity search
- OpenRouter integration with the Mistral-7B-Instruct model for LLM capabilities
- HuggingFace embeddings for semantic understanding
- RAG-powered question answering with contextual understanding and source citations
- Command-line interface for automation and scripting
- User-friendly Streamlit web interface with chat history
- Asynchronous data ingestion and processing for improved performance
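
As a rough illustration of the asynchronous ingestion listed above, the sketch below fetches several pages concurrently with `asyncio` and caps the number of in-flight requests with a semaphore. `fetch_page`, `ingest_all`, and `run_ingest` are hypothetical names invented for this example, not functions from this repository; a real fetch would presumably go through Playwright.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real page fetch (e.g. via Playwright)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"content of {url}"


async def ingest_all(urls, max_concurrency: int = 5):
    """Fetch all URLs concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str):
        async with semaphore:
            return url, await fetch_page(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(u) for u in urls))


def run_ingest(urls, max_concurrency: int = 5) -> dict:
    """Synchronous entry point: returns a dict of url -> page content."""
    return dict(asyncio.run(ingest_all(urls, max_concurrency)))
```

The semaphore keeps the crawler from opening too many connections at once while still overlapping the waiting time of individual requests.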

## Architecture

F1-AI is built on the following stack:

- **LangChain**: Orchestrates the RAG pipeline and manages interactions between components
- **Pinecone**: Vector database for storing and retrieving embeddings
- **OpenRouter**: Primary LLM provider, using the Mistral-7B-Instruct model
- **HuggingFace**: Provides the `all-MiniLM-L6-v2` embedding model
- **Playwright**: Handles web scraping with JavaScript support
- **BeautifulSoup4**: Processes HTML content and extracts relevant information
- **Streamlit**: Provides an interactive web interface with chat functionality
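
To make the flow between these components more concrete, here is a dependency-free sketch of the retrieve-then-prompt step of a RAG pipeline. It is only an illustration under loud assumptions: the word-count `embed` function stands in for the `all-MiniLM-L6-v2` embeddings, `InMemoryIndex` stands in for Pinecone, and `build_prompt` only assembles the text that would be sent to the LLM. None of these names come from the project's code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Stand-in for a real sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Stand-in for the vector database: stores (vector, text, source) tuples."""

    def __init__(self):
        self._items = []

    def upsert(self, text: str, source: str) -> None:
        self._items.append((embed(text), text, source))

    def query(self, question: str, top_k: int = 2):
        """Return the top_k (text, source) pairs most similar to the question."""
        q = embed(question)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [(text, source) for _, text, source in ranked[:top_k]]


def build_prompt(question: str, hits) -> str:
    """Assemble the context-plus-question prompt that would go to the LLM."""
    context = "\n".join(
        f"[{i + 1}] {text} (source: {source})" for i, (text, source) in enumerate(hits)
    )
    return (
        "Answer using only the context below and cite sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Keeping the source URL next to each stored chunk is what makes source citations possible at answer time.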

## Prerequisites

- Python 3.8 or higher
- OpenRouter API key (set as the `OPENROUTER_API_KEY` environment variable)
- Pinecone API key (set as the `PINECONE_API_KEY` environment variable)
- 8 GB RAM minimum (16 GB recommended)
- Internet connection for web scraping
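
A quick way to verify the Python version and API keys before running anything is a small check like the one below. This is a hypothetical helper written for this README, not part of the repository; it covers only the version and environment-variable items from the list above.

```python
import os
import sys

REQUIRED_ENV_VARS = ("OPENROUTER_API_KEY", "PINECONE_API_KEY")
MIN_PYTHON = (3, 8)


def check_prerequisites(env=None, version=None):
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):  # missing or empty
            problems.append(f"Missing environment variable: {name}")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and environment; the optional parameters exist only to make the function easy to test.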

## Installation

1. Clone the repository:

```bash
git clone <repository-url>
cd f1-ai
```

2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

3. Install Playwright browsers:

```bash
playwright install chromium
```

4. Set up environment variables. Create a `.env` file with:

```
OPENROUTER_API_KEY=your_api_key_here # Required for LLM functionality
PINECONE_API_KEY=your_api_key_here # Required for vector storage
```
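
How the application reads the `.env` file is not shown in this README; it may well rely on a library such as `python-dotenv`, which is an assumption. As a dependency-free sketch of the idea, the hypothetical helpers below parse `KEY=VALUE` lines, drop trailing ` #` comments like the ones in the example file, and copy the values into `os.environ` without overriding variables that are already set.

```python
import os


def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, full-line and trailing ` #` comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip()  # drop an inline comment, if any
        values[key.strip()] = value.strip("'\"")  # tolerate simple quoting
    return values


def load_env(path: str = ".env") -> None:
    """Copy values from the file into os.environ, keeping existing variables."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env_file(handle.read()).items():
            os.environ.setdefault(key, value)
```

This simple parser does not handle values that themselves contain ` #`; a dedicated library is the safer choice for anything beyond plain API keys.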

## Usage

### Command Line Interface

1. Scrape and ingest F1 content:

```bash
python f1_scraper.py --start-urls https://www.formula1.com/ --max-pages 100 --depth 2 --ingest
```

Options:

- `--start-urls`: Space-separated list of URLs to start crawling from
- `--max-pages`: Maximum number of pages to crawl (default: 100)
- `--depth`: Maximum crawl depth (default: 2)
- `--ingest`: Flag to ingest discovered content into the RAG system
- `--max-chunks`: Maximum chunks per URL for ingestion (default: 50)
- `--llm-provider`: LLM provider to use (`openrouter`)

2. Ask questions about Formula 1:

```bash
python f1_ai.py ask "Who won the 2023 F1 World Championship?"
```
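
For readers who want to script around the scraper, the options documented above map naturally onto an `argparse` parser. The sketch below mirrors the documented flags and defaults; whether `f1_scraper.py` is actually built this way is an assumption, as are the choices to make `--start-urls` required and to default `--llm-provider` to `openrouter`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the scraper options documented in this README."""
    parser = argparse.ArgumentParser(description="Crawl and optionally ingest F1 content")
    # nargs="+" accepts one or more space-separated URLs
    parser.add_argument("--start-urls", nargs="+", required=True,
                        help="URLs to start crawling from")
    parser.add_argument("--max-pages", type=int, default=100,
                        help="maximum number of pages to crawl")
    parser.add_argument("--depth", type=int, default=2,
                        help="maximum crawl depth")
    parser.add_argument("--ingest", action="store_true",
                        help="ingest discovered content into the RAG system")
    parser.add_argument("--max-chunks", type=int, default=50,
                        help="maximum chunks per URL for ingestion")
    parser.add_argument("--llm-provider", choices=["openrouter"], default="openrouter",
                        help="LLM provider to use")
    return parser


# Example: parse an explicit argument list instead of sys.argv
args = build_parser().parse_args(["--start-urls", "https://www.formula1.com/", "--ingest"])
```

Passing an explicit list to `parse_args` makes the parser easy to exercise from tests or other scripts.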

### Streamlit Interface

Run the Streamlit app:

```bash
streamlit run app.py
```

This opens a web interface where you can:

- Ask questions about Formula 1
- View responses in a chat-like interface
- See source citations for answers
- Track conversation history
- Get real-time updates on response generation

## Project Structure

- `f1_scraper.py`: Intelligent web crawler implementation
  - Automatically discovers F1-related content using keyword scoring
  - Handles content relevance detection with priority paths
  - Manages crawling depth and limits
  - Implements domain-specific filtering
- `f1_ai.py`: Core RAG application implementation
  - Handles data ingestion and chunking
  - Manages vector database operations
  - Implements question-answering logic with source tracking
  - Provides robust error handling
- `llm_manager.py`: LLM provider management
  - Integrates with OpenRouter for LLM capabilities
  - Manages HuggingFace embeddings generation
  - Implements rate limiting and error recovery
  - Handles async API interactions
- `app.py`: Streamlit web interface
  - Provides chat-based UI with message history
  - Manages conversation state
  - Handles async operations with progress tracking
  - Implements error handling and user feedback
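
The crawler behaviour described for `f1_scraper.py` (keyword scoring, priority paths, depth and page limits, domain filtering) can be sketched roughly as follows. Everything here is illustrative: the keyword weights, the priority paths, the allowed domain set, and the `relevance` and `crawl` functions are assumptions made for this example, and the crawl runs over an in-memory `pages` dict rather than the live web.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical weights and paths, chosen for illustration only.
KEYWORDS = {"formula 1": 3, "grand prix": 2, "qualifying": 1, "driver": 1}
PRIORITY_PATHS = ("/results", "/drivers", "/teams")
ALLOWED_DOMAINS = {"www.formula1.com"}


def relevance(url: str, text: str) -> int:
    """Score a page: weighted keyword hits plus a bonus for priority paths."""
    lowered = text.lower()
    score = sum(weight * lowered.count(word) for word, weight in KEYWORDS.items())
    if urlparse(url).path.startswith(PRIORITY_PATHS):
        score += 5
    return score


def crawl(start_urls, pages, max_pages=100, max_depth=2, min_score=1):
    """Breadth-first crawl over `pages`, a dict of url -> (text, links).

    Returns the URLs judged relevant, in visiting order.
    """
    queue = deque((url, 0) for url in start_urls)
    seen = set(start_urls)
    visited, relevant = 0, []
    while queue and visited < max_pages:
        url, depth = queue.popleft()
        if urlparse(url).netloc not in ALLOWED_DOMAINS or url not in pages:
            continue  # off-domain or unknown page
        visited += 1
        text, links = pages[url]
        if relevance(url, text) >= min_score:
            relevant.append(url)
        if depth < max_depth:  # only follow links while under the depth limit
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return relevant
```

Breadth-first order means pages closest to the start URLs are visited first, so a low `max_pages` limit still covers the most prominent content.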

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Submit a Pull Request