---
title: F1-AI
emoji: 🏎️
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: "1.27.2"
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# F1-AI: Formula 1 RAG Application
F1-AI is a Retrieval-Augmented Generation (RAG) application specifically designed for Formula 1 information. It features an intelligent web scraper that automatically discovers and extracts Formula 1-related content from the web, stores it in a vector database, and enables natural language querying of the stored information.
## Features
![Example](image.png)
- Web scraping of Formula 1 content with automatic content extraction
- Vector database storage using Pinecone for efficient similarity search
- OpenRouter integration, using the Mistral-7B-Instruct model to generate answers
- HuggingFace embeddings (all-MiniLM-L6-v2) for semantic similarity search
- RAG-powered question answering with contextual understanding and source citations
- Command-line interface for automation and scripting
- User-friendly Streamlit web interface with chat history
- Asynchronous data ingestion and processing for improved performance
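The asynchronous ingestion mentioned above can be illustrated with a small sketch. This is not the project's actual code: `ingest_all`, `fetch`, and `max_concurrency` are hypothetical names, and the example only shows one common way to process several URLs concurrently with `asyncio` while capping the number of simultaneous requests.

```python
import asyncio


async def ingest_all(urls, fetch, max_concurrency=3):
    """Run `fetch(url)` for every URL, at most `max_concurrency` at a time.

    `fetch` is a hypothetical coroutine function supplied by the caller
    (for example, one that downloads and chunks a page).
    Returns a list of (url, result_or_exception) pairs in input order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception as exc:  # keep going if a single page fails
                return url, exc

    return await asyncio.gather(*(worker(url) for url in urls))
```

Returning the exception instead of raising it lets one failed page be reported without aborting the rest of the batch.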
## Architecture
F1-AI is built on the following components:
- **LangChain**: Orchestrates the RAG pipeline and manages interactions between components
- **Pinecone**: Vector database for storing and retrieving embeddings
- **OpenRouter**: LLM provider, serving the Mistral-7B-Instruct model
- **HuggingFace**: Provides the all-MiniLM-L6-v2 embedding model
- **Playwright**: Handles web scraping with JavaScript support
- **BeautifulSoup4**: Processes HTML content and extracts relevant information
- **Streamlit**: Provides an interactive web interface with chat functionality
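To make the retrieval step of the stack above concrete, here is a minimal, dependency-free sketch of what a vector store and prompt builder do conceptually. It does not use the LangChain or Pinecone APIs, and the function names (`cosine_similarity`, `retrieve`, `build_prompt`) and the tiny three-dimensional vectors are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, top_k=2):
    """Return the top_k (score, text, source_url) entries closest to the query.

    `store` is a list of (text, source_url, vector) tuples, standing in
    for the vector database.
    """
    scored = [
        (cosine_similarity(query_vec, vec), text, url)
        for text, url, vec in store
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def build_prompt(question, hits):
    """Assemble an LLM prompt that numbers each retrieved chunk with its source."""
    context = "\n".join(
        f"[{i + 1}] {text} (source: {url})"
        for i, (_, text, url) in enumerate(hits)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real application the vectors would come from the embedding model and the search would be delegated to Pinecone; the sketch only shows the shape of the data flowing between those components.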
## Prerequisites
- Python 3.8 or higher
- OpenRouter API key (set as OPENROUTER_API_KEY environment variable)
- Pinecone API key (set as PINECONE_API_KEY environment variable)
- 8GB RAM minimum (16GB recommended)
- Internet connection for web scraping
## Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd f1-ai
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Install Playwright browsers:
```bash
playwright install chromium
```
4. Set up environment variables:
Create a `.env` file with:
```
OPENROUTER_API_KEY=your_api_key_here # Required for LLM functionality
PINECONE_API_KEY=your_api_key_here # Required for vector storage
```
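Before running the scraper or the app, it can help to confirm that both keys are actually set. The following is a small standard-library sketch, not part of the repository: `parse_env_lines` and `missing_keys` are hypothetical helpers, and the parser handles only the simple `KEY=value # comment` form shown above.

```python
import os

REQUIRED_KEYS = ("OPENROUTER_API_KEY", "PINECONE_API_KEY")


def parse_env_lines(lines):
    """Parse simple KEY=value lines, ignoring blanks and '#' comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment, then surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def missing_keys(env):
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

For example, `missing_keys(dict(os.environ))` checks the current process environment, while `missing_keys(parse_env_lines(open(".env")))` checks the file directly.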
## Usage
### Command Line Interface
1. Scrape and ingest F1 content:
```bash
python f1_scraper.py --start-urls https://www.formula1.com/ --max-pages 100 --depth 2 --ingest
```
Options:
- `--start-urls`: Space-separated list of URLs to start crawling from
- `--max-pages`: Maximum number of pages to crawl (default: 100)
- `--depth`: Maximum crawl depth (default: 2)
- `--ingest`: Flag to ingest discovered content into the RAG system
- `--max-chunks`: Maximum chunks per URL for ingestion (default: 50)
- `--llm-provider`: LLM provider to use (`openrouter`)
2. Ask questions about Formula 1:
```bash
python f1_ai.py ask "Who won the 2023 F1 World Championship?"
```
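The scraper options listed above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented flags and defaults, not the parser actually used in `f1_scraper.py`; marking `--start-urls` as required and defaulting `--llm-provider` to `openrouter` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented f1_scraper.py options."""
    parser = argparse.ArgumentParser(
        prog="f1_scraper.py",
        description="Crawl F1-related pages and optionally ingest them.",
    )
    # Assumption: at least one start URL must be given.
    parser.add_argument("--start-urls", nargs="+", required=True,
                        help="Space-separated list of URLs to start crawling from")
    parser.add_argument("--max-pages", type=int, default=100,
                        help="Maximum number of pages to crawl")
    parser.add_argument("--depth", type=int, default=2,
                        help="Maximum crawl depth")
    parser.add_argument("--ingest", action="store_true",
                        help="Ingest discovered content into the RAG system")
    parser.add_argument("--max-chunks", type=int, default=50,
                        help="Maximum chunks per URL for ingestion")
    # Assumption: openrouter is the default as well as the only choice.
    parser.add_argument("--llm-provider", choices=["openrouter"],
                        default="openrouter", help="LLM provider to use")
    return parser
```

Calling `build_parser().parse_args(["--start-urls", "https://www.formula1.com/", "--ingest"])` yields a namespace with `max_pages=100`, `depth=2`, and `ingest=True`.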
### Streamlit Interface
Run the Streamlit app:
```bash
streamlit run app.py
```
This will open a web interface where you can:
- Ask questions about Formula 1
- View responses in a chat-like interface
- See source citations for answers
- Track conversation history
- Get real-time updates on response generation
## Project Structure
- `f1_scraper.py`: Intelligent web crawler implementation
  - Automatically discovers F1-related content using keyword scoring
  - Handles content relevance detection with priority paths
  - Manages crawling depth and limits
  - Implements domain-specific filtering
- `f1_ai.py`: Core RAG application implementation
  - Handles data ingestion and chunking
  - Manages vector database operations
  - Implements question-answering logic with source tracking
  - Provides robust error handling
- `llm_manager.py`: LLM provider management
  - Integrates with OpenRouter for advanced LLM capabilities
  - Manages HuggingFace embeddings generation
  - Implements rate limiting and error recovery
  - Handles async API interactions
- `app.py`: Streamlit web interface
  - Provides chat-based UI with message history
  - Manages conversation state
  - Handles async operations with progress tracking
  - Implements error handling and user feedback
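As an illustration of the keyword scoring and priority paths that `f1_scraper.py` is described as using, here is a minimal sketch. The keyword weights, the path prefixes, the bonus value, and the function name `relevance_score` are all hypothetical; the real crawler may score pages quite differently.

```python
from urllib.parse import urlparse

# Hypothetical keyword weights; not taken from the project.
F1_KEYWORDS = {
    "formula 1": 3,
    "grand prix": 2,
    "f1": 2,
    "qualifying": 1,
    "pit stop": 1,
}

# Hypothetical URL path prefixes that receive a fixed bonus.
PRIORITY_PATHS = ("/results", "/drivers", "/teams")
PRIORITY_BONUS = 5


def relevance_score(url, text):
    """Score a page: weighted keyword counts plus a bonus for priority paths."""
    lowered = text.lower()
    score = sum(
        weight * lowered.count(keyword)
        for keyword, weight in F1_KEYWORDS.items()
    )
    path = urlparse(url).path.lower()
    if any(path.startswith(prefix) for prefix in PRIORITY_PATHS):
        score += PRIORITY_BONUS
    return score
```

A crawler built this way would typically follow only links whose score exceeds some threshold, which keeps the crawl focused while staying within the page and depth limits.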
## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Submit a Pull Request