---
title: F1-AI
emoji: 🏎️
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: 1.27.2
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# F1-AI: Formula 1 RAG Application

F1-AI is a Retrieval-Augmented Generation (RAG) application specifically designed for Formula 1 information. It features an intelligent web scraper that automatically discovers and extracts Formula 1-related content from the web, stores it in a vector database, and enables natural language querying of the stored information.

## Features


- Web scraping of Formula 1 content with automatic content extraction
- Vector database storage using Pinecone for efficient similarity search
- OpenRouter integration with the Mistral-7B-Instruct model for LLM capabilities
- HuggingFace embeddings for semantic search
- RAG-powered question answering with contextual understanding and source citations
- Command-line interface for automation and scripting
- User-friendly Streamlit web interface with chat history
- Asynchronous data ingestion and processing for improved performance

## Architecture

F1-AI is built on a modern tech stack:

- **LangChain**: Orchestrates the RAG pipeline and manages interactions between components
- **Pinecone**: Vector database for storing and retrieving embeddings
- **OpenRouter**: Primary LLM provider, serving the Mistral-7B-Instruct model
- **HuggingFace**: Provides the all-MiniLM-L6-v2 embedding model
- **Playwright**: Handles web scraping with JavaScript support
- **BeautifulSoup4**: Parses HTML content and extracts relevant information
- **Streamlit**: Provides an interactive web interface with chat functionality
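The retrieval step at the heart of this stack can be illustrated with a minimal, self-contained sketch. Plain Python with hand-made toy vectors stands in for the HuggingFace embeddings and the Pinecone index here; this is an illustration of the RAG retrieval idea, not the app's actual code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks):
    """Assemble retrieved context into a grounded prompt for the LLM."""
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy index: in the real app these vectors come from all-MiniLM-L6-v2
# and live in Pinecone; here they are 3-d stand-ins.
index = [
    {"text": "Max Verstappen won the 2023 championship.",
     "source": "formula1.com", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Monza is known as the Temple of Speed.",
     "source": "formula1.com", "embedding": [0.0, 0.2, 0.9]},
]
top = retrieve([1.0, 0.0, 0.1], index, k=1)
print(build_prompt("Who won in 2023?", top))
```

The real pipeline does the same thing at scale: embed the question, pull the nearest stored chunks from Pinecone, and hand them to Mistral-7B-Instruct along with the question.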

## Prerequisites

- Python 3.8 or higher
- OpenRouter API key (set as the `OPENROUTER_API_KEY` environment variable)
- Pinecone API key (set as the `PINECONE_API_KEY` environment variable)
- 8 GB RAM minimum (16 GB recommended)
- Internet connection for web scraping
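Since both keys are required, it is worth sanity-checking them before launch. This small helper is hypothetical (not part of the repo) and simply reports any required variables that are unset:

```python
import os

REQUIRED_KEYS = ["OPENROUTER_API_KEY", "PINECONE_API_KEY"]

def missing_keys(env=os.environ):
    """Return the names of required API keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing environment variables:", ", ".join(absent))
    else:
        print("All required API keys are set.")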

## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd f1-ai
   ```

2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Install the Playwright browser:

   ```bash
   playwright install chromium
   ```

4. Set up environment variables by creating a `.env` file with:

   ```bash
   OPENROUTER_API_KEY=your_api_key_here    # Required for LLM functionality
   PINECONE_API_KEY=your_api_key_here      # Required for vector storage
   ```

## Usage

### Command Line Interface

1. Scrape and ingest F1 content:

   ```bash
   python f1_scraper.py --start-urls https://www.formula1.com/ --max-pages 100 --depth 2 --ingest
   ```

   Options:

   - `--start-urls`: Space-separated list of URLs to start crawling from
   - `--max-pages`: Maximum number of pages to crawl (default: 100)
   - `--depth`: Maximum crawl depth (default: 2)
   - `--ingest`: Flag to ingest discovered content into the RAG system
   - `--max-chunks`: Maximum chunks per URL for ingestion (default: 50)
   - `--llm-provider`: LLM provider to use (currently `openrouter`)

2. Ask questions about Formula 1:

   ```bash
   python f1_ai.py ask "Who won the 2023 F1 World Championship?"
   ```

### Streamlit Interface

Run the Streamlit app:

```bash
streamlit run app.py
```

This will open a web interface where you can:

- Ask questions about Formula 1
- View responses in a chat-like interface
- See source citations for answers
- Track conversation history
- Get real-time updates on response generation

## Project Structure

- `f1_scraper.py`: Intelligent web crawler
  - Automatically discovers F1-related content using keyword scoring
  - Detects content relevance, prioritizing known high-value paths
  - Manages crawl depth and page limits
  - Implements domain-specific filtering
- `f1_ai.py`: Core RAG application
  - Handles data ingestion and chunking
  - Manages vector database operations
  - Implements question-answering logic with source tracking
  - Provides robust error handling
- `llm_manager.py`: LLM provider management
  - Integrates with OpenRouter for LLM calls
  - Manages HuggingFace embedding generation
  - Implements rate limiting and error recovery
  - Handles async API interactions
- `app.py`: Streamlit web interface
  - Provides a chat-based UI with message history
  - Manages conversation state
  - Handles async operations with progress tracking
  - Implements error handling and user feedback
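The keyword-scoring idea behind the crawler's relevance detection can be sketched in a few lines. The keyword list, weights, and threshold below are illustrative assumptions, not the values `f1_scraper.py` actually uses:

```python
# Illustrative keyword weights; the real scraper's list is likely larger.
F1_KEYWORDS = {
    "grand prix": 3, "formula 1": 3, "qualifying": 2,
    "pit stop": 2, "podium": 1, "circuit": 1,
}

def relevance_score(text):
    """Sum keyword weights over every occurrence found in the page text."""
    lower = text.lower()
    return sum(w * lower.count(kw) for kw, w in F1_KEYWORDS.items())

def is_relevant(text, threshold=3):
    """Treat a page as F1-related once its score reaches the threshold."""
    return relevance_score(text) >= threshold

print(is_relevant("Verstappen took pole in qualifying for the Monaco Grand Prix."))
```

Scoring pages this way lets the crawler skip off-topic links early instead of burning its page budget on irrelevant content.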

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Submit a pull request