---
title: F1-AI
emoji: 🏎️
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: 1.27.2
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# F1-AI: Formula 1 RAG Application

F1-AI is a Retrieval-Augmented Generation (RAG) application specifically designed for Formula 1 information. It features an intelligent web scraper that automatically discovers and extracts Formula 1-related content from the web, stores it in a vector database, and enables natural language querying of the stored information.

## Features


- Web scraping of Formula 1 content with automatic content extraction
- Vector database storage using Pinecone for efficient similarity search
- OpenRouter integration with the Mistral-7B-Instruct model for LLM capabilities
- HuggingFace embeddings for semantic search
- RAG-powered question answering with contextual understanding and source citations
- Command-line interface for automation and scripting
- User-friendly Streamlit web interface with chat history
- Asynchronous data ingestion and processing for improved performance

## Architecture

F1-AI is built on a modern tech stack:

- **LangChain**: Orchestrates the RAG pipeline and manages interactions between components
- **Pinecone**: Vector database for storing and retrieving embeddings
- **OpenRouter**: Primary LLM provider, serving the Mistral-7B-Instruct model
- **HuggingFace**: Provides the all-MiniLM-L6-v2 embedding model
- **Playwright**: Handles web scraping with JavaScript support
- **BeautifulSoup4**: Parses HTML content and extracts relevant information
- **Streamlit**: Provides an interactive web interface with chat functionality
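The retrieval step at the heart of this stack can be illustrated with a minimal, self-contained sketch. Plain Python with hand-made toy vectors stands in for the HuggingFace embeddings and the Pinecone index here; this is an illustration of the RAG retrieval idea, not the app's actual code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks):
    """Assemble retrieved context into a grounded prompt for the LLM."""
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy index: in the real app these vectors come from all-MiniLM-L6-v2
# and live in Pinecone; here they are 3-d stand-ins.
index = [
    {"text": "Max Verstappen won the 2023 championship.",
     "source": "formula1.com", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Monza is known as the Temple of Speed.",
     "source": "formula1.com", "embedding": [0.0, 0.2, 0.9]},
]
top = retrieve([1.0, 0.0, 0.1], index, k=1)
print(build_prompt("Who won in 2023?", top))
```

The real pipeline does the same thing at scale: embed the question, pull the nearest stored chunks from Pinecone, and hand them to Mistral-7B-Instruct along with the question.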

## Prerequisites

- Python 3.8 or higher
- OpenRouter API key (set as the `OPENROUTER_API_KEY` environment variable)
- Pinecone API key (set as the `PINECONE_API_KEY` environment variable)
- 8 GB RAM minimum (16 GB recommended)
- Internet connection for web scraping
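Since both keys are required, it is worth sanity-checking them before launch. This small helper is hypothetical (not part of the repo) and simply reports any required variables that are unset:

```python
import os

REQUIRED_KEYS = ["OPENROUTER_API_KEY", "PINECONE_API_KEY"]

def missing_keys(env=os.environ):
    """Return the names of required API keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing environment variables:", ", ".join(absent))
    else:
        print("All required API keys are set.")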

## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd f1-ai
   ```

2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Install the Playwright browser:

   ```bash
   playwright install chromium
   ```

4. Set up environment variables by creating a `.env` file with:

   ```bash
   OPENROUTER_API_KEY=your_api_key_here    # Required for LLM functionality
   PINECONE_API_KEY=your_api_key_here      # Required for vector storage
   ```

## Usage

### Command Line Interface

1. Scrape and ingest F1 content:

   ```bash
   python f1_scraper.py --start-urls https://www.formula1.com/ --max-pages 100 --depth 2 --ingest
   ```

   Options:

   - `--start-urls`: Space-separated list of URLs to start crawling from
   - `--max-pages`: Maximum number of pages to crawl (default: 100)
   - `--depth`: Maximum crawl depth (default: 2)
   - `--ingest`: Flag to ingest discovered content into the RAG system
   - `--max-chunks`: Maximum chunks per URL for ingestion (default: 50)
   - `--llm-provider`: LLM provider to use (currently `openrouter`)

2. Ask questions about Formula 1:

   ```bash
   python f1_ai.py ask "Who won the 2023 F1 World Championship?"
   ```

### Streamlit Interface

Run the Streamlit app:

```bash
streamlit run app.py
```

This will open a web interface where you can:

- Ask questions about Formula 1
- View responses in a chat-like interface
- See source citations for answers
- Track conversation history
- Get real-time updates on response generation

## Project Structure

- `f1_scraper.py`: Intelligent web crawler
  - Automatically discovers F1-related content using keyword scoring
  - Detects content relevance, prioritizing known high-value paths
  - Manages crawl depth and page limits
  - Implements domain-specific filtering
- `f1_ai.py`: Core RAG application
  - Handles data ingestion and chunking
  - Manages vector database operations
  - Implements question-answering logic with source tracking
  - Provides robust error handling
- `llm_manager.py`: LLM provider management
  - Integrates with OpenRouter for LLM calls
  - Manages HuggingFace embedding generation
  - Implements rate limiting and error recovery
  - Handles async API interactions
- `app.py`: Streamlit web interface
  - Provides a chat-based UI with message history
  - Manages conversation state
  - Handles async operations with progress tracking
  - Implements error handling and user feedback
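The keyword-scoring idea behind the crawler's relevance detection can be sketched in a few lines. The keyword list, weights, and threshold below are illustrative assumptions, not the values `f1_scraper.py` actually uses:

```python
# Illustrative keyword weights; the real scraper's list is likely larger.
F1_KEYWORDS = {
    "grand prix": 3, "formula 1": 3, "qualifying": 2,
    "pit stop": 2, "podium": 1, "circuit": 1,
}

def relevance_score(text):
    """Sum keyword weights over every occurrence found in the page text."""
    lower = text.lower()
    return sum(w * lower.count(kw) for kw, w in F1_KEYWORDS.items())

def is_relevant(text, threshold=3):
    """Treat a page as F1-related once its score reaches the threshold."""
    return relevance_score(text) >= threshold

print(is_relevant("Verstappen took pole in qualifying for the Monaco Grand Prix."))
```

Scoring pages this way lets the crawler skip off-topic links early instead of burning its page budget on irrelevant content.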

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Submit a pull request