---
title: F1-AI
emoji: 🏎️
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: "1.27.2"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# F1-AI: Formula 1 RAG Application

F1-AI is a Retrieval-Augmented Generation (RAG) application for Formula 1 information. A web scraper automatically discovers and extracts Formula 1 content from the web, the application stores that content in a vector database, and users can then query it in natural language.

## Features

![Example](image.png)

- Web scraping of Formula 1 content with automatic content extraction
- Vector storage in Pinecone for efficient similarity search
- Answer generation through OpenRouter with the Mistral-7B-Instruct model
- HuggingFace sentence embeddings (all-MiniLM-L6-v2) for semantic search
- RAG-powered question answering with source citations
- Command-line interface for automation and scripting
- Streamlit web interface with chat history
- Asynchronous data ingestion and processing
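
The README does not say how concurrency is bounded during asynchronous ingestion. A common pattern, sketched here under that assumption, caps the number of in-flight fetches with `asyncio.Semaphore`. `fetch_page` and `ingest_all` are hypothetical names, and `fetch_page` only simulates network latency where real code would drive Playwright.

```python
from __future__ import annotations

import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real Playwright page fetch."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>content of {url}</html>"


async def ingest_all(urls: list[str], max_concurrency: int = 5) -> dict[str, str]:
    """Fetch all URLs concurrently, with at most max_concurrency fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> tuple[str, str]:
        async with semaphore:  # wait for a free slot before fetching
            return url, await fetch_page(url)

    # gather returns results in the same order as the input URLs
    results = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(results)


if __name__ == "__main__":
    pages = asyncio.run(ingest_all([f"https://example.com/page{i}" for i in range(10)]))
    print(len(pages))
```

The semaphore keeps the crawler from opening too many browser pages or API connections at once while still overlapping the waiting time of individual requests.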

## Architecture

F1-AI is built on the following stack:

- LangChain: Orchestrates the RAG pipeline and the interactions between components
- Pinecone: Vector database for storing and retrieving embeddings
- OpenRouter: Primary LLM provider, serving the Mistral-7B-Instruct model
- HuggingFace: Provides the all-MiniLM-L6-v2 embedding model
- Playwright: Scrapes pages, including pages that render content with JavaScript
- BeautifulSoup4: Parses HTML and extracts the relevant content
- Streamlit: Provides the interactive web interface with chat functionality
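
In a typical RAG flow like the one this stack implements, the question is embedded, the closest stored chunks are retrieved, and those chunks, tagged with their source URLs, are placed in the LLM prompt so the answer can cite them. The dependency-free sketch below illustrates that flow with cosine similarity (a common choice for this embedding model) over toy 3-dimensional vectors; in F1-AI, Pinecone, LangChain, and all-MiniLM-L6-v2 (which produces 384-dimensional vectors) fill these roles. `Chunk`, `retrieve`, `build_prompt`, and the prompt wording are hypothetical, not the project's actual chain.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str          # URL the chunk was scraped from
    vector: list[float]  # embedding (toy 3-d here; all-MiniLM-L6-v2 gives 384-d)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the query embedding."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c.vector), reverse=True)[:k]


def build_prompt(question: str, context: list[Chunk]) -> str:
    """Number each context chunk so the LLM can cite it as [1], [2], ..."""
    sources = "\n".join(f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(context, 1))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    store = [
        Chunk("Max Verstappen won the 2023 drivers' title.", "https://example.com/2023", [0.9, 0.1, 0.0]),
        Chunk("Monaco is a street circuit.", "https://example.com/monaco", [0.0, 0.2, 0.9]),
        Chunk("Red Bull won the 2023 constructors' title.", "https://example.com/teams", [0.8, 0.3, 0.1]),
    ]
    top = retrieve([1.0, 0.2, 0.0], store, k=2)
    print(build_prompt("Who won the 2023 championship?", top))
```

Keeping the source URL on every chunk is what makes citations possible: the model only sees numbered snippets, and the application can map each number back to its URL.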

## Prerequisites

- Python 3.8 or higher
- OpenRouter API key (set as the OPENROUTER_API_KEY environment variable)
- Pinecone API key (set as the PINECONE_API_KEY environment variable)
- 8GB RAM minimum (16GB recommended)
- Internet access for web scraping and for the OpenRouter and Pinecone APIs
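
Both keys must be visible to the process before the scraper or the app starts. A minimal pre-flight check, assuming nothing beyond the two variable names listed above; `missing_env_vars` is a hypothetical helper, not part of the project. Keys kept only in a `.env` file appear in `os.environ` only after something loads that file into the process environment.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Variable names taken from the Prerequisites list.
REQUIRED_VARS = ("OPENROUTER_API_KEY", "PINECONE_API_KEY")


def missing_env_vars(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no argument inspects `os.environ`; an empty list means both keys are set.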

## Installation

1. Clone the repository:
```bash
git clone <repository-url>
cd f1-ai
```

2. Install the required dependencies:
```bash
pip install -r requirements.txt
```

3. Install the Playwright Chromium browser:
```bash
playwright install chromium
```

4. Set up environment variables by creating a `.env` file containing:
```
# Required for LLM functionality
OPENROUTER_API_KEY=your_api_key_here
# Required for vector storage
PINECONE_API_KEY=your_api_key_here
```

## Usage

### Command Line Interface

1. Scrape and ingest F1 content:
```bash
python f1_scraper.py --start-urls https://www.formula1.com/ --max-pages 100 --depth 2 --ingest
```

Options:
- `--start-urls`: Space-separated list of URLs to start crawling from
- `--max-pages`: Maximum number of pages to crawl (default: 100)
- `--depth`: Maximum crawl depth (default: 2)
- `--ingest`: Ingest the discovered content into the RAG system
- `--max-chunks`: Maximum number of chunks per URL during ingestion (default: 50)
- `--llm-provider`: LLM provider to use (`openrouter`)
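
The README does not document how pages are split before embedding, only that `--max-chunks` caps the number of chunks kept per URL. The sketch below assumes fixed-size character windows with overlap; `chunk_text`, `chunk_size=1000`, and `overlap=200` are illustrative choices, and the project may well use a LangChain text splitter instead.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200,
               max_chunks: int = 50) -> list[str]:
    """Split text into overlapping fixed-size windows, keeping at most max_chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        if len(chunks) == max_chunks:  # mirrors the --max-chunks cap
            break
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks
```

With these defaults, a 2,500-character page yields three chunks of 1,000, 1,000, and 900 characters. The overlap means text near a chunk boundary appears in both neighbouring chunks, so it can still be retrieved in context.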

2. Ask questions about Formula 1:
```bash
python f1_ai.py ask "Who won the 2023 F1 World Championship?"
```

### Streamlit Interface

Run the Streamlit app:
```bash
streamlit run app.py
```

This opens a web interface where you can:
- Ask questions about Formula 1
- View responses in a chat-like interface
- See source citations for answers
- Track conversation history
- Follow progress while a response is being generated

## Project Structure

- `f1_scraper.py`: Intelligent web crawler implementation
  - Automatically discovers F1-related content using keyword scoring
  - Handles content relevance detection with priority paths
  - Manages crawling depth and limits
  - Implements domain-specific filtering
- `f1_ai.py`: Core RAG application implementation
  - Handles data ingestion and chunking
  - Manages vector database operations
  - Implements question-answering logic with source tracking
  - Provides robust error handling
- `llm_manager.py`: LLM provider management
  - Integrates with OpenRouter for LLM access
  - Manages HuggingFace embedding generation
  - Implements rate limiting and error recovery
  - Handles async API interactions
- `app.py`: Streamlit web interface
  - Provides a chat-based UI with message history
  - Manages conversation state
  - Handles async operations with progress tracking
  - Implements error handling and user feedback
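
The crawler responsibilities listed for `f1_scraper.py` (keyword scoring, priority paths, depth and page limits, domain filtering) can be combined in a best-first crawl. The following is an illustration under stated assumptions, not the project's code: `F1_KEYWORDS`, `PRIORITY_PATHS`, `min_score`, and the `fetch` callback (which stands in for Playwright plus BeautifulSoup and returns a page's text and outgoing links) are all hypothetical.

```python
from __future__ import annotations

import heapq
from typing import Callable
from urllib.parse import urlparse

# Hypothetical keywords and priority paths; the real crawler's lists are not documented.
F1_KEYWORDS = ("formula 1", "f1", "grand prix", "qualifying", "pit stop")
PRIORITY_PATHS = ("/en/results", "/en/latest")


def relevance_score(text: str) -> int:
    """Crude keyword score: total keyword occurrences in the lowercased page text."""
    lowered = text.lower()
    return sum(lowered.count(kw) for kw in F1_KEYWORDS)


def crawl(
    start_urls: list[str],
    fetch: Callable[[str], tuple[str, list[str]]],
    max_pages: int = 100,
    max_depth: int = 2,
    min_score: int = 1,
) -> list[str]:
    """Crawl from start_urls and return the URLs whose text scores as F1-relevant."""
    allowed_domains = {urlparse(u).netloc for u in start_urls}  # domain filter
    frontier: list[tuple[bool, int, int, str]] = []
    seen: set[str] = set()

    def push(url: str, depth: int) -> None:
        if url in seen or urlparse(url).netloc not in allowed_domains:
            return  # skip duplicates and off-domain links
        seen.add(url)
        is_priority = urlparse(url).path.startswith(PRIORITY_PATHS)
        # heapq pops the smallest tuple: priority paths first (False < True),
        # then shallower pages, then discovery order
        heapq.heappush(frontier, (not is_priority, depth, len(seen), url))

    for url in start_urls:
        push(url, 0)

    relevant: list[str] = []
    visited = 0
    while frontier and visited < max_pages:  # --max-pages limit
        _, depth, _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        visited += 1
        if relevance_score(text) >= min_score:
            relevant.append(url)
        if depth < max_depth:  # --depth limit: pages at the cap are not expanded
            for link in links:
                push(link, depth + 1)
    return relevant
```

Pages scoring below `min_score` are left out of the result, but their links are still followed; a stricter crawler could prune them instead. Passing a dictionary-backed `fetch` function makes the logic easy to test without network access.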

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Submit a Pull Request