---
title: F1-AI
emoji: 🏎️
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: "1.27.2"
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# F1-AI: Formula 1 RAG Application
F1-AI is a Retrieval-Augmented Generation (RAG) application specifically designed for Formula 1 information. It features an intelligent web scraper that automatically discovers and extracts Formula 1-related content from the web, stores it in a vector database, and enables natural language querying of the stored information.
## Features

- Web scraping of Formula 1 content with automatic content extraction
- Vector database storage using Pinecone for efficient similarity search
- OpenRouter integration with Mistral-7B-Instruct model for advanced LLM capabilities
- HuggingFace embeddings for improved semantic understanding
- RAG-powered question answering with contextual understanding and source citations
- Command-line interface for automation and scripting
- User-friendly Streamlit web interface with chat history
- Asynchronous data ingestion and processing for improved performance
## Architecture
F1-AI is built on a modern tech stack:
- **LangChain**: Orchestrates the RAG pipeline and manages interactions between components
- **Pinecone**: Vector database for storing and retrieving embeddings
- **OpenRouter**: Primary LLM provider with Mistral-7B-Instruct model
- **HuggingFace**: Provides all-MiniLM-L6-v2 embeddings model
- **Playwright**: Handles web scraping with JavaScript support
- **BeautifulSoup4**: Processes HTML content and extracts relevant information
- **Streamlit**: Provides an interactive web interface with chat functionality
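At a high level, answering a question follows the standard retrieve-then-generate pattern: embed the query, fetch the most similar stored chunks, and pass them to the LLM as context. The toy sketch below illustrates that flow using a bag-of-words cosine similarity in place of the real all-MiniLM-L6-v2 embeddings and Pinecone index; all names here are illustrative, not the project's actual API.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in for the real embedding model: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Stand-in for a Pinecone similarity search over stored chunks.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Max Verstappen won the 2023 Formula 1 World Championship.",
    "Pinecone stores dense vectors for similarity search.",
    "Monaco hosts a famous street circuit every May.",
]
context = retrieve("Who won the 2023 championship?", chunks, k=1)
# The retrieved context is then placed into the LLM prompt as grounding.
```

In the real pipeline, LangChain orchestrates these same steps against Pinecone and OpenRouter.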
## Prerequisites
- Python 3.8 or higher
- OpenRouter API key (set as OPENROUTER_API_KEY environment variable)
- Pinecone API key (set as PINECONE_API_KEY environment variable)
- 8GB RAM minimum (16GB recommended)
- Internet connection for web scraping
## Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd f1-ai
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Install Playwright browsers:
```bash
playwright install chromium
```
4. Set up environment variables:
Create a .env file with:
```env
OPENROUTER_API_KEY=your_api_key_here # Required for LLM functionality
PINECONE_API_KEY=your_api_key_here # Required for vector storage
```
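In code, both keys can be read and validated at startup so a missing key fails fast rather than surfacing as an obscure API error later. A minimal sketch (the helper name is illustrative, not from the project):

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, failing fast if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Typical startup usage:
# openrouter_key = require_env("OPENROUTER_API_KEY")
# pinecone_key = require_env("PINECONE_API_KEY")
```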
## Usage
### Command Line Interface
1. Scrape and ingest F1 content:
```bash
python f1_scraper.py --start-urls https://www.formula1.com/ --max-pages 100 --depth 2 --ingest
```
Options:
- `--start-urls`: Space-separated list of URLs to start crawling from
- `--max-pages`: Maximum number of pages to crawl (default: 100)
- `--depth`: Maximum crawl depth (default: 2)
- `--ingest`: Flag to ingest discovered content into RAG system
- `--max-chunks`: Maximum chunks per URL for ingestion (default: 50)
- `--llm-provider`: LLM provider to use (currently only `openrouter` is supported)
2. Ask questions about Formula 1:
```bash
python f1_ai.py ask "Who won the 2023 F1 World Championship?"
```
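The scraper options listed above map naturally onto an `argparse` definition. A sketch of what that parser might look like (the actual parser in `f1_scraper.py` may differ):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="F1 content crawler")
    parser.add_argument("--start-urls", nargs="+", required=True,
                        help="Space-separated seed URLs")
    parser.add_argument("--max-pages", type=int, default=100,
                        help="Maximum number of pages to crawl")
    parser.add_argument("--depth", type=int, default=2,
                        help="Maximum crawl depth")
    parser.add_argument("--ingest", action="store_true",
                        help="Ingest discovered content into the RAG system")
    parser.add_argument("--max-chunks", type=int, default=50,
                        help="Maximum chunks per URL for ingestion")
    parser.add_argument("--llm-provider", default="openrouter",
                        choices=["openrouter"],
                        help="LLM provider to use")
    return parser

args = build_parser().parse_args(
    ["--start-urls", "https://www.formula1.com/", "--ingest"]
)
```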
### Streamlit Interface
Run the Streamlit app:
```bash
streamlit run app.py
```
This will open a web interface where you can:
- Ask questions about Formula 1
- View responses in a chat-like interface
- See source citations for answers
- Track conversation history
- Get real-time updates on response generation
## Project Structure
- `f1_scraper.py`: Intelligent web crawler implementation
- Automatically discovers F1-related content using keyword scoring
- Handles content relevance detection with priority paths
- Manages crawling depth and limits
- Implements domain-specific filtering
- `f1_ai.py`: Core RAG application implementation
- Handles data ingestion and chunking
- Manages vector database operations
- Implements question-answering logic with source tracking
- Provides robust error handling
- `llm_manager.py`: LLM provider management
- Integrates with OpenRouter for advanced LLM capabilities
- Manages HuggingFace embeddings generation
- Implements rate limiting and error recovery
- Handles async API interactions
- `app.py`: Streamlit web interface
- Provides chat-based UI with message history
- Manages conversation state
- Handles async operations with progress tracking
- Implements error handling and user feedback
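The keyword-scoring approach used by the crawler to judge relevance can be illustrated with a small check; the keyword set and threshold below are illustrative, not the scraper's actual values:

```python
F1_KEYWORDS = {"formula 1", "f1", "grand prix", "qualifying", "pit stop",
               "constructor", "circuit", "fia"}

def relevance_score(text: str) -> int:
    """Count how many F1 keywords appear in the text."""
    lower = text.lower()
    return sum(1 for kw in F1_KEYWORDS if kw in lower)

def is_relevant(text: str, threshold: int = 2) -> bool:
    """Only pages scoring at or above the threshold are crawled further."""
    return relevance_score(text) >= threshold
```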
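The chunking step in the ingestion path splits scraped text into overlapping pieces before embedding, so retrieval does not cut answers off at chunk boundaries. A minimal sketch of that idea (sizes and the function name are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk shares its first `overlap` characters with the tail of the previous one, which is what keeps sentence fragments retrievable from either side of a boundary.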
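The rate limiting in the LLM layer can be sketched with a simple asyncio sliding-window pattern; this is a minimal illustration, not the module's actual implementation:

```python
import asyncio
import time

class RateLimiter:
    """Allow at most `rate` calls per `per` seconds (simple sliding window)."""

    def __init__(self, rate: int, per: float):
        self.rate = rate
        self.per = per
        self.calls: list[float] = []
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            # Drop timestamps that have aged out of the window.
            self.calls = [t for t in self.calls if now - t < self.per]
            if len(self.calls) >= self.rate:
                # Sleep until the oldest call leaves the window.
                await asyncio.sleep(self.per - (now - self.calls[0]))
            self.calls.append(time.monotonic())

async def demo() -> int:
    limiter = RateLimiter(rate=5, per=1.0)
    for _ in range(3):
        await limiter.acquire()  # under the limit, so no sleeping here
    return len(limiter.calls)
```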
## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Submit a Pull Request