---
title: RagBenchCapstone10
emoji: π
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.16.0
app_file: app.py
pinned: false
short_description: RagBench Dataset development by Saiteja
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# RAG Benchmark Evaluation System

## Overview
This project implements a Retrieval-Augmented Generation (RAG) system for evaluating different language models and reranking strategies. It provides a user-friendly interface for querying documents and analyzing the performance of various models.
## Features
- Multiple LLM support (LLaMA 3.3, Mistral 7B)
- Various reranking models:
  - MS MARCO MiniLM
  - MS MARCO TinyBERT
  - MonoT5 Base
  - MonoT5 Small
  - MonoT5 3B
- Vector similarity search using Milvus (see the retrieval sketch after this list)
- Automatic document chunking and retrieval
- Performance metrics calculation
- Interactive Gradio interface
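
A rough sketch of the retrieval step, assuming a local Milvus Lite database file, a collection named `documents`, and the `all-MiniLM-L6-v2` embedding model; these are illustrative choices, not the app's actual configuration.

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

client = MilvusClient("rag_benchmark.db")          # assumed local Milvus Lite file
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def retrieve(question, top_k=5):
    """Embed the question and return the top_k most similar document chunks."""
    query_vec = encoder.encode([question]).tolist()
    hits = client.search(
        collection_name="documents",               # assumed collection name
        data=query_vec,
        limit=top_k,
        output_fields=["text"],
    )
    return [hit["entity"]["text"] for hit in hits[0]]
```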
## Prerequisites
- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
Installation
Clone the repository: bash git clone https://github.com/yourusername/rag-benchmark.git cd rag-benchmark
Install dependencies:
- pip install -r requirements.txt
- Configure the models:
Create a
models
directory and add your language model files.Create a
rerankers
directory and add your reranking model files.Run the application:
python app.py
## Usage

1. Start the application:

   ```bash
   python app.py
   ```

2. Access the web interface at `http://localhost:7860`.
3. Enter your question and select:
   - LLM Model (LLaMA 3.3 or Mistral 7B)
   - Reranking Model (MS MARCO or MonoT5 variants)
4. Click "Evaluate Model" to get results.
## Metrics
The system calculates several performance metrics:
- RMSE Context Relevance
- RMSE Context Utilization
- AUCROC Adherence
- Processing Time
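
As a rough illustration of how such scores can be computed, the sketch below uses NumPy for RMSE and scikit-learn for AUCROC; the example arrays are made up, and in the app the predictions and labels come from the pipeline outputs and the RAGBench annotations.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def rmse(predicted, ground_truth):
    """Root mean squared error between predicted and annotated scores."""
    return float(np.sqrt(np.mean((np.asarray(predicted) - np.asarray(ground_truth)) ** 2)))

# Hypothetical values for illustration only.
pred_relevance = [0.8, 0.4, 0.9]
true_relevance = [0.7, 0.5, 1.0]
pred_adherence = [0.9, 0.2, 0.7]   # predicted probability that the answer is grounded
true_adherence = [1, 0, 1]          # binary adherence labels

print("RMSE (context relevance):", rmse(pred_relevance, true_relevance))
print("AUCROC (adherence):", roc_auc_score(true_adherence, pred_adherence))
```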
## Reranking Models Comparison

### MS MARCO Models
- MiniLM: Fast and efficient, good general performance
- TinyBERT: Lightweight, slightly lower accuracy but faster
### MonoT5 Models
- Small: Compact and fast, suitable for limited resources
- Base: Balanced performance and speed
- 3B: Highest accuracy, requires more computational resources
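
A minimal reranking sketch using a sentence-transformers CrossEncoder with the public `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint; the checkpoint name, the `rerank` helper, and the `top_k` default are assumptions, and the MonoT5 variants are scored differently (as sequence-to-sequence models), so they are not shown here.

```python
from typing import List
from sentence_transformers import CrossEncoder

# Assumed public checkpoint; the app may instead load a copy from its rerankers/ directory.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, chunks: List[str], top_k: int = 3) -> List[str]:
    """Score each (question, chunk) pair and keep the highest-scoring chunks."""
    scores = reranker.predict([(question, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```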
## Error Handling
- Automatic fallback to fewer documents if token limits are exceeded
- Graceful handling of API timeouts
- Comprehensive error logging
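
The token-limit fallback can be sketched as below with tiktoken; the `cl100k_base` encoding, the 6000-token budget, and the `fit_documents` helper are illustrative assumptions.

```python
from typing import List
import tiktoken

MAX_CONTEXT_TOKENS = 6000                      # assumed prompt budget
encoding = tiktoken.get_encoding("cl100k_base")

def fit_documents(documents: List[str], question: str) -> List[str]:
    """Drop the lowest-ranked documents until the prompt fits the token budget."""
    docs = list(documents)
    while docs:
        prompt = question + "\n\n" + "\n\n".join(docs)
        if len(encoding.encode(prompt)) <= MAX_CONTEXT_TOKENS:
            return docs
        docs.pop()                             # fall back to fewer documents
    return docs
```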
## Contributing

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## Dependencies
- gradio
- torch
- transformers
- sentence-transformers
- pymilvus
- numpy
- pandas
- scikit-learn
- tiktoken
- groq
- huggingface_hub
## License
[Your License Here]
## Acknowledgments
- RAGBench dataset
- Hugging Face Transformers
- Milvus Vector Database
- Groq API