---
title: RagBenchCapstone10
emoji: 📉
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.16.0
app_file: app.py
pinned: false
short_description: RagBench Dataset development by Saiteja
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# RAG Benchmark Evaluation System

## Overview

This project implements a Retrieval-Augmented Generation (RAG) system for evaluating different language models and reranking strategies. It provides a user-friendly interface for querying documents and analyzing the performance of various models.

## Features

- Multiple LLM support (LLaMA 3.3, Mistral 7B)
- Various reranking models:
  - MS MARCO MiniLM
  - MS MARCO TinyBERT
  - MonoT5 Base
  - MonoT5 Small
  - MonoT5 3B
- Vector similarity search using Milvus (see the retrieval sketch after this list)
- Automatic document chunking and retrieval
- Performance metrics calculation
- Interactive Gradio interface
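
As a rough sketch of what the retrieval step might look like with `pymilvus` and `sentence-transformers`: the collection name (`rag_chunks`), field names (`embedding`, `text`), and the embedding model below are illustrative assumptions, not the app's actual configuration, which lives in `app.py`.

```python
# Illustrative retrieval step; names and parameters are assumptions.
from pymilvus import connections, Collection
from sentence_transformers import SentenceTransformer

connections.connect(host="localhost", port="19530")

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
collection = Collection("rag_chunks")              # assumed collection name
collection.load()

query = "What are the side effects of aspirin?"
query_vec = encoder.encode(query).tolist()

# Top-5 nearest chunks by inner-product similarity.
hits = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "IP", "params": {"nprobe": 10}},
    limit=5,
    output_fields=["text"],
)
for hit in hits[0]:
    print(f"{hit.distance:.3f}  {hit.entity.get('text')[:80]}")
```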

## Prerequisites

- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/rag-benchmark.git
   cd rag-benchmark
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure the models:

   - Create a `models` directory and add your language model files.
   - Create a `rerankers` directory and add your reranking model files.

4. Run the application:

   ```bash
   python app.py
   ```

## Usage

1. Start the application:

   ```bash
   python app.py
   ```

2. Access the web interface at http://localhost:7860

3. Enter your question and select:

   - LLM Model (LLaMA 3.3 or Mistral 7B)
   - Reranking Model (MS MARCO or MonoT5 variants)

4. Click "Evaluate Model" to get results

## Metrics

The system calculates several performance metrics (a worked sketch follows this list):

- RMSE Context Relevance
- RMSE Context Utilization
- AUROC Adherence
- Processing Time
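
For illustration, the RMSE and AUROC metrics can be computed with scikit-learn as below; the arrays are made-up placeholders, not RAGBench data.

```python
# Toy illustration of the metric computations; values are placeholders.
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

true_relevance = np.array([0.9, 0.2, 0.7, 0.4])   # ground-truth scores
pred_relevance = np.array([0.8, 0.3, 0.6, 0.5])   # system's estimates

# RMSE between predicted and ground-truth context relevance scores.
rmse_relevance = np.sqrt(mean_squared_error(true_relevance, pred_relevance))

true_adherence = np.array([1, 0, 1, 1])           # did the answer stick to the context?
pred_adherence = np.array([0.9, 0.4, 0.7, 0.6])   # predicted adherence probability

# AUROC over the binary adherence labels.
auroc_adherence = roc_auc_score(true_adherence, pred_adherence)

print(f"RMSE relevance: {rmse_relevance:.3f}  AUROC adherence: {auroc_adherence:.3f}")
```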

## Reranking Models Comparison

### MS MARCO Models

- **MiniLM**: fast and efficient, with good general performance (see the sketch below)
- **TinyBERT**: lightweight; slightly lower accuracy but faster

### MonoT5 Models

- **Small**: compact and fast, suitable for limited resources
- **Base**: balanced performance and speed
- **3B**: highest accuracy, requires more computational resources
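
A minimal reranking sketch using the MS MARCO MiniLM cross-encoder via `sentence-transformers`; the query and documents are invented, and the MonoT5 variants would instead be scored through a seq2seq pipeline.

```python
# Cross-encoder reranking with MS MARCO MiniLM; inputs are invented examples.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What are the side effects of aspirin?"
docs = [
    "Aspirin may cause stomach upset and an increased risk of bleeding.",
    "The Eiffel Tower was completed in 1889.",
    "Common aspirin side effects include heartburn and nausea.",
]

# Score each (query, document) pair and sort best-first.
scores = reranker.predict([(query, doc) for doc in docs])
for score, doc in sorted(zip(scores, docs), key=lambda p: p[0], reverse=True):
    print(f"{score:>7.2f}  {doc}")
```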

## Error Handling

- Automatic fallback to fewer documents if token limits are exceeded (see the sketch below)
- Graceful handling of API timeouts
- Comprehensive error logging
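
A hedged sketch of what the token-limit fallback could look like with `tiktoken`; the encoding choice, token budget, and helper name are assumptions for illustration, not the app's own code.

```python
# Hypothetical fallback helper using tiktoken as a rough token counter.
from typing import List

import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")  # assumed encoding
MAX_CONTEXT_TOKENS = 6000                   # assumed context budget

def fit_documents(docs: List[str], budget: int = MAX_CONTEXT_TOKENS) -> List[str]:
    """Drop the lowest-ranked documents until the context fits the budget."""
    # docs are assumed to be ordered best-first by the reranker.
    while docs and sum(len(ENC.encode(d)) for d in docs) > budget:
        docs = docs[:-1]
    return docs
```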

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Dependencies

- gradio
- torch
- transformers
- sentence-transformers
- pymilvus
- numpy
- pandas
- scikit-learn
- tiktoken
- groq
- huggingface_hub

## License

[Your License Here]

## Acknowledgments

- RAGBench dataset
- Hugging Face Transformers
- Milvus Vector Database
- Groq API