---
title: RagBenchCapstone10
emoji: πŸ“‰
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.16.0
app_file: app.py
pinned: false
short_description: RagBench Dataset development by Saiteja
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# RAG Benchmark Evaluation System

## Overview

This project implements a Retrieval-Augmented Generation (RAG) system for evaluating different language models and reranking strategies. It provides a user-friendly interface for querying documents and analyzing the performance of various models.

## Features

- Multiple LLM support (LLaMA 3.3, Mistral 7B)
- Various reranking models:
  - MS MARCO MiniLM
  - MS MARCO TinyBERT
  - MonoT5 Base
  - MonoT5 Small
  - MonoT5 3B
- Vector similarity search using Milvus (see the sketch after this list)
- Automatic document chunking and retrieval
- Performance metrics calculation
- Interactive Gradio interface
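
As a reference for the Milvus-backed search, here is a minimal sketch of a similarity query with `pymilvus`. The collection name `docs`, the field names `embedding`/`text`, the 384-dim placeholder vector, and the connection details are assumptions for illustration, not values taken from this repository.

```python
# Hypothetical Milvus similarity search; collection and field names are assumed.
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")  # assumed local Milvus server
collection = Collection("docs")
collection.load()

query_embedding = [0.0] * 384  # placeholder; the real app embeds the question text

results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "IP", "params": {"nprobe": 10}},
    limit=5,
    output_fields=["text"],
)
top_chunks = [hit.entity.get("text") for hit in results[0]]
```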

## Prerequisites

- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/rag-benchmark.git
   cd rag-benchmark
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure the models:

   - Create a `models` directory and add your language model files.
   - Create a `rerankers` directory and add your reranking model files.

4. Run the application:

   ```bash
   python app.py
   ```

## Usage

1. Start the application with `python app.py`

2. Access the web interface at `http://localhost:7860`

3. Enter your question and select:

   - LLM Model (LLaMA 3.3 or Mistral 7B)
   - Reranking Model (MS MARCO or MonoT5 variants)

4. Click "Evaluate Model" to get results
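
For reference, the selection flow above can be wired up in Gradio roughly as follows. This is a minimal sketch; `run_evaluation` is a hypothetical placeholder for the retrieval, reranking, generation, and scoring pipeline in `app.py`.

```python
# Minimal Gradio sketch of the evaluation UI; run_evaluation is hypothetical.
import gradio as gr

def run_evaluation(question: str, llm: str, reranker: str) -> dict:
    # Placeholder: the real pipeline retrieves, reranks, generates, and scores.
    return {"question": question, "llm": llm, "reranker": reranker}

with gr.Blocks() as demo:
    question = gr.Textbox(label="Question")
    llm = gr.Dropdown(["LLaMA 3.3", "Mistral 7B"], label="LLM Model")
    reranker = gr.Dropdown(
        ["MS MARCO MiniLM", "MS MARCO TinyBERT",
         "MonoT5 Small", "MonoT5 Base", "MonoT5 3B"],
        label="Reranking Model",
    )
    results = gr.JSON(label="Results")
    gr.Button("Evaluate Model").click(run_evaluation, [question, llm, reranker], results)

demo.launch()  # serves http://localhost:7860 by default
```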

## Metrics

The system calculates several performance metrics:

- RMSE Context Relevance
- RMSE Context Utilization
- AUROC Adherence
- Processing Time
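
A minimal sketch of how the scoring metrics can be computed with scikit-learn, assuming ground-truth annotations and model predictions are available as arrays (the values below are illustrative only, not real results):

```python
# Illustrative metric computation; the arrays are dummy values.
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

def rmse(y_true, y_pred) -> float:
    # Root-mean-square error between annotated and predicted scores
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

# Context relevance / utilization are continuous scores in [0, 1]
print(rmse(np.array([0.9, 0.2, 0.7]), np.array([0.8, 0.1, 0.9])))

# Adherence is a binary label scored against predicted probabilities
print(roc_auc_score([1, 0, 1, 1], [0.9, 0.3, 0.6, 0.8]))
```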

## Reranking Models Comparison

### MS MARCO Models

- **MiniLM**: Fast and efficient, good general performance
- **TinyBERT**: Lightweight, slightly lower accuracy but faster

### MonoT5 Models

- **Small**: Compact and fast, suitable for limited resources
- **Base**: Balanced performance and speed
- **3B**: Highest accuracy, requires more computational resources
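
The MS MARCO models above are cross-encoders and can be called through `sentence-transformers`; the checkpoint name below is the commonly published MiniLM reranker and is an assumption, not a value read from this repository (the MonoT5 variants use a seq2seq scoring interface instead):

```python
# Sketch of cross-encoder reranking; the checkpoint name is an assumption.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is retrieval-augmented generation?"
docs = [
    "RAG augments a language model with retrieved context.",
    "Milvus is an open-source vector database.",
]

# Each (query, document) pair is scored jointly by the cross-encoder
scores = model.predict([(query, d) for d in docs])
reranked = [d for _, d in sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)]
```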

## Error Handling

- Automatic fallback to fewer documents if token limits are exceeded (sketched below)
- Graceful handling of API timeouts
- Comprehensive error logging
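
A minimal sketch of the token-limit fallback, assuming `tiktoken` for counting and documents ordered best-first by the reranker; the encoding choice and the budget handling are assumptions for illustration:

```python
# Hedged sketch: keep top-ranked documents until the token budget is reached.
from typing import List

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption

def fit_to_budget(docs: List[str], max_tokens: int) -> List[str]:
    kept, used = [], 0
    for doc in docs:  # docs assumed sorted best-first by the reranker
        n = len(enc.encode(doc))
        if used + n > max_tokens:
            break  # fall back to fewer documents instead of failing
        kept.append(doc)
        used += n
    return kept
```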

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Dependencies

- gradio
- torch
- transformers
- sentence-transformers
- pymilvus
- numpy
- pandas
- scikit-learn
- tiktoken
- groq
- huggingface_hub

## License

[Your License Here]

## Acknowledgments

- RAGBench dataset
- Hugging Face Transformers
- Milvus Vector Database
- Groq API