RAG Benchmark Evaluation System Architecture

High-Level Architecture Overview

The system follows a modular architecture organized into six layers:

1. Data Layer

Dataset Loading (loaddataset.py)
- Handles RAGBench dataset loading from HuggingFace
- Processes multiple dataset configurations
- Extracts and normalizes data
Vector Database (Milvus)
- Stores document embeddings
- Enables efficient similarity search
- Manages metadata and scores
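
The "extracts and normalizes data" step can be pictured with the sketch below. It assumes each raw record is a dict with `question`, `documents`, and `response` keys. Those field names and the function `normalize_record` are illustrative assumptions, not the actual contents of loaddataset.py.

```python
def normalize_record(raw, config_name):
    """Return a cleaned copy of one raw dataset record.

    Assumed (hypothetical) input keys: 'question', 'documents', 'response'.
    """
    documents = raw.get("documents") or []
    if isinstance(documents, str):
        # Some configurations may store a single document as a plain string.
        documents = [documents]
    return {
        "config": config_name,  # remember which dataset configuration this came from
        "question": (raw.get("question") or "").strip(),
        # Drop empty or missing documents and trim whitespace on the rest.
        "documents": [d.strip() for d in documents if d and d.strip()],
        "response": (raw.get("response") or "").strip(),
    }
```

Tagging every record with its configuration name keeps results separable later, when metrics are reported per dataset.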

2. Processing Layer

Document Processing
- Chunking (insertmilvushelper.py)
- Sliding window implementation
- Overlap management
Embedding Generation
- SentenceTransformer models
- Vector representation creation
- Dimension reduction
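
Sliding-window chunking with overlap can be sketched as follows. This is a minimal illustration that splits on whitespace. The function names, the word-level granularity, and the default sizes are assumptions, not the behavior of insertmilvushelper.py.

```python
def sliding_window_chunks(tokens, window_size=200, overlap=50):
    """Split a list of tokens into windows that share `overlap` tokens."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    step = window_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def chunk_text(text, window_size=200, overlap=50):
    """Chunk raw text by whitespace-separated words."""
    words = text.split()
    return [" ".join(chunk) for chunk in sliding_window_chunks(words, window_size, overlap)]
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.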

3. Search & Retrieval Layer

Vector Search (searchmilvushelper.py)
- Cosine similarity computation
- Top-K retrieval
- Result ranking
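
The computation behind cosine-similarity Top-K retrieval can be illustrated in plain Python. In the real system Milvus performs this search itself, typically through an index. The brute-force version below only shows the scoring and ranking logic, and its function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs with the highest similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```
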
Reranking System (finetuneresults.py)
- Multiple reranker options (MS MARCO, MonoT5)
- Context relevance scoring
- Result refinement
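
Reranking rescores the retrieved candidates against the query and reorders them. The sketch below keeps the scorer pluggable. `overlap_score` is a deliberately simple stand-in so the example runs without any model. In practice a cross-encoder or MonoT5 model would supply the score. All names here are hypothetical and not taken from finetuneresults.py.

```python
def overlap_score(query, text):
    """Stand-in relevance score: fraction of query terms present in the text."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)


def rerank(query, candidates, score_fn=overlap_score, top_n=None):
    """Reorder candidate passages by score_fn(query, passage), best first."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # Sort on the score only; ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [text for _, text in scored]
    return ranked if top_n is None else ranked[:top_n]
```

Because the scorer is just a function argument, swapping rerankers does not change the calling code.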

4. Generation Layer

LLM Integration (generationhelper.py)
- Multiple model support (LLaMA, Mistral)
- Context-aware response generation
- Prompt engineering
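
Context-aware generation depends on how the retrieved passages are placed into the prompt. One possible template is sketched below. The wording, the numbering scheme, and the `build_prompt` name are assumptions for illustration, not the prompt used in generationhelper.py.

```python
def build_prompt(question, contexts, max_contexts=3):
    """Assemble a grounded-answer prompt from a question and ranked passages."""
    # Number the passages so the model (and a reader) can refer to them.
    numbered = "\n".join(
        f"[{i}] {passage.strip()}"
        for i, passage in enumerate(contexts[:max_contexts], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

Capping the number of passages keeps the prompt within the model's context window and limits the influence of low-ranked results.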

5. Evaluation Layer

Metrics Calculation (calculatescores.py)
- RMSE computation
- AUC-ROC calculation
- Context relevance/utilization scoring
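
Both metrics can be written in a few lines of dependency-free Python. The sketch assumes RMSE is applied to continuous scores and AUC-ROC to binary labels with real-valued predictions. How calculatescores.py pairs them with specific fields is not shown here.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a positive outranks a negative.

    Compares every positive/negative pair, counting ties as half a win.
    This is quadratic in the number of examples, which is fine for a sketch.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```
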

6. Presentation Layer

Web Interface (app.py)
- Gradio-based UI
- Interactive model selection
- Real-time result display

Data Flow

1. The user submits a query through the Gradio interface.
2. The query is embedded and used to search Milvus.
3. The retrieved documents are reranked.
4. The LLM generates a response using the reranked context.
5. The response is evaluated and scored.
6. The results are displayed to the user.
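
The flow above can be expressed as a single function that chains the stages. In this sketch each stage is passed in as a callable, so the example runs with stubs. The `run_pipeline` name, the `PipelineResult` type, and the stage signatures are assumptions rather than the project's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PipelineResult:
    query: str
    contexts: List[str]
    response: str
    scores: Dict[str, float]


def run_pipeline(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str, List[str], str], Dict[str, float]],
    top_k: int = 5,
) -> PipelineResult:
    """Run one query through retrieval, reranking, generation, and evaluation."""
    candidates = retrieve(query, top_k)           # embed the query and search the vector store
    contexts = rerank(query, candidates)          # reorder candidates by relevance
    response = generate(query, contexts)          # produce an answer grounded in the contexts
    scores = evaluate(query, contexts, response)  # score the response
    # The caller (for example the web UI) is responsible for displaying the result.
    return PipelineResult(query, contexts, response, scores)
```

Passing the stages as arguments mirrors the interactive model selection in the UI: choosing a different reranker or generator only changes which callable is supplied.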

Architecture Diagram

graph TB
    %% User Interface Layer
    UI[Web Interface - Gradio]

    %% Data Layer
    subgraph Data Layer
        DS[RAGBench Dataset]
        VDB[(Milvus Vector DB)]
    end

    %% Processing Layer
    subgraph Processing Layer
        DP[Document Processing]
        EG[Embedding Generation]
        style DP fill:#f9f,stroke:#333
        style EG fill:#f9f,stroke:#333
    end

    %% Search & Retrieval Layer
    subgraph Search & Retrieval
        VS[Vector Search]
        RR[Reranking System]
        style VS fill:#bbf,stroke:#333
        style RR fill:#bbf,stroke:#333
    end

    %% Generation Layer
    subgraph Generation Layer
        LLM[LLM Models]
        PR[Prompt Engineering]
        style LLM fill:#bfb,stroke:#333
        style PR fill:#bfb,stroke:#333
    end

    %% Evaluation Layer
    subgraph Evaluation Layer
        ME[Metrics Evaluation]
        SC[Score Calculation]
        style ME fill:#ffb,stroke:#333
        style SC fill:#ffb,stroke:#333
    end

    %% Flow Connections
    UI --> DP
    DS --> DP
    DP --> EG
    EG --> VDB
    UI --> VS
    VS --> VDB
    VS --> RR
    RR --> PR
    PR --> LLM
    LLM --> ME
    ME --> SC
    SC --> UI

    %% Model Components
    subgraph Models
        ST[SentenceTransformers]
        RM[Reranking Models]
        GM[Generation Models]
        style ST fill:#dfd,stroke:#333
        style RM fill:#dfd,stroke:#333
        style GM fill:#dfd,stroke:#333
    end

    %% Model Connections
    EG --> ST
    RR --> RM
    LLM --> GM

    %% Styling
    classDef default fill:#fff,stroke:#333,stroke-width:2px;
    classDef interface fill:#f96,stroke:#333,stroke-width:2px;
    class UI interface;