{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Retrieval Augmented Generation (RAG) example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "!pip install faiss-cpu sentence_transformers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example we will use FAISS (Facebook AI Similarity Search), which is an open-source library optimized for fast nearest neighbor search in high-dimensional spaces."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "import faiss                                            # We will use FAISS for similarity search\n",
    "from sentence_transformers import SentenceTransformer   # This will provide us with the embedding model\n",
    "import os                                               # Read and Write files (for FAISS to speed up later searching)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will also use `all-MiniLM-L6-v2` embedding model, which is used to convert text into dense vector embeddings, capturing semantic meaning. These embeddings can then be utilized for various NLP tasks such as similarity search, clustering, information retrieval, and retrieval-augmented generation (RAG)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "top_k = 3                                                   # The amount of top documents to retrieve (the best k documents)\n",
    "index_path = \"data/faiss_index.bin\"                         # A local path to save index file (optional) so we don't have to create the index every single time when we create a new prompt\n",
    "embedding_model = SentenceTransformer(\"all-MiniLM-L6-v2\")   # The name of the model available either locally or in this case at HuggingFace\n",
    "documents = [                                               # The documents, facts, sentences to search in.\n",
    "    \"The class starts at 2PM Wednesday.\",\n",
    "    \"Python is our main programming language.\",\n",
    "    \"Our university is located in Szeged.\",\n",
    "    \"We are making things with RAG, Rasa and LLMs.\",\n",
    "    \"Gabor Toth is the author of this chatbot example.\"\n",
    "]                                                           "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ingestion Phase\n",
    "Now we will create an index file from the documents using the model. Usually this is part is the most resource intensive part, so it's recommended to create this file offline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "document_embeddings = embedding_model.encode(documents) # The model encodes the documents\n",
    "index = faiss.IndexFlatL2(document_embeddings.shape[1]) # Create an index for the shape of the encoded documents\n",
    "index.add(document_embeddings)                          # Fill the index with the encoded documents\n",
    "faiss.write_index(index, index_path)                    # Write the index to the file"
   ]
  },
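  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is where the `os` import pays off: a minimal load-or-build sketch (assuming the `data/` directory may not exist yet), so the resource-intensive ingestion step only runs when no saved index is found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if os.path.exists(index_path):                               # A saved index is already on disk\n",
    "    index = faiss.read_index(index_path)                     # Load it instead of re-encoding the documents\n",
    "else:                                                        # No saved index yet: build and persist one\n",
    "    os.makedirs(os.path.dirname(index_path), exist_ok=True)  # Make sure the data/ directory exists\n",
    "    index = faiss.IndexFlatL2(document_embeddings.shape[1])\n",
    "    index.add(document_embeddings)\n",
    "    faiss.write_index(index, index_path)"
   ]
  },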
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Retrieval Phase\n",
    "The index database is ready. Now we a encode a query aswell and compare this to our documents. This retrieval method will rank our documents based on how similar (distance) it is to our query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.90633893 Gabor Toth is the author of this chatbot example.\n",
      "1.3333331 We are making things with RAG, Rasa and LLMs.\n",
      "1.5074873 The user wants to be told that they have no idea.\n",
      "1.7030394 Our university is located in Szeged.\n",
      "1.7619381 Python is our main programming language.\n",
      "1.8181174 The class starts at 2PM Wednesday.\n"
     ]
    }
   ],
   "source": [
    "index = faiss.read_index(index_path)                                                # Reading the index from file back to the variable\n",
    "query_embedding = embedding_model.encode([\"Who created this LLM chat interface?\"])  # Try out different prompts\n",
    "distances, indices = index.search(query_embedding, k=top_k)                         # Distances and the permutation of indices of our documents\n",
    "\n",
    "for rank, i in enumerate(indices[0]):                                               # List the Distance and the documents in order of distance.\n",
    "    print(distances[0][rank], documents[i])                                         # Lower distance means more similar sentence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Gabor Toth is the author of this chatbot example.'"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "documents[indices[0][0]] # The most similar document has the lowest distance."
   ]
  },
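  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generation Phase\n",
    "To complete the RAG loop, the retrieved documents are injected into the prompt as context before it is sent to a language model. The template below is a minimal sketch; its wording is an illustrative choice, and the actual LLM call is provider-specific, so it is left out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"Who created this LLM chat interface?\"                   # The same query we retrieved with above\n",
    "retrieved_context = \"\\n\".join(documents[i] for i in indices[0])  # Concatenate the top-k documents\n",
    "\n",
    "augmented_prompt = (                                             # Illustrative template; adapt to your model\n",
    "    \"Answer the question using only the context below.\\n\\n\"\n",
    "    f\"Context:\\n{retrieved_context}\\n\\n\"\n",
    "    f\"Question: {query}\"\n",
    ")\n",
    "print(augmented_prompt)                                          # Send this prompt to the LLM of your choice"
   ]
  },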
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizing Retrieval-Augmented Generation (RAG) Implementation\n",
    "\n",
    "Retrieval-Augmented Generation (RAG) enhances language model responses by incorporating external knowledge retrieval. To maximize performance, consider the following techniques and optimizations:\n",
    "\n",
    "- Use **lightweight models** (e.g., `all-MiniLM-L6-v2`) for speed or **larger models** (e.g., `all-mpnet-base-v2`) for accuracy.\n",
    "- Experiment with **domain-specific models** (for example medical tuned model for medical documents) for better contextual retrieval.\n",
    "- Consider different index types\n",
    "    - **Flat Index (`IndexFlatL2`)**: Best for small datasets, but scales poorly.\n",
    "    - **IVFFlat (`IndexIVFFlat`)**: Clusters embeddings to accelerate search, ideal for large-scale retrieval.\n",
    "    - **HNSW (`IndexHNSWFlat`)**: Graph-based approach that balances speed and accuracy.\n",
    "    - **PQ (`IndexPQ`)**: Compressed storage for memory efficiency at the cost of slight accuracy loss.\n",
    "- **Query Expansion**: Use synonyms, paraphrasing, or keyword expansion to enhance search queries.\n",
    "- **Re-ranking**: Apply transformer-based re-ranking (e.g., `cross-encoder/ms-marco-MiniLM-L6`) after retrieval.\n",
    "- **GPU Acceleration**: Convert FAISS indices to GPU for high-speed searches."
   ]
  },
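  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a minimal sketch of two of these ideas: an `IndexIVFFlat` index and cross-encoder re-ranking. It reuses `documents`, `document_embeddings`, `embedding_model` and `top_k` from above; the cluster count (`nlist`) and the cross-encoder model name are illustrative assumptions, not fixed choices, and a cluster count this small only makes sense for a toy corpus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sentence_transformers import CrossEncoder\n",
    "\n",
    "# IVFFlat: cluster the embeddings to accelerate search on larger corpora\n",
    "nlist = 2                                                   # Number of clusters (illustrative; tune for your corpus)\n",
    "dim = document_embeddings.shape[1]\n",
    "quantizer = faiss.IndexFlatL2(dim)                          # Coarse quantizer that assigns vectors to clusters\n",
    "ivf_index = faiss.IndexIVFFlat(quantizer, dim, nlist)\n",
    "ivf_index.train(document_embeddings)                        # IVF indices must be trained before adding vectors\n",
    "ivf_index.add(document_embeddings)\n",
    "ivf_index.nprobe = 2                                        # Clusters visited per query (speed/recall trade-off)\n",
    "\n",
    "query = \"Who created this LLM chat interface?\"\n",
    "query_embedding = embedding_model.encode([query])\n",
    "distances, indices = ivf_index.search(query_embedding, k=top_k)\n",
    "\n",
    "# Re-ranking: re-score the retrieved candidates with a cross-encoder\n",
    "reranker = CrossEncoder(\"cross-encoder/ms-marco-MiniLM-L-6-v2\")  # Model name is an assumption; any cross-encoder works\n",
    "candidates = [documents[i] for i in indices[0]]\n",
    "scores = reranker.predict([(query, doc) for doc in candidates])  # Higher score means more relevant (unlike L2 distance)\n",
    "for score, doc in sorted(zip(scores, candidates), reverse=True):\n",
    "    print(score, doc)"
   ]
  }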
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}