{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A Retrieval Augmented Generation (RAG) example"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"!pip install faiss-cpu sentence_transformers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this example we will use FAISS (Facebook AI Similarity Search), which is an open-source library optimized for fast nearest neighbor search in high-dimensional spaces."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"import faiss # We will use FAISS for similarity search\n",
"from sentence_transformers import SentenceTransformer # This will provide us with the embedding model\n",
"import os # Read and Write files (for FAISS to speed up later searching)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will also use `all-MiniLM-L6-v2` embedding model, which is used to convert text into dense vector embeddings, capturing semantic meaning. These embeddings can then be utilized for various NLP tasks such as similarity search, clustering, information retrieval, and retrieval-augmented generation (RAG)."
]
},
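{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of what \"capturing semantic meaning\" means, the minimal sketch below (separate from the pipeline that follows; the two example sentences are made up) encodes two related sentences and compares them with cosine similarity. Scores closer to 1.0 indicate more similar meaning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sentence_transformers import SentenceTransformer\n",
"\n",
"demo_model = SentenceTransformer(\"all-MiniLM-L6-v2\") # The same model we use below\n",
"demo_embeddings = demo_model.encode([\"The class starts at 2PM Wednesday.\",\n",
"                                     \"The lecture begins Wednesday afternoon.\"]) # Two sentences with similar meaning\n",
"cosine = np.dot(demo_embeddings[0], demo_embeddings[1]) / (\n",
"    np.linalg.norm(demo_embeddings[0]) * np.linalg.norm(demo_embeddings[1])\n",
") # Cosine similarity of the two embeddings\n",
"print(cosine) # Closer to 1.0 means more similar meaning"
]
},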
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"top_k = 3 # The amount of top documents to retrieve (the best k documents)\n",
"index_path = \"data/faiss_index.bin\" # A local path to save index file (optional) so we don't have to create the index every single time when we create a new prompt\n",
"embedding_model = SentenceTransformer(\"all-MiniLM-L6-v2\") # The name of the model available either locally or in this case at HuggingFace\n",
"documents = [ # The documents, facts, sentences to search in.\n",
" \"The class starts at 2PM Wednesday.\",\n",
" \"Python is our main programming language.\",\n",
" \"Our university is located in Szeged.\",\n",
" \"We are making things with RAG, Rasa and LLMs.\",\n",
" \"Gabor Toth is the author of this chatbot example.\"\n",
"] "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ingestion Phase\n",
"Now we will create an index file from the documents using the model. Usually this is part is the most resource intensive part, so it's recommended to create this file offline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"document_embeddings = embedding_model.encode(documents) # The model encodes the documents\n",
"index = faiss.IndexFlatL2(document_embeddings.shape[1]) # Create an index for the shape of the encoded documents\n",
"index.add(document_embeddings) # Fill the index with the encoded documents\n",
"faiss.write_index(index, index_path) # Write the index to the file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Retrieval Phase\n",
"The index database is ready. Now we a encode a query aswell and compare this to our documents. This retrieval method will rank our documents based on how similar (distance) it is to our query."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.90633893 Gabor Toth is the author of this chatbot example.\n",
"1.3333331 We are making things with RAG, Rasa and LLMs.\n",
"1.5074873 The user wants to be told that they have no idea.\n",
"1.7030394 Our university is located in Szeged.\n",
"1.7619381 Python is our main programming language.\n",
"1.8181174 The class starts at 2PM Wednesday.\n"
]
}
],
"source": [
"index = faiss.read_index(index_path) # Reading the index from file back to the variable\n",
"query_embedding = embedding_model.encode([\"Who created this LLM chat interface?\"]) # Try out different prompts\n",
"distances, indices = index.search(query_embedding, k=top_k) # Distances and the permutation of indices of our documents\n",
"\n",
"for rank, i in enumerate(indices[0]): # List the Distance and the documents in order of distance.\n",
" print(distances[0][rank], documents[i]) # Lower distance means more similar sentence."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Gabor Toth is the author of this chatbot example.'"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents[indices[0][0]] # The most similar document has the lowest distance."
]
},
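{
"cell_type": "markdown",
"metadata": {},
"source": [
"To close the loop from retrieval to generation, here is a minimal sketch of how the retrieved documents could be stitched into an augmented prompt. The prompt template is our own assumption, and the actual LLM call is omitted; any chat model could consume the resulting string."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"Who created this LLM chat interface?\" # The same query we embedded above\n",
"context = \"\\n\".join(documents[i] for i in indices[0]) # The top-k retrieved documents\n",
"prompt = (\n",
"    \"Answer the question using only the context below.\\n\\n\"\n",
"    f\"Context:\\n{context}\\n\\n\"\n",
"    f\"Question: {query}\"\n",
")\n",
"print(prompt) # This string would then be passed to an LLM of your choice"
]
},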
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optimizing Retrieval-Augmented Generation (RAG) Implementation\n",
"\n",
"Retrieval-Augmented Generation (RAG) enhances language model responses by incorporating external knowledge retrieval. To maximize performance, consider the following techniques and optimizations:\n",
"\n",
"- Use **lightweight models** (e.g., `all-MiniLM-L6-v2`) for speed or **larger models** (e.g., `all-mpnet-base-v2`) for accuracy.\n",
"- Experiment with **domain-specific models** (for example medical tuned model for medical documents) for better contextual retrieval.\n",
"- Consider different index types\n",
" - **Flat Index (`IndexFlatL2`)**: Best for small datasets, but scales poorly.\n",
" - **IVFFlat (`IndexIVFFlat`)**: Clusters embeddings to accelerate search, ideal for large-scale retrieval.\n",
" - **HNSW (`IndexHNSWFlat`)**: Graph-based approach that balances speed and accuracy.\n",
" - **PQ (`IndexPQ`)**: Compressed storage for memory efficiency at the cost of slight accuracy loss.\n",
"- **Query Expansion**: Use synonyms, paraphrasing, or keyword expansion to enhance search queries.\n",
"- **Re-ranking**: Apply transformer-based re-ranking (e.g., `cross-encoder/ms-marco-MiniLM-L6`) after retrieval.\n",
"- **GPU Acceleration**: Convert FAISS indices to GPU for high-speed searches."
]
}
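,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a concrete starting point for the index-type and re-ranking suggestions above, here is a minimal sketch. The HNSW parameters and the `cross-encoder/ms-marco-MiniLM-L-6-v2` re-ranker are our own untuned choices, not a recommendation: the sketch swaps the flat index for a graph-based HNSW index and re-ranks the retrieved documents with a cross-encoder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sentence_transformers import CrossEncoder\n",
"\n",
"dim = document_embeddings.shape[1] # Embedding dimension, same as for the flat index\n",
"hnsw_index = faiss.IndexHNSWFlat(dim, 32) # 32 = graph neighbors per node (M)\n",
"hnsw_index.hnsw.efSearch = 64 # Higher values: more accurate but slower queries\n",
"hnsw_index.add(document_embeddings) # Unlike IVFFlat, HNSW needs no training step\n",
"hnsw_distances, hnsw_indices = hnsw_index.search(query_embedding, k=top_k)\n",
"\n",
"reranker = CrossEncoder(\"cross-encoder/ms-marco-MiniLM-L-6-v2\") # Cross-encoder re-ranker\n",
"pairs = [(\"Who created this LLM chat interface?\", documents[i]) for i in hnsw_indices[0]]\n",
"scores = reranker.predict(pairs) # Higher score = more relevant (unlike L2 distance)\n",
"for score, (_, doc) in sorted(zip(scores, pairs), reverse=True):\n",
"    print(score, doc)"
]
}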
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}