ej68okap committed on
Commit
241c492
·
1 Parent(s): b9e672c

new code added

Files changed (8)
  1. README.md +76 -9
  2. app.py +133 -56
  3. colpali_manager.py +76 -41
  4. middleware.py +61 -20
  5. milvus_manager.py +101 -44
  6. pdf_manager.py +47 -10
  7. rag.py +139 -55
  8. utils.py +13 -2
README.md CHANGED
@@ -1,12 +1,79 @@
  ---
- title: Multimodal Rag
- emoji: 🐨
- colorFrom: indigo
- colorTo: blue
- sdk: gradio
- sdk_version: 5.12.0
- app_file: app.py
- pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Multimodal RAG with ColPali, Milvus, and Visual Language Models
+
+ This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application using **ColPali**, **Milvus**, and **Visual Language Models (VLMs)** such as Gemini or GPT-4o. The application lets users upload a PDF and run Q&A queries over both the textual and visual elements of the document.
+
+ ---
+
+ ## Features
+
+ - **Multimodal Q&A**: Combines visual and textual embeddings for robust query answering.
+ - **PDF as Images**: Treats PDF pages as images to preserve layout and visual context.
+ - **Efficient Retrieval**: Uses Milvus for fast and accurate vector search.
+ - **Advanced Query Processing**: Integrates ColPali and VLMs for embeddings and response generation.
+
  ---
+
+ ## Architecture Overview
+
+ 1. **ColPali**:
+    - Generates embeddings for images (PDF pages) and text (user queries).
+    - Processes visual and textual data seamlessly.
+
+ 2. **Milvus**:
+    - A vector database used for indexing and retrieving embeddings.
+    - Supports HNSW-based indexing for efficient similarity search.
+
+ 3. **Visual Language Models**:
+    - Gemini or GPT-4o performs context-aware Q&A over the retrieved pages.
+
  ---

+ ## Installation
+
+ ### Prerequisites
+ - Python 3.8 or higher
+ - CUDA-compatible GPU for acceleration
+ - Milvus installed and running ([Installation Guide](https://milvus.io/docs/install_standalone.md))
+ - Required Python packages (see `requirements.txt`)
+
+ ### Steps to Run the Application Locally
+ 1. Clone the repository.
+ 2. Install the dependencies: **pip install -r requirements.txt**
+ 3. Set up environment variables. Add the following to your `.env` file or environment:
+    GEMINI_API_KEY=<Your_Gemini_API_Key>
+ 4. Launch the Gradio app: **python app.py**
+
+ ### Deploying the Gradio App on Hugging Face Spaces
+ 1. Prepare the repository:
+    git clone https://github.com/saumitras/colpali-milvus-rag.git
+    cd colpali-milvus-rag
+ 2. Organize the repository: ensure the app file (e.g., app.py) contains the Gradio application code, and include requirements.txt for the dependencies.
+ 3. Configure secrets: add the necessary environment variables, such as GEMINI_API_KEY or OPENAI_API_KEY, to the Hugging Face Spaces secrets. Navigate to your Space, open the Settings tab, and add them under Repository secrets.
+ 4. Create a new Space: visit Hugging Face Spaces, click New Space, and fill in the details (Name: a unique name such as multimodal_rag; SDK: Gradio; Visibility: Public or Private). Then click Create Space.
+ 5. Push the code to Hugging Face:
+    git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag
+    git push hf main
+ 6. Wait for the Space to build and deploy the application.
+
+ The app is deployed on Hugging Face Spaces; a live demo runs at https://huggingface.co/spaces/ultron1996/multimodal_rag
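For convenience, the local-run steps above condense into the following shell session (a sketch assembled only from commands already listed in this README; adjust paths and the key placeholder to your environment):

```bash
git clone https://github.com/saumitras/colpali-milvus-rag.git
cd colpali-milvus-rag
pip install -r requirements.txt

# rag.py reads the key from the environment (os.environ['GEMINI_API_KEY'])
echo "GEMINI_API_KEY=<Your_Gemini_API_Key>" > .env

python app.py
```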
app.py CHANGED
@@ -1,97 +1,171 @@
  import gradio as gr
  import tempfile
  import os
- import fitz  # PyMuPDF
  import uuid

-
  from middleware import Middleware
  from rag import Rag

- rag = Rag()

  def generate_uuid(state):
      # Check if UUID already exists in session state
      if state["user_uuid"] is None:
          # Generate a new UUID if not already set
          state["user_uuid"] = str(uuid.uuid4())
-
      return state["user_uuid"]


  class PDFSearchApp:
      def __init__(self):
-         self.indexed_docs = {}
-         self.current_pdf = None
-
-
      def upload_and_convert(self, state, file, max_pages):
-         id = generate_uuid(state)

-         if file is None:
              return "No file uploaded"

          print(f"Uploading file: {file.name}, id: {id}")
-
          try:
-             self.current_pdf = file.name

              middleware = Middleware(id, create_collection=True)

              pages = middleware.index(pdf_path=file.name, id=id, max_pages=max_pages)

              self.indexed_docs[id] = True
-
              return f"Uploaded and extracted {len(pages)} pages"
-         except Exception as e:
              return f"Error processing PDF: {str(e)}"
-
-
-     def search_documents(self, state, query, num_results=1):
          print(f"Searching for query: {query}")
-         id = generate_uuid(state)
-
-         if not self.indexed_docs[id]:
              print("Please index documents first")
-             return "Please index documents first", "--"
          if not query:
              print("Please enter a search query")
-             return "Please enter a search query", "--"
-
-         try:
              middleware = Middleware(id, create_collection=False)
-
-             search_results = middleware.search([query])[0]

-             page_num = search_results[0][1] + 1

-             print(f"Retrieved page number: {page_num}")

-             img_path = f"pages/{id}/page_{page_num}.png"

-             print(f"Retrieved image path: {img_path}")

-             rag_response = rag.get_answer_from_gemini(query, [img_path])

-             return img_path, rag_response
-
          except Exception as e:
-             return f"Error during search: {str(e)}", "--"

- def create_ui():
-     app = PDFSearchApp()

      with gr.Blocks() as demo:
-         state = gr.State(value={"user_uuid": None})

          gr.Markdown("# Colpali Milvus Multimodal RAG Demo")
-         gr.Markdown("This demo showcases how to use [Colpali](https://github.com/illuin-tech/colpali) embeddings with [Milvus](https://milvus.io/) and utilizing Gemini/OpenAI multimodal RAG for pdf search and Q&A.")
-
          with gr.Tab("Upload PDF"):
              with gr.Column():
                  file_input = gr.File(label="Upload PDF")
-
                  max_pages_input = gr.Slider(
                      minimum=1,
                      maximum=50,
@@ -99,38 +173,41 @@ def create_ui():
                      step=10,
                      label="Max pages to extract and index"
                  )
-
                  status = gr.Textbox(label="Indexing Status", interactive=False)
-
          with gr.Tab("Query"):
              with gr.Column():
                  query_input = gr.Textbox(label="Enter query")
-                 # num_results = gr.Slider(
-                 #     minimum=1,
-                 #     maximum=10,
-                 #     value=5,
-                 #     step=1,
-                 #     label="Number of results"
-                 # )
                  search_btn = gr.Button("Query")
                  llm_answer = gr.Textbox(label="RAG Response", interactive=False)
                  images = gr.Image(label="Top page matching query")
-
-         # Event handlers
          file_input.change(
              fn=app.upload_and_convert,
              inputs=[state, file_input, max_pages_input],
              outputs=[status]
          )
-
          search_btn.click(
              fn=app.search_documents,
              inputs=[state, query_input],
              outputs=[images, llm_answer]
          )
-
-     return demo

  if __name__ == "__main__":
-     demo = create_ui()
-     demo.launch()
  import gradio as gr
  import tempfile
  import os
+ import fitz  # PyMuPDF for working with PDF files
  import uuid

+ # Importing middleware and RAG (Retrieval-Augmented Generation) components
  from middleware import Middleware
  from rag import Rag

+ rag = Rag()  # Initializing RAG for question-answering functionality

+ # Function to generate a unique UUID for each user session
  def generate_uuid(state):
      # Check if UUID already exists in session state
      if state["user_uuid"] is None:
          # Generate a new UUID if not already set
          state["user_uuid"] = str(uuid.uuid4())
      return state["user_uuid"]


  class PDFSearchApp:
+     """Class to manage PDF upload, indexing, and querying."""
+
      def __init__(self):
+         self.indexed_docs = {}  # Dictionary to track indexed documents by user ID
+         self.current_pdf = None  # Store the currently processed PDF
+
+     # Function to handle file uploads and convert PDFs into searchable data
      def upload_and_convert(self, state, file, max_pages):
+         id = generate_uuid(state)  # Get unique user ID

+         if file is None:  # Check if a file was uploaded
              return "No file uploaded"

          print(f"Uploading file: {file.name}, id: {id}")
+
          try:
+             self.current_pdf = file.name  # Store the name of the uploaded file

+             # Initialize Middleware for indexing the PDF content
              middleware = Middleware(id, create_collection=True)

+             # Index the specified number of pages from the PDF
              pages = middleware.index(pdf_path=file.name, id=id, max_pages=max_pages)

+             # Mark the document as indexed for this user
              self.indexed_docs[id] = True
+
              return f"Uploaded and extracted {len(pages)} pages"
+         except Exception as e:  # Handle errors during processing
              return f"Error processing PDF: {str(e)}"
+
+     def search_documents(self, state, query, num_results=3):  # num_results controls how many pages are returned
+         """
+         Search for a query within indexed PDF documents and return multiple matching pages.
+
+         Args:
+             state (dict): Session state containing user-specific data.
+             query (str): The user's search query.
+             num_results (int): Number of top results to return (default is 3).
+
+         Returns:
+             tuple: (list of image paths, RAG response) or an error message if no match is found.
+         """
          print(f"Searching for query: {query}")
+         id = generate_uuid(state)  # Get unique user ID
+
+         # Check if the document has been indexed
+         if not self.indexed_docs.get(id, False):
              print("Please index documents first")
+             return "Please index documents first", None
+
+         # Check if a query was provided
          if not query:
              print("Please enter a search query")
+             return "Please enter a search query", None

+         try:
+             # Initialize Middleware for searching
              middleware = Middleware(id, create_collection=False)

+             # Perform the search and retrieve the top results
+             search_results = middleware.search([query])  # Returns matches for each query

+             # Check if there are valid search results
+             if not search_results or not search_results[0]:
+                 print("No relevant matches found in the PDF")
+                 return "No relevant matches found in the PDF", None

+             # Extract multiple matching pages (up to num_results)
+             image_paths = []
+             for i in range(min(len(search_results[0]), num_results)):  # Limit to num_results
+                 page_num = search_results[0][i][1] + 1  # Convert zero-based index to one-based
+                 img_path = f"pages/{id}/page_{page_num}.png"
+                 image_paths.append(img_path)

+             print(f"Retrieved image paths: {image_paths}")

+             # Get an answer from the RAG model using multiple images
+             rag_response = rag.get_answer_from_gemini(query, image_paths)
+
+             return image_paths, rag_response  # Return multiple image paths and RAG response

          except Exception as e:
+             # Handle and log any errors that occur
+             print(f"Error during search: {e}")
+             return f"Error during search: {str(e)}", None

+     # # Previous single-result version of search_documents, kept for reference
+     # def search_documents(self, state, query, num_results=1):
+     #     print(f"Searching for query: {query}")
+     #     id = generate_uuid(state)  # Get unique user ID
+
+     #     # Check if the document has been indexed
+     #     if not self.indexed_docs.get(id, False):
+     #         print("Please index documents first")
+     #         return "Please index documents first", "--"
+
+     #     # Check if a query was provided
+     #     if not query:
+     #         print("Please enter a search query")
+     #         return "Please enter a search query", "--"
+
+     #     try:
+     #         # Initialize Middleware for searching
+     #         middleware = Middleware(id, create_collection=False)
+
+     #         # Perform the search and retrieve the top result
+     #         search_results = middleware.search([query])[0]
+
+     #         # Extract the page number from the search results
+     #         page_num = search_results[0][1] + 1
+
+     #         print(f"Retrieved page number: {page_num}")
+
+     #         # Construct the image path for the retrieved page
+     #         img_path = f"pages/{id}/page_{page_num}.png"
+     #         print(f"Retrieved image path: {img_path}")
+
+     #         # Get an answer from the RAG model using the query and associated image
+     #         rag_response = rag.get_answer_from_gemini(query, [img_path])
+
+     #         return img_path, rag_response
+     #     except Exception as e:  # Handle errors during the search process
+     #         return f"Error during search: {str(e)}", "--"

+ # Function to create the Gradio user interface
+ def create_ui():
+     app = PDFSearchApp()  # Instantiate the PDFSearchApp class
+
      with gr.Blocks() as demo:
+         state = gr.State(value={"user_uuid": None})  # Initialize session state

+         # Header and introduction markdown
          gr.Markdown("# Colpali Milvus Multimodal RAG Demo")
+         gr.Markdown(
+             "This demo showcases how to use [Colpali](https://github.com/illuin-tech/colpali) embeddings with [Milvus](https://milvus.io/) and utilizing Gemini/OpenAI multimodal RAG for pdf search and Q&A."
+         )
+
+         # Upload PDF tab
          with gr.Tab("Upload PDF"):
              with gr.Column():
+                 # Input for uploading files
                  file_input = gr.File(label="Upload PDF")
+
+                 # Slider to select the maximum number of pages to index
                  max_pages_input = gr.Slider(
                      minimum=1,
                      maximum=50,
                      step=10,
                      label="Max pages to extract and index"
                  )
+
+                 # Textbox to display indexing status
                  status = gr.Textbox(label="Indexing Status", interactive=False)
+
+         # Query tab for searching documents
          with gr.Tab("Query"):
              with gr.Column():
+                 # Textbox for entering search queries
                  query_input = gr.Textbox(label="Enter query")
+
+                 # Button to trigger the search
                  search_btn = gr.Button("Query")
+
+                 # Textbox to display the response from RAG
                  llm_answer = gr.Textbox(label="RAG Response", interactive=False)
+
+                 # Image display for the top-matching page
                  images = gr.Image(label="Top page matching query")
+
+         # Event handlers to connect UI components with backend functions
          file_input.change(
              fn=app.upload_and_convert,
              inputs=[state, file_input, max_pages_input],
              outputs=[status]
          )
+
          search_btn.click(
              fn=app.search_documents,
              inputs=[state, query_input],
              outputs=[images, llm_answer]
          )

+     return demo  # Return the constructed UI
+
+ # Entry point to launch the application
  if __name__ == "__main__":
+     demo = create_ui()  # Create the Gradio interface
+     demo.launch()  # Launch the app
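One thing to note about the new `search_documents`: it returns a *list* of image paths, but the UI still routes that output into a single `gr.Image` component, and `Middleware.search` caps retrieval at `topk=1` (see middleware.py below), so at most one page ever comes back. A minimal sketch of one way to display several pages, assuming Gradio's `gr.Gallery` component:

```python
# Sketch: swap the single-image output for a gallery so the list returned
# by search_documents can be rendered directly.
images = gr.Gallery(label="Top pages matching query")

search_btn.click(
    fn=app.search_documents,
    inputs=[state, query_input],
    outputs=[images, llm_answer],  # (image_paths, rag_response) -> (Gallery, Textbox)
)
```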
colpali_manager.py CHANGED
@@ -1,97 +1,132 @@
- from colpali_engine.models import ColPali
- from colpali_engine.models.paligemma.colpali.processing_colpali import ColPaliProcessor
- from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor
- from colpali_engine.utils.torch_utils import ListDataset, get_torch_device
- from torch.utils.data import DataLoader
- import torch
- from typing import List, cast
-
- from tqdm import tqdm
- from PIL import Image
- import os
-
- import spaces
-
  model_name = "vidore/colpali-v1.2"
- device = get_torch_device("cuda")

  model = ColPali.from_pretrained(
      model_name,
-     torch_dtype=torch.bfloat16,
-     device_map=device,
- ).eval()

  processor = cast(ColPaliProcessor, ColPaliProcessor.from_pretrained(model_name))

- class ColpaliManager:

-
-     def __init__(self, device = "cuda", model_name = "vidore/colpali-v1.2"):

          print(f"Initializing ColpaliManager with device {device} and model {model_name}")

          # self.device = get_torch_device(device)
-
          # self.model = ColPali.from_pretrained(
          #     model_name,
          #     torch_dtype=torch.bfloat16,
          #     device_map=self.device,
          # ).eval()
-
          # self.processor = cast(ColPaliProcessor, ColPaliProcessor.from_pretrained(model_name))

      @spaces.GPU
      def get_images(self, paths: list[str]) -> List[Image.Image]:
          return [Image.open(path) for path in paths]

      @spaces.GPU
-     def process_images(self, image_paths:list[str], batch_size=5):

          print(f"Processing {len(image_paths)} image_paths")
-
          images = self.get_images(image_paths)

          dataloader = DataLoader(
              dataset=ListDataset[str](images),
              batch_size=batch_size,
              shuffle=False,
-             collate_fn=lambda x: processor.process_images(x),
          )

-         ds: List[torch.Tensor] = []
-         for batch_doc in tqdm(dataloader):
-             with torch.no_grad():
                  batch_doc = {k: v.to(model.device) for k, v in batch_doc.items()}
                  embeddings_doc = model(**batch_doc)
-                 ds.extend(list(torch.unbind(embeddings_doc.to(device))))
-
          ds_np = [d.float().cpu().numpy() for d in ds]

          return ds_np
-

      @spaces.GPU
      def process_text(self, texts: list[str]):
          print(f"Processing {len(texts)} texts")

          dataloader = DataLoader(
              dataset=ListDataset[str](texts),
-             batch_size=1,
              shuffle=False,
-             collate_fn=lambda x: processor.process_queries(x),
          )

-         qs: List[torch.Tensor] = []
-         for batch_query in dataloader:
-             with torch.no_grad():
                  batch_query = {k: v.to(model.device) for k, v in batch_query.items()}
                  embeddings_query = model(**batch_query)

-                 qs.extend(list(torch.unbind(embeddings_query.to(device))))

          qs_np = [q.float().cpu().numpy() for q in qs]

          return qs_np
-
-
-
+ # Importing required modules and libraries
2
+ from colpali_engine.models import ColPali # Main ColPali model for embeddings
3
+ from colpali_engine.models.paligemma.colpali.processing_colpali import ColPaliProcessor # Preprocessing for ColPali
4
+ from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor # Base processor utility
5
+ from colpali_engine.utils.torch_utils import ListDataset, get_torch_device # Torch utilities for dataset and device management
6
+ from torch.utils.data import DataLoader # PyTorch DataLoader for batching
7
+ import torch # PyTorch library
8
+ from typing import List, cast # Type annotations
9
+ from tqdm import tqdm # Progress bar utility
10
+ from PIL import Image # Image processing library
11
+ import os # OS module for file path handling
12
+ import spaces # Custom decorator module for GPU management
13
+
14
+ # Setting model name and initializing device
15
  model_name = "vidore/colpali-v1.2"
16
+ device = get_torch_device("cuda") # Get the available CUDA device
17
 
18
+ # Load the ColPali model with the specified configuration
19
  model = ColPali.from_pretrained(
20
  model_name,
21
+ torch_dtype=torch.bfloat16, # Use bfloat16 for reduced precision
22
+ device_map=device, # Map the model to the selected device
23
+ ).eval() # Set the model to evaluation mode
24
 
25
+ # Initialize the processor for handling image and text inputs
26
  processor = cast(ColPaliProcessor, ColPaliProcessor.from_pretrained(model_name))
27
 
 
28
 
29
+ class ColpaliManager:
30
+ """
31
+ A class to manage the processing of images and text using the ColPali model.
32
+ """
33
 
34
+ def __init__(self, device="cuda", model_name="vidore/colpali-v1.2"):
35
  print(f"Initializing ColpaliManager with device {device} and model {model_name}")
36
 
37
+ # Uncomment the below lines if the class should initialize its own model and processor
38
  # self.device = get_torch_device(device)
 
39
  # self.model = ColPali.from_pretrained(
40
  # model_name,
41
  # torch_dtype=torch.bfloat16,
42
  # device_map=self.device,
43
  # ).eval()
 
44
  # self.processor = cast(ColPaliProcessor, ColPaliProcessor.from_pretrained(model_name))
45
 
46
  @spaces.GPU
47
  def get_images(self, paths: list[str]) -> List[Image.Image]:
48
+ """
49
+ Load images from the given file paths.
50
+
51
+ Args:
52
+ paths (list[str]): List of file paths to images.
53
+
54
+ Returns:
55
+ List[Image.Image]: List of loaded PIL Image objects.
56
+ """
57
  return [Image.open(path) for path in paths]
58
 
59
  @spaces.GPU
60
+ def process_images(self, image_paths: list[str], batch_size=5):
61
+ """
62
+ Process a list of image paths to generate embeddings.
63
+
64
+ Args:
65
+ image_paths (list[str]): List of image file paths.
66
+ batch_size (int): Batch size for processing images.
67
 
68
+ Returns:
69
+ list: List of image embeddings as NumPy arrays.
70
+ """
71
  print(f"Processing {len(image_paths)} image_paths")
72
+
73
+ # Load images
74
  images = self.get_images(image_paths)
75
 
76
+ # Create a DataLoader for batching the images
77
  dataloader = DataLoader(
78
  dataset=ListDataset[str](images),
79
  batch_size=batch_size,
80
  shuffle=False,
81
+ collate_fn=lambda x: processor.process_images(x), # Process images using the processor
82
  )
83
 
84
+ ds: List[torch.Tensor] = [] # Initialize a list to store embeddings
85
+ for batch_doc in tqdm(dataloader): # Iterate through batches with a progress bar
86
+ with torch.no_grad(): # Disable gradient calculations for inference
87
+ # Move batch to the model's device
88
  batch_doc = {k: v.to(model.device) for k, v in batch_doc.items()}
89
+ # Generate embeddings
90
  embeddings_doc = model(**batch_doc)
91
+ ds.extend(list(torch.unbind(embeddings_doc.to(device)))) # Append each embedding to the list
92
+
93
+ # Convert embeddings to NumPy arrays
94
  ds_np = [d.float().cpu().numpy() for d in ds]
95
 
96
  return ds_np
 
97
 
98
  @spaces.GPU
99
  def process_text(self, texts: list[str]):
100
+ """
101
+ Process a list of text inputs to generate embeddings.
102
+
103
+ Args:
104
+ texts (list[str]): List of text inputs.
105
+
106
+ Returns:
107
+ list: List of text embeddings as NumPy arrays.
108
+ """
109
  print(f"Processing {len(texts)} texts")
110
 
111
+ # Create a DataLoader for batching the texts
112
  dataloader = DataLoader(
113
  dataset=ListDataset[str](texts),
114
+ batch_size=1, # Process texts one at a time
115
  shuffle=False,
116
+ collate_fn=lambda x: processor.process_queries(x), # Process texts using the processor
117
  )
118
 
119
+ qs: List[torch.Tensor] = [] # Initialize a list to store text embeddings
120
+ for batch_query in dataloader: # Iterate through batches
121
+ with torch.no_grad(): # Disable gradient calculations for inference
122
+ # Move batch to the model's device
123
  batch_query = {k: v.to(model.device) for k, v in batch_query.items()}
124
+ # Generate embeddings
125
  embeddings_query = model(**batch_query)
126
 
127
+ qs.extend(list(torch.unbind(embeddings_query.to(device)))) # Append each embedding to the list
128
 
129
+ # Convert embeddings to NumPy arrays
130
  qs_np = [q.float().cpu().numpy() for q in qs]
131
 
132
  return qs_np
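As a quick illustration of how these two methods are consumed elsewhere in the app (the file paths and query are hypothetical; the per-token counts depend on ColPali's patch/token layout, while the embedding dimension of 128 matches the `dim=128` used in milvus_manager.py):

```python
manager = ColpaliManager()

# Multi-vector embeddings: one (num_patches, 128) array per page image
page_vecs = manager.process_images(["pages/demo/page_1.png", "pages/demo/page_2.png"])

# One (num_query_tokens, 128) array per query string
query_vec = manager.process_text(["What was the total revenue?"])[0]
```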
 
 
 
middleware.py CHANGED
@@ -1,56 +1,97 @@
- from colpali_manager import ColpaliManager
- from milvus_manager import MilvusManager
- from pdf_manager import PdfManager
- import hashlib
-
-
- pdf_manager = PdfManager()
- colpali_manager = ColpaliManager()


  class Middleware:
-     def __init__(self, id:str, create_collection=True):
          hashed_id = hashlib.md5(id.encode()).hexdigest()[:8]
          milvus_db_name = f"milvus_{hashed_id}.db"
          self.milvus_manager = MilvusManager(milvus_db_name, "colpali", create_collection)

-     def index(self, pdf_path: str, id:str, max_pages: int, pages: list[int] = None):
-
          print(f"Indexing {pdf_path}, id: {id}, max_pages: {max_pages}")

          image_paths = pdf_manager.save_images(id, pdf_path, max_pages)
-
          print(f"Saved {len(image_paths)} images")

          colbert_vecs = colpali_manager.process_images(image_paths)

          images_data = [{
-             "colbert_vecs": colbert_vecs[i],
-             "filepath": image_paths[i]
          } for i in range(len(image_paths))]

          print(f"Inserting {len(images_data)} images data to Milvus")

          self.milvus_manager.insert_images_data(images_data)

          print("Indexing completed")

-         return image_paths
-

-
      def search(self, search_queries: list[str]):
          print(f"Searching for {len(search_queries)} queries")

-         final_res = []

          for query in search_queries:
              print(f"Searching for query: {query}")
              query_vec = colpali_manager.process_text([query])[0]
              search_res = self.milvus_manager.search(query_vec, topk=1)
              print(f"Search result: {search_res} for query: {query}")
-             final_res.append(search_res)

-         return final_res
+ # Import necessary modules and classes
+ from colpali_manager import ColpaliManager  # Manages processing of images and text with the ColPali model
+ from milvus_manager import MilvusManager  # Manages interactions with the Milvus database
+ from pdf_manager import PdfManager  # Handles PDF processing tasks
+ import hashlib  # Library for creating hashed identifiers

+ # Initialize managers
+ pdf_manager = PdfManager()  # PDF manager instance for handling PDF-related operations
+ colpali_manager = ColpaliManager()  # ColPali manager instance for processing images and text


  class Middleware:
+     """
+     Middleware class that integrates PDF processing, image embedding, and database indexing/searching.
+     """
+
+     def __init__(self, id: str, create_collection=True):
+         """
+         Initialize the Middleware with a unique identifier and Milvus database setup.
+
+         Args:
+             id (str): Unique identifier for the user/session.
+             create_collection (bool): Whether to create a new collection in the Milvus database.
+         """
+         # Generate a hashed ID for the Milvus database name
          hashed_id = hashlib.md5(id.encode()).hexdigest()[:8]
          milvus_db_name = f"milvus_{hashed_id}.db"
+
+         # Initialize the Milvus manager with the generated database name
          self.milvus_manager = MilvusManager(milvus_db_name, "colpali", create_collection)

+     def index(self, pdf_path: str, id: str, max_pages: int, pages: list[int] = None):
+         """
+         Index the content of a PDF file into the Milvus database.
+
+         Args:
+             pdf_path (str): Path to the PDF file.
+             id (str): Unique identifier for the session.
+             max_pages (int): Maximum number of pages to extract and index.
+             pages (list[int], optional): Specific pages to extract (default is None for all).
+
+         Returns:
+             list[str]: List of paths to the saved image files.
+         """
          print(f"Indexing {pdf_path}, id: {id}, max_pages: {max_pages}")

+         # Convert PDF pages into image files and save them
          image_paths = pdf_manager.save_images(id, pdf_path, max_pages)
          print(f"Saved {len(image_paths)} images")

+         # Generate image embeddings using the ColPali model
          colbert_vecs = colpali_manager.process_images(image_paths)

+         # Prepare data for insertion into Milvus
          images_data = [{
+             "colbert_vecs": colbert_vecs[i],  # Image embeddings
+             "filepath": image_paths[i]  # Corresponding image file path
          } for i in range(len(image_paths))]

          print(f"Inserting {len(images_data)} images data to Milvus")

+         # Insert the image data into the Milvus database
          self.milvus_manager.insert_images_data(images_data)

          print("Indexing completed")

+         return image_paths  # Return the list of saved image paths

      def search(self, search_queries: list[str]):
+         """
+         Search for matching results in the indexed database based on text queries.
+
+         Args:
+             search_queries (list[str]): List of search queries.
+
+         Returns:
+             list: Search results for each query.
+         """
          print(f"Searching for {len(search_queries)} queries")

+         final_res = []  # List to store the final search results

          for query in search_queries:
              print(f"Searching for query: {query}")
+
+             # Process the query text to generate an embedding
              query_vec = colpali_manager.process_text([query])[0]
+
+             # Perform the search in the Milvus database
              search_res = self.milvus_manager.search(query_vec, topk=1)
+
              print(f"Search result: {search_res} for query: {query}")

+             # Append the search results to the final results list
+             final_res.append(search_res)

+         return final_res  # Return all search results
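The shape of what `search` returns matters downstream: one list per query, where each entry is a `(score, doc_id)` tuple from `MilvusManager.search`, and `doc_id` is a zero-based page index that app.py converts to a one-based page number. Note also that `topk=1` is hardcoded here, so app.py's `num_results=3` path still only ever sees one match; raising `topk` here would be needed to return more. A small sketch of consuming the result (user id and query are hypothetical):

```python
middleware = Middleware("user-123", create_collection=False)
results = middleware.search(["total revenue"])  # [[(score, doc_id), ...]]
top_score, top_doc_id = results[0][0]
page_num = top_doc_id + 1  # one-based page number, as used in app.py
```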
milvus_manager.py CHANGED
@@ -1,69 +1,99 @@
- from pymilvus import MilvusClient, DataType
- import numpy as np
- import concurrent.futures
-

  class MilvusManager:
      def __init__(self, milvus_uri, collection_name, create_collection, dim=128):
-         self.client = MilvusClient(uri=milvus_uri)
          self.collection_name = collection_name
          if self.client.has_collection(collection_name=self.collection_name):
              self.client.load_collection(collection_name)
-         self.dim = dim

          if create_collection:
-             self.create_collection()
-             self.create_index()
-

      def create_collection(self):
          if self.client.has_collection(collection_name=self.collection_name):
              self.client.drop_collection(collection_name=self.collection_name)
          schema = self.client.create_schema(
-             auto_id=True,
-             enable_dynamic_fields=True,
          )
-         schema.add_field(field_name="pk", datatype=DataType.INT64, is_primary=True)
          schema.add_field(
-             field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=self.dim
          )
-         schema.add_field(field_name="seq_id", datatype=DataType.INT16)
-         schema.add_field(field_name="doc_id", datatype=DataType.INT64)
-         schema.add_field(field_name="doc", datatype=DataType.VARCHAR, max_length=65535)

          self.client.create_collection(
              collection_name=self.collection_name, schema=schema
          )

      def create_index(self):
          self.client.release_collection(collection_name=self.collection_name)
-         self.client.drop_index(
-             collection_name=self.collection_name, index_name="vector"
-         )
          index_params = self.client.prepare_index_params()
          index_params.add_index(
              field_name="vector",
              index_name="vector_index",
-             index_type="HNSW",
-             metric_type="IP",
              params={
-                 "M": 16,
-                 "efConstruction": 500,
              },
          )

          self.client.create_index(
              collection_name=self.collection_name, index_params=index_params, sync=True
          )

      def create_scalar_index(self):
          self.client.release_collection(collection_name=self.collection_name)

          index_params = self.client.prepare_index_params()
          index_params.add_index(
              field_name="doc_id",
              index_name="int32_index",
-             index_type="INVERTED",
          )

          self.client.create_index(
@@ -71,14 +101,26 @@ class MilvusManager:
          )

      def search(self, data, topk):
-         search_params = {"metric_type": "IP", "params": {}}
          results = self.client.search(
              self.collection_name,
              data,
-             limit=int(50),
-             output_fields=["vector", "seq_id", "doc_id"],
              search_params=search_params,
          )
          doc_ids = set()
          for r_id in range(len(results)):
              for r in range(len(results[r_id])):
@@ -86,19 +128,22 @@ class MilvusManager:

          scores = []

          def rerank_single_doc(doc_id, data, client, collection_name):
              doc_colbert_vecs = client.query(
                  collection_name=collection_name,
-                 filter=f"doc_id in [{doc_id}, {doc_id + 1}]",
-                 output_fields=["seq_id", "vector", "doc"],
-                 limit=1000,
              )
              doc_vecs = np.vstack(
                  [doc_colbert_vecs[i]["vector"] for i in range(len(doc_colbert_vecs))]
              )
              score = np.dot(data, doc_vecs.T).max(1).sum()
              return (score, doc_id)

          with concurrent.futures.ThreadPoolExecutor(max_workers=300) as executor:
              futures = {
                  executor.submit(
@@ -110,20 +155,25 @@ class MilvusManager:
              score, doc_id = future.result()
              scores.append((score, doc_id))

          scores.sort(key=lambda x: x[0], reverse=True)
-         if len(scores) >= topk:
-             return scores[:topk]
-         else:
-             return scores

      def insert(self, data):
          colbert_vecs = [vec for vec in data["colbert_vecs"]]
          seq_length = len(colbert_vecs)
          doc_ids = [data["doc_id"] for i in range(seq_length)]
          seq_ids = list(range(seq_length))
          docs = [""] * seq_length
-         docs[0] = data["filepath"]

          self.client.insert(
              self.collection_name,
              [
@@ -137,11 +187,17 @@ class MilvusManager:
              ],
          )

-     def get_images_as_doc(self, images_with_vectors:list):
-
-         images_data = []
          for i in range(len(images_with_vectors)):
              data = {
                  "colbert_vecs": images_with_vectors[i]["colbert_vecs"],
@@ -149,14 +205,15 @@ class MilvusManager:
                  "filepath": images_with_vectors[i]["filepath"],
              }
              images_data.append(data)
-
          return images_data

-
      def insert_images_data(self, image_data):
-         data = self.get_images_as_doc(image_data)

          for i in range(len(data)):
-             self.insert(data[i])
-
-
+ # Import necessary modules
+ from pymilvus import MilvusClient, DataType  # Milvus client and data type definitions
+ import numpy as np  # For numerical operations
+ import concurrent.futures  # For concurrent execution of tasks

  class MilvusManager:
+     """
+     A manager class for interacting with the Milvus database, handling collection creation,
+     data insertion, and search functionality.
+     """
+
      def __init__(self, milvus_uri, collection_name, create_collection, dim=128):
+         """
+         Initialize the MilvusManager.
+
+         Args:
+             milvus_uri (str): URI for connecting to the Milvus server.
+             collection_name (str): Name of the collection in Milvus.
+             create_collection (bool): Whether to create a new collection.
+             dim (int): Dimensionality of the vector embeddings (default is 128).
+         """
+         self.client = MilvusClient(uri=milvus_uri)  # Initialize the Milvus client
          self.collection_name = collection_name
+         self.dim = dim
+
+         # Load the collection if it exists, otherwise create it
          if self.client.has_collection(collection_name=self.collection_name):
              self.client.load_collection(collection_name)

          if create_collection:
+             self.create_collection()  # Create a new collection
+             self.create_index()  # Create an index for the collection

      def create_collection(self):
+         """
+         Create a new collection in Milvus with a predefined schema.
+         """
+         # Drop the collection if it already exists
          if self.client.has_collection(collection_name=self.collection_name):
              self.client.drop_collection(collection_name=self.collection_name)
+
+         # Define the schema for the collection
          schema = self.client.create_schema(
+             auto_id=True,  # Enable automatic ID assignment
+             enable_dynamic_fields=True,  # Allow dynamic fields
          )
+         schema.add_field(field_name="pk", datatype=DataType.INT64, is_primary=True)  # Primary key
          schema.add_field(
+             field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=self.dim  # Vector field
          )
+         schema.add_field(field_name="seq_id", datatype=DataType.INT16)  # Sequence ID
+         schema.add_field(field_name="doc_id", datatype=DataType.INT64)  # Document ID
+         schema.add_field(field_name="doc", datatype=DataType.VARCHAR, max_length=65535)  # Document path

+         # Create the collection with the specified schema
          self.client.create_collection(
              collection_name=self.collection_name, schema=schema
          )

      def create_index(self):
+         """
+         Create an HNSW index for the vector field in the collection.
+         """
+         # Release the collection before updating the index
          self.client.release_collection(collection_name=self.collection_name)
+         self.client.drop_index(collection_name=self.collection_name, index_name="vector")
+
+         # Define the HNSW index parameters
          index_params = self.client.prepare_index_params()
          index_params.add_index(
              field_name="vector",
              index_name="vector_index",
+             index_type="HNSW",  # Hierarchical Navigable Small World graph index
+             metric_type="IP",  # Inner Product (dot product) as the similarity metric
              params={
+                 "M": 16,  # Maximum number of graph connections per node
+                 "efConstruction": 500,  # Candidate-list size during index construction
              },
          )

+         # Create the index and synchronize with the server
          self.client.create_index(
              collection_name=self.collection_name, index_params=index_params, sync=True
          )

      def create_scalar_index(self):
+         """
+         Create an inverted index for scalar fields such as document IDs.
+         """
          self.client.release_collection(collection_name=self.collection_name)

          index_params = self.client.prepare_index_params()
          index_params.add_index(
              field_name="doc_id",
              index_name="int32_index",
+             index_type="INVERTED",  # Inverted index for scalar data
          )

          self.client.create_index(
          )

      def search(self, data, topk):
+         """
+         Search for the top-k most similar vectors in the collection.
+
+         Args:
+             data (array-like): Query vector.
+             topk (int): Number of top results to return.
+
+         Returns:
+             list: Sorted list of top-k results.
+         """
+         search_params = {"metric_type": "IP", "params": {}}  # Search parameters for Inner Product
          results = self.client.search(
              self.collection_name,
              data,
+             limit=50,  # Initial retrieval limit
+             output_fields=["vector", "seq_id", "doc_id"],  # Fields to include in the output
              search_params=search_params,
          )
+
+         # Collect unique document IDs from the search results
          doc_ids = set()
          for r_id in range(len(results)):
              for r in range(len(results[r_id])):

          scores = []

+         # Rerank a single document based on its relevance to the query
          def rerank_single_doc(doc_id, data, client, collection_name):
              doc_colbert_vecs = client.query(
                  collection_name=collection_name,
+                 filter=f"doc_id in [{doc_id}, {doc_id + 1}]",  # Query documents by ID
+                 output_fields=["seq_id", "vector", "doc"],  # Fields to retrieve
+                 limit=1000,  # Retrieve at most 1000 vectors per document
              )
+             # Compute the late-interaction (MaxSim) score for the document
              doc_vecs = np.vstack(
                  [doc_colbert_vecs[i]["vector"] for i in range(len(doc_colbert_vecs))]
              )
              score = np.dot(data, doc_vecs.T).max(1).sum()
              return (score, doc_id)

+         # Use multithreading to rerank documents in parallel
          with concurrent.futures.ThreadPoolExecutor(max_workers=300) as executor:
              futures = {
                  executor.submit(
              score, doc_id = future.result()
              scores.append((score, doc_id))

+         # Sort scores in descending order and return the top-k results
          scores.sort(key=lambda x: x[0], reverse=True)
+         return scores[:topk] if len(scores) >= topk else scores

      def insert(self, data):
+         """
+         Insert a batch of data into the collection.
+
+         Args:
+             data (dict): Dictionary containing vector embeddings and metadata.
+         """
          colbert_vecs = [vec for vec in data["colbert_vecs"]]
          seq_length = len(colbert_vecs)
          doc_ids = [data["doc_id"] for i in range(seq_length)]
          seq_ids = list(range(seq_length))
          docs = [""] * seq_length
+         docs[0] = data["filepath"]  # Store the file path in the first entry

+         # Insert the data into the collection
          self.client.insert(
              self.collection_name,
              [
              ],
          )

+     def get_images_as_doc(self, images_with_vectors: list):
+         """
+         Convert image data with vectors into a document-like format for insertion.
+
+         Args:
+             images_with_vectors (list): List of dictionaries containing image vectors and file paths.
+
+         Returns:
+             list: Transformed data ready for insertion.
+         """
+         images_data = []
          for i in range(len(images_with_vectors)):
              data = {
                  "colbert_vecs": images_with_vectors[i]["colbert_vecs"],
                  "filepath": images_with_vectors[i]["filepath"],
              }
              images_data.append(data)
          return images_data

      def insert_images_data(self, image_data):
+         """
+         Insert processed image data into the collection.
+
+         Args:
+             image_data (list): List of image data dictionaries.
+         """
+         data = self.get_images_as_doc(image_data)
          for i in range(len(data)):
+             self.insert(data[i])  # Insert each item individually
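The reranking line `np.dot(data, doc_vecs.T).max(1).sum()` is the ColBERT-style late-interaction (MaxSim) score: for every query-token vector, take the similarity of its best-matching document vector, then sum over query tokens. A tiny self-contained illustration with toy shapes (random data, not real embeddings):

```python
import numpy as np

query_vecs = np.random.rand(4, 128)   # 4 query-token vectors, dim 128
doc_vecs = np.random.rand(300, 128)   # 300 patch vectors for one page

sim = query_vecs @ doc_vecs.T         # (4, 300) token-to-patch similarities
score = sim.max(axis=1).sum()         # best match per query token, summed
```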
 
 
pdf_manager.py CHANGED
@@ -1,42 +1,79 @@
- from pdf2image import convert_from_path
- import os
- import shutil

  class PdfManager:
      def __init__(self):
          pass
-
      def clear_and_recreate_dir(self, output_folder):
          print(f"Clearing output folder {output_folder}")

          if os.path.exists(output_folder):
-             shutil.rmtree(output_folder)

          os.makedirs(output_folder)

      def save_images(self, id, pdf_path, max_pages, pages: list[int] = None) -> list[str]:
          output_folder = f"pages/{id}/"
-         images = convert_from_path(pdf_path)

          print(f"Saving images from {pdf_path} to {output_folder}. Max pages: {max_pages}")

          self.clear_and_recreate_dir(output_folder)

-         num_page_processed = 0

          for i, image in enumerate(images):
              if max_pages and num_page_processed >= max_pages:
                  break

              if pages and i not in pages:
                  continue

              full_save_path = f"{output_folder}/page_{i + 1}.png"

-             #print(f"Saving image to {full_save_path}")
-
              image.save(full_save_path, "PNG")

-             num_page_processed += 1

          return [f"{output_folder}/page_{i + 1}.png" for i in range(num_page_processed)]
+ # Import necessary modules
+ from pdf2image import convert_from_path  # Convert PDF pages to images
+ import os  # For file and directory operations
+ import shutil  # For removing and recreating directories

  class PdfManager:
+     """
+     A manager class for handling PDF-related operations, such as converting pages to images
+     and managing output directories.
+     """
+
      def __init__(self):
+         """
+         Initialize the PdfManager.
+         Currently, no attributes are set during initialization.
+         """
          pass
+
      def clear_and_recreate_dir(self, output_folder):
+         """
+         Clear the specified directory and recreate it.
+
+         Args:
+             output_folder (str): Path to the directory to be cleared and recreated.
+         """
          print(f"Clearing output folder {output_folder}")

+         # Remove the directory if it exists
          if os.path.exists(output_folder):
+             shutil.rmtree(output_folder)  # Delete the folder and its contents

+         # Recreate the directory
          os.makedirs(output_folder)

      def save_images(self, id, pdf_path, max_pages, pages: list[int] = None) -> list[str]:
+         """
+         Convert PDF pages to images and save them to a specified directory.
+
+         Args:
+             id (str): Unique identifier for the output folder.
+             pdf_path (str): Path to the PDF file to be processed.
+             max_pages (int): Maximum number of pages to convert and save.
+             pages (list[int], optional): Specific page numbers to convert (default is None for all).
+
+         Returns:
+             list[str]: List of paths to the saved images.
+         """
+         # Define the output folder for the images
          output_folder = f"pages/{id}/"

+         # Convert the PDF pages to images
+         images = convert_from_path(pdf_path)
          print(f"Saving images from {pdf_path} to {output_folder}. Max pages: {max_pages}")

+         # Clear the existing directory and recreate it
          self.clear_and_recreate_dir(output_folder)

+         num_page_processed = 0  # Counter for the number of pages processed

+         # Iterate through the converted images
          for i, image in enumerate(images):
+             # Stop processing once the maximum number of pages is reached
              if max_pages and num_page_processed >= max_pages:
                  break

+             # Skip pages not in the specified list (if provided)
              if pages and i not in pages:
                  continue

+             # Define the save path for the current page
              full_save_path = f"{output_folder}/page_{i + 1}.png"

+             # Save the image in PNG format
              image.save(full_save_path, "PNG")

+             num_page_processed += 1  # Increment the processed-page counter

+         # Return the paths of the saved images
          return [f"{output_folder}/page_{i + 1}.png" for i in range(num_page_processed)]
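Two practical notes on this file: `pdf2image.convert_from_path` requires the system `poppler` utilities to be installed, and since `output_folder` already ends with a slash, the `f"{output_folder}/page_{i + 1}.png"` paths contain a harmless double slash. A usage sketch (file names hypothetical):

```python
manager = PdfManager()

# Render and save up to 10 pages of sample.pdf under pages/demo-user/
paths = manager.save_images("demo-user", "sample.pdf", max_pages=10)
print(paths)  # e.g. ['pages/demo-user//page_1.png', ...]
```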
rag.py CHANGED
@@ -1,104 +1,188 @@
- import requests
- import os
- import google.generativeai as genai
-
- from typing import List
- from utils import encode_image
- from PIL import Image

  class Rag:

-     def get_answer_from_gemini(self, query, imagePaths):

-         print(f"Querying Gemini for query={query}, imagePaths={imagePaths}")

-         try:
-             genai.configure(api_key=os.environ['GEMINI_API_KEY'])
-             model = genai.GenerativeModel('gemini-1.5-flash')
-
-             images = [Image.open(path) for path in imagePaths]
-
-             chat = model.start_chat()

-             response = chat.send_message([*images, query])

-             answer = response.text

-             print(answer)
-
-             return answer
-
-         except Exception as e:
-             print(f"An error occurred while querying Gemini: {e}")
-             return f"Error: {str(e)}"
-

-     def get_answer_from_openai(self, query, imagesPaths):
          print(f"Querying OpenAI for query={query}, imagesPaths={imagesPaths}")

-         try:
              payload = self.__get_openai_api_payload(query, imagesPaths)

              headers = {
                  "Content-Type": "application/json",
-                 "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"
              }
-
              response = requests.post(
                  url="https://api.openai.com/v1/chat/completions",
                  headers=headers,
                  json=payload
              )
-             response.raise_for_status()  # Raise an HTTPError for bad responses
-
              answer = response.json()["choices"][0]["message"]["content"]
-
-             print(answer)
-
              return answer
-
          except Exception as e:
              print(f"An error occurred while querying OpenAI: {e}")
              return None


-     def __get_openai_api_payload(self, query:str, imagesPaths:List[str]):
-         image_payload = []

          for imagePath in imagesPaths:
-             base64_image = encode_image(imagePath)
              image_payload.append({
                  "type": "image_url",
                  "image_url": {
-                     "url": f"data:image/jpeg;base64,{base64_image}"
                  }
              })

          payload = {
-             "model": "gpt-4o",
              "messages": [
                  {
-                     "role": "user",
                      "content": [
                          {
                              "type": "text",
-                             "text": query
                          },
-                         *image_payload
                      ]
                  }
              ],
-             "max_tokens": 1024
          }

          return payload
-
-
-
- # if __name__ == "__main__":
- #     rag = Rag()
-
- #     query = "Based on attached images, how many new cases were reported during second wave peak"
- #     imagesPaths = ["covid_slides_page_8.png", "covid_slides_page_8.png"]
-
- #     rag.get_answer_from_gemini(query, imagesPaths)
+ # Import required libraries
+ import requests  # For making HTTP requests
+ import os  # For accessing environment variables
+ import google.generativeai as genai  # For interacting with Google's Generative AI APIs
+ from typing import List  # For type annotations
+ from utils import encode_image  # Utility function to encode images as base64
+ from PIL import Image  # For image processing

  class Rag:
+     """
+     A class for interacting with Generative AI models (Gemini and OpenAI) to retrieve answers
+     based on user queries and associated images.
+     """

+     # Previous version of get_answer_from_gemini, kept for reference
+     # def get_answer_from_gemini(self, query: str, imagePaths: List[str]) -> str:
+     #     """
+     #     Query the Gemini model with a text query and associated images.
+
+     #     Args:
+     #         query (str): The user's query.
+     #         imagePaths (List[str]): List of file paths to images.
+
+     #     Returns:
+     #         str: The response text from the Gemini model.
+     #     """
+     #     print(f"Querying Gemini for query={query}, imagePaths={imagePaths}")

+     #     try:
+     #         # Configure the Gemini API client using the API key from environment variables
+     #         genai.configure(api_key=os.environ['GEMINI_API_KEY'])

+     #         # Initialize the Gemini generative model
+     #         model = genai.GenerativeModel('gemini-1.5-flash')

+     #         # Load images from the given paths
+     #         images = [Image.open(path) for path in imagePaths]

+     #         # Start a new chat session
+     #         chat = model.start_chat()
+
+     #         # Send the query and images to the model
+     #         response = chat.send_message([*images, query])
+
+     #         # Extract the response text
+     #         answer = response.text
+
+     #         print(answer)  # Log the answer
+
+     #         return answer
+
+     #     except Exception as e:
+     #         # Handle and log any errors that occur
+     #         print(f"An error occurred while querying Gemini: {e}")
+     #         return f"Error: {str(e)}"
+
+     def get_answer_from_openai(self, query: str, imagesPaths: List[str]) -> str:
+         """
+         Query OpenAI's GPT model with a text query and associated images.
+
+         Args:
+             query (str): The user's query.
+             imagesPaths (List[str]): List of file paths to images.
+
+         Returns:
+             str: The response text from OpenAI.
+         """
          print(f"Querying OpenAI for query={query}, imagesPaths={imagesPaths}")

+         try:
+             # Prepare the API payload with the query and images
              payload = self.__get_openai_api_payload(query, imagesPaths)

+             # Define the HTTP headers for the OpenAI API request
              headers = {
                  "Content-Type": "application/json",
+                 "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"  # API key from environment variables
              }
+
+             # Send a POST request to the OpenAI API
              response = requests.post(
                  url="https://api.openai.com/v1/chat/completions",
                  headers=headers,
                  json=payload
              )
+             response.raise_for_status()  # Raise an error for unsuccessful requests
+
+             # Extract the content of the response
              answer = response.json()["choices"][0]["message"]["content"]
+
+             print(answer)  # Log the answer
+
              return answer
+
          except Exception as e:
+             # Handle and log any errors that occur
              print(f"An error occurred while querying OpenAI: {e}")
              return None

+     def get_answer_from_gemini(self, query: str, imagePaths: List[str]) -> str:
+         """
+         Query the Gemini model with a text query and associated images.

+         Args:
+             query (str): The user's query.
+             imagePaths (List[str]): List of file paths to images.

+         Returns:
+             str: The response text from the Gemini model.
+         """
+         print(f"Querying Gemini for query={query}, imagePaths={imagePaths}")

+         try:
+             # Configure the Gemini API client using the API key from environment variables
+             genai.configure(api_key=os.environ['GEMINI_API_KEY'])
+
+             # Initialize the Gemini generative model
+             model = genai.GenerativeModel('gemini-1.5-flash')
+
+             # Load images from the given paths (skip missing files)
+             images = []
+             for path in imagePaths:
+                 if os.path.exists(path):
+                     images.append(Image.open(path))
+                 else:
+                     print(f"Warning: Image not found {path}, skipping.")
+
+             # Start a new chat session
+             chat = model.start_chat()
+
+             # Construct the input for the model (handle cases with and without images)
+             input_data = [query] if not images else [*images, query]
+
+             # Send the query (and images, if any) to the model
+             response = chat.send_message(input_data)
+
+             # Extract the response text
+             answer = response.text
+
+             print(answer)  # Log the answer
+
+             return answer
+
+         except Exception as e:
+             # Handle and log any errors that occur
+             print(f"An error occurred while querying Gemini: {e}")
+             return f"Error: {str(e)}"
+
+     def __get_openai_api_payload(self, query: str, imagesPaths: List[str]) -> dict:
+         """
+         Prepare the payload for the OpenAI API request.
+
+         Args:
+             query (str): The user's query.
+             imagesPaths (List[str]): List of file paths to images.
+
+         Returns:
+             dict: The payload for the OpenAI API request.
+         """
+         image_payload = []  # List to store encoded image data
+
+         # Encode each image as base64 and prepare the payload
          for imagePath in imagesPaths:
+             base64_image = encode_image(imagePath)  # Encode image in base64
              image_payload.append({
                  "type": "image_url",
                  "image_url": {
+                     "url": f"data:image/jpeg;base64,{base64_image}"  # Embed image data as a data URL
                  }
              })

+         # Create the complete payload for the API request
          payload = {
+             "model": "gpt-4o",  # Specify the OpenAI model
              "messages": [
                  {
+                     "role": "user",  # Role of the message sender
                      "content": [
                          {
                              "type": "text",
+                             "text": query  # Include the user's query
                          },
+                         *image_payload  # Include the image data
                      ]
                  }
              ],
+             "max_tokens": 1024  # Limit the response length
          }

          return payload
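A minimal usage sketch, adapted from the example that was commented out at the bottom of the old file (the page image path is hypothetical; GEMINI_API_KEY must be set in the environment):

```python
rag = Rag()

answer = rag.get_answer_from_gemini(
    "Based on the attached images, how many new cases were reported during the second wave peak?",
    ["pages/demo-user/page_8.png"],
)
print(answer)
```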
 
 
 
 
 
 
 
 
 
 
utils.py CHANGED
@@ -1,5 +1,16 @@
- import base64

- def encode_image(image_path):
      with open(image_path, "rb") as image_file:
          return base64.b64encode(image_file.read()).decode('utf-8')
+ import base64  # Library for encoding and decoding data in base64 format

+ def encode_image(image_path: str) -> str:
+     """
+     Encode an image file to a base64 string.
+
+     Args:
+         image_path (str): The file path of the image to be encoded.
+
+     Returns:
+         str: The base64-encoded string representation of the image.
+     """
+     # Open the image file in binary read mode
      with open(image_path, "rb") as image_file:
+         # Read the image content, encode it to base64, and decode it to a UTF-8 string
          return base64.b64encode(image_file.read()).decode('utf-8')
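Usage matches the payload construction in rag.py (path hypothetical). Note that the pages are saved as PNG, so `data:image/png;base64,` would be the more accurate MIME prefix than the `image/jpeg` used there, though the mismatch may still be accepted in practice:

```python
b64 = encode_image("pages/demo-user/page_1.png")
data_url = f"data:image/png;base64,{b64}"
```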