ej68okap committed on Jan 29

Commit

9fe4df8

1 Parent(s): a53d884

new code added

Files changed (1)

README.md +78 -0

@@ -8,3 +8,81 @@ sdk_version: 5.12.0
 app_file: app.py
 pinned: false
 ---

+# Multimodal RAG with ColPali, Milvus, and Visual Language Models
+This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application using **ColPali**, **Milvus**, and **Visual Language Models (VLMs)** such as Gemini or GPT-4o. Users upload a PDF and ask questions about both the textual and visual elements of the document.
+---
+## Features
+- **Multimodal Q&A**: Combines visual and textual embeddings for robust query answering.
+- **PDF as Images**: Treats PDF pages as images to preserve layout and visual context.
+- **Efficient Retrieval**: Utilizes Milvus for fast and accurate vector search.
+- **Advanced Query Processing**: Uses ColPali for embeddings and a VLM for response generation.
+---
+## Architecture Overview
+1. **ColPali**:
+   - Generates embeddings for images (PDF pages) and text (user queries).
+   - Embeds both modalities into a shared vector space, so a query can be compared directly against page images.
+2. **Milvus**:
+   - A vector database used for indexing and retrieving embeddings.
+   - Supports HNSW-based indexing for efficient similarity searches.
+3. **Visual Language Models**:
+   - Gemini or GPT-4o performs context-aware Q&A using retrieved pages.
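The retrieval step in the overview above can be sketched in a few lines. ColPali-style models represent each page and each query as a set of vectors and score them with late interaction (MaxSim): for every query vector, take the best-matching page vector, then sum those maxima. The sketch below is an illustration only, using plain Python lists and made-up toy vectors. The names `maxsim_score` and `rank_pages` are hypothetical and do not come from this repository, and in the actual application Milvus would perform the vector search instead of this brute-force loop.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: best page match per query vector, summed."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[str, Sequence[Vector]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every page against the query and return the top_k (id, score) pairs."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data (illustrative only): two query vectors and three "pages".
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[1.0, 0.0], [0.0, 1.0]],  # matches both query vectors
    "page_2": [[1.0, 0.0], [0.5, 0.0]],  # matches only the first query vector
    "page_3": [[0.0, 0.0]],              # matches nothing
}
top = rank_pages(query, pages, top_k=2)
```

The top-ranked page IDs would then be mapped back to page images and passed to the VLM as context for answering the question.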
+---
+## Installation
+### Prerequisites
+- Python 3.8 or higher
+- CUDA-compatible GPU for acceleration
+- Milvus installed and running ([Installation Guide](https://milvus.io/docs/install_standalone.md))
+- Required Python packages (see `requirements.txt`)
+### Steps to Run the Application Locally
+1. Clone the repository.
+2. Install dependencies: `pip install -r requirements.txt`
+3. Set up environment variables. Add the following to your `.env` file or environment:
+   - `GEMINI_API_KEY=<Your_Gemini_API_Key>`
+4. Launch the Gradio app: `python app.py`
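A missing API key is an easy mistake to make in the environment-variable step above, so it can help to fail early with a clear message. The following is a minimal sketch, not code from this repository: `require_env` is a hypothetical helper, and it reads only the process environment. Loading a `.env` file would need an extra step (for example the python-dotenv package), which is an assumption about how `app.py` is set up.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to your .env file or export it before launching the app."
        )
    return value


# Hypothetical usage near the top of app.py:
# gemini_api_key = require_env("GEMINI_API_KEY")
```

Checking once at startup surfaces a configuration problem before the first request reaches the VLM.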
+### Deploying the Gradio App on Hugging Face Spaces
+1. Prepare the repository:
+   - `git clone https://github.com/saumitras/colpali-milvus-rag.git`
+   - `cd colpali-milvus-rag`
+2. Organize the repository:
+   - Ensure the app file (e.g., `app.py`) contains the Gradio application code.
+   - Include the `requirements.txt` file for dependencies.
+3. Create a new Space:
+   - Visit Hugging Face Spaces and click **New Space**.
+   - Name: give your Space a unique name (e.g., `multimodal_rag`).
+   - SDK: select Gradio.
+   - Visibility: choose Public or Private.
+   - Click **Create Space**.
+4. Add the required secrets, such as `GEMINI_API_KEY` or `OPENAI_API_KEY`:
+   - Navigate to your Hugging Face Space.
+   - Open the **Settings** tab and add them under Repository secrets.
+5. Push the code to Hugging Face:
+   - `git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag`
+   - `git push hf main`
+6. Wait for the Hugging Face Space to build and deploy the application.
+The app is deployed on Hugging Face Spaces, and a demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag
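Before pushing to a Space, it can be worth confirming that the files the deployment steps rely on are actually present. The sketch below is a hypothetical pre-push check, not part of this repository: the helper name `missing_space_files` and the default file list are assumptions, based on the steps above (`app.py`, `requirements.txt`) and on the Space configuration that lives in the `README.md` front matter.

```python
from pathlib import Path
from typing import List, Sequence


def missing_space_files(
    repo_dir: str,
    required: Sequence[str] = ("app.py", "requirements.txt", "README.md"),
) -> List[str]:
    """Return the required file names that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in required if not (root / name).is_file()]


# Hypothetical usage from the repository root before `git push hf main`:
# missing = missing_space_files(".")
# if missing:
#     raise SystemExit(f"Missing files: {', '.join(missing)}")
```

An empty list means the basic layout is in place. It does not verify that the Space will build, which still depends on the dependencies and secrets being correct.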