ej68okap committed 9fe4df8 (1 parent: a53d884)

new code added

Files changed (1): README.md +78 -0

sdk_version: 5.12.0
app_file: app.py
pinned: false
---

# Multimodal RAG with Colpali, Milvus, and Visual Language Models

This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application with **Colpali**, **Milvus**, and **Visual Language Models (VLMs)** such as Gemini or GPT-4o. Users upload a PDF and ask questions that are answered from both the textual and the visual content of the document.

---

## Features

- **Multimodal Q&A**: Matches embeddings of text queries against embeddings of page images, so questions can be answered from both the text and the visuals of a document.
- **PDF as Images**: Treats each PDF page as an image, preserving its layout and visual context.
- **Efficient Retrieval**: Uses Milvus for fast vector similarity search.
- **Advanced Query Processing**: Uses Colpali to embed pages and queries, and a VLM to generate the final answer.

---

## Architecture Overview

1. **Colpali**:
   - Generates embeddings for images (PDF pages) and for text (user queries).
   - Embeds both into the same vector space, so a text query can be scored directly against page images.

2. **Milvus**:
   - A vector database that indexes the page embeddings and retrieves those closest to a query.
   - Supports HNSW-based indexing for efficient similarity search.

3. **Visual Language Models**:
   - Gemini or GPT-4o answers the user's question using the retrieved pages as context.
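
To make the division of labour between Colpali and Milvus concrete, here is a minimal pure-Python sketch of a two-stage retrieval step. It assumes Colpali-style multi-vector embeddings (one vector per query token and one per page patch) scored with late-interaction MaxSim; whether this repository reranks exactly this way is an assumption. The brute-force scan stands in for Milvus's HNSW search, and `maxsim` and `retrieve` are hypothetical names, not functions from this project.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product; for L2-normalized vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: each query vector keeps only its best-matching
    page vector, and those maxima are summed over the query."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def retrieve(
    query_vecs: List[Vector],
    pages: Dict[str, List[Vector]],
    candidates_per_vec: int = 2,
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Two-stage retrieval sketch (hypothetical helper).

    Stage 1 (the role a vector database plays): for every query vector,
    find the nearest individual patch vectors across all pages and collect
    the pages they belong to. This is a brute-force scan here; Milvus would
    answer it with an approximate index such as HNSW.
    Stage 2: rerank only the candidate pages by their full MaxSim score.
    """
    # Flatten pages into (page_id, patch_vector) rows, as a vector DB would store them.
    rows = [(page_id, vec) for page_id, vecs in pages.items() for vec in vecs]

    candidates = set()
    for q in query_vecs:
        nearest = sorted(rows, key=lambda row: dot(q, row[1]), reverse=True)
        candidates.update(page_id for page_id, _ in nearest[:candidates_per_vec])

    scored = [(page_id, maxsim(query_vecs, pages[page_id])) for page_id in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

In a real deployment, stage 1 would be one Milvus search per query vector, and stage 2 would run on the vectors fetched for the candidate pages; the top-ranked page images are then passed to the VLM.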

---

## Installation

### Prerequisites

- Python 3.8 or higher
- A CUDA-compatible GPU for acceleration
- A running Milvus instance ([Installation Guide](https://milvus.io/docs/install_standalone.md))
- The Python packages listed in `requirements.txt`
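
Before launching the app, it can help to confirm that the Milvus server is reachable. The sketch below only checks that a TCP connection to the host and port succeeds; it does not speak the Milvus protocol. It assumes a local standalone deployment on Milvus's default port 19530, and `is_port_open` is a hypothetical helper, not part of this repository.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 19530, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    19530 is Milvus's default port for a standalone deployment; adjust both
    values if your server runs elsewhere.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `is_port_open()` before `python app.py` gives a quick, dependency-free signal; a `False` result usually means Milvus is not running or is listening on a different host or port.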

### Steps to Run the Application Locally

1. Clone the repository: `git clone https://github.com/saumitras/colpali-milvus-rag.git`, then `cd colpali-milvus-rag`.
2. Install the dependencies: `pip install -r requirements.txt`
3. Set up environment variables by adding the following to your `.env` file or shell environment: `GEMINI_API_KEY=<Your_Gemini_API_Key>`
4. Launch the Gradio app: `python app.py`
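
If the key lives in a `.env` file, something has to load it into the process environment before the app reads it. Many projects use the `python-dotenv` package for this; the dependency-free sketch below is an illustrative stand-in, not necessarily what `app.py` does, and `load_env_file` and `require_env` are hypothetical helpers. It handles simple `KEY=VALUE` lines, skips blanks and comments, and never overrides variables that are already set.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Load simple KEY=VALUE lines from `path` into os.environ.

    Blank lines and '#' comments are ignored, surrounding quotes are stripped
    from values, and variables already present in the environment are left
    untouched. Returns the pairs parsed from the file.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        parsed[key] = value
        os.environ.setdefault(key, value)  # an existing shell variable wins
    return parsed


def require_env(name: str) -> str:
    """Return the value of `name`, failing early with a clear message if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

Typical use is `load_env_file()` followed by `require_env("GEMINI_API_KEY")` at startup, so a missing key fails immediately instead of on the first query.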

### Deploying the Gradio App on Hugging Face Spaces

1. **Prepare the repository.** Clone it and change into its directory: `git clone https://github.com/saumitras/colpali-milvus-rag.git`, then `cd colpali-milvus-rag`.
2. **Organize the repository.** Make sure the app file (e.g., `app.py`) contains the Gradio application code and that `requirements.txt` lists its dependencies.
3. **Create a new Space.** Visit Hugging Face Spaces, click **New Space**, and fill in the details:
   - **Name**: a unique name for your Space (e.g., `multimodal_rag`).
   - **SDK**: select Gradio.
   - **Visibility**: choose Public or Private.

   Then click **Create Space**.
4. **Update the Hugging Face API configuration.** Add the required environment variables, such as `GEMINI_API_KEY` or `OPENAI_API_KEY`, as Space secrets: open your Space, go to the **Settings** tab, and add them under **Repository secrets**.
5. **Push the code to Hugging Face.** Add your Space as a git remote and push: `git remote add hf https://huggingface.co/spaces/<your-username>/<your-space-name>`, then `git push hf main`. When prompted for a password, use a Hugging Face access token with write permission. If the push is rejected because the new Space already contains an initial commit, `git push --force hf main` overwrites it.
6. **Wait** for the Space to build and deploy the application.
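
As an alternative to the git-based steps above, the `huggingface_hub` Python library can create the Space, store its secrets, and upload the files programmatically. This is a swapped-in approach, not part of this repository: `deploy_space` is a hypothetical helper that takes the API client as a parameter so it can be exercised without network access. It relies on `HfApi.create_repo`, `HfApi.add_space_secret`, and `HfApi.upload_folder`; check their signatures against the `huggingface_hub` version you install.

```python
from typing import Any, Mapping


def deploy_space(
    api: Any,
    repo_id: str,
    secrets: Mapping[str, str],
    folder_path: str = ".",
    private: bool = False,
) -> None:
    """Create (or reuse) a Gradio Space, set its secrets, then upload the app files.

    `api` is expected to behave like `huggingface_hub.HfApi`; `repo_id` has the
    form "<username>/<space-name>". Hypothetical helper for illustration.
    """
    # exist_ok=True makes re-running the deployment safe if the Space already exists.
    api.create_repo(
        repo_id,
        repo_type="space",
        space_sdk="gradio",
        private=private,
        exist_ok=True,
    )
    # Space secrets (e.g. GEMINI_API_KEY) are exposed to the running app as environment variables.
    for key, value in secrets.items():
        api.add_space_secret(repo_id, key, value)
    # Upload app.py, requirements.txt, README.md, etc. in one commit.
    # A local .env file is skipped so keys are never pushed to the Space repository.
    api.upload_folder(
        repo_id=repo_id,
        folder_path=folder_path,
        repo_type="space",
        commit_message="Deploy Gradio app",
        ignore_patterns=[".env"],
    )
```

For example, after authenticating with `huggingface-cli login` (or by setting `HF_TOKEN`), you could call `deploy_space(HfApi(), "<your-username>/multimodal_rag", {"GEMINI_API_KEY": "<Your_Gemini_API_Key>"})` with `HfApi` imported from `huggingface_hub`.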

The app is deployed on Hugging Face Spaces, and a live demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag