Commit 9fe4df8 by ej68okap (1 parent: a53d884)
new code added
README.md
CHANGED
@@ -8,3 +8,81 @@ sdk_version: 5.12.0
app_file: app.py
pinned: false
---
# Multimodal RAG with ColPali, Milvus, and Visual Language Models

This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application using **ColPali**, **Milvus**, and **Visual Language Models (VLMs)** such as Gemini or GPT-4o. Users upload a PDF and ask questions about both the textual and visual elements of the document.

---

## Features

- **Multimodal Q&A**: Combines visual and textual embeddings so questions can be answered from both page text and page imagery.
- **PDF as Images**: Treats PDF pages as images to preserve layout and visual context.
- **Efficient Retrieval**: Uses Milvus for fast and accurate vector search.
- **Advanced Query Processing**: Uses ColPali for embeddings and a VLM for response generation.

---

## Architecture Overview

1. **ColPali**:
   - Generates embeddings for images (PDF pages) and text (user queries).
   - Embeds both in the same vector space, so a text query can be matched directly against page images.

2. **Milvus**:
   - A vector database used for indexing and retrieving embeddings.
   - Supports HNSW-based indexing for efficient similarity searches.

3. **Visual Language Models**:
   - Gemini or GPT-4o performs context-aware Q&A using retrieved pages.
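
To make the retrieval step concrete, here is a minimal sketch of late-interaction (MaxSim) scoring, the matching scheme that ColPali-style models generally use: each query-token vector is compared with every page-patch vector, the best match per token is kept, and the maxima are summed. This README does not show how `app.py` scores pages, so the function names (`maxsim_score`, `rank_pages`) and the toy vectors are assumptions for illustration only. In a real deployment Milvus would supply the candidate vectors. With the toy data below, `top_pages` comes out as `[1, 2]`.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query-token vector, keep its best
    match among the page-patch vectors, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector],
               pages: Sequence[Sequence[Vector]],
               top_k: int = 3) -> List[int]:
    """Return the indices of the top_k highest-scoring pages, best first."""
    scored = [(maxsim_score(query_vecs, vecs), idx) for idx, vecs in enumerate(pages)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy data (made up): two query-token vectors and three "pages" of patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[0.1, 0.1], [0.2, 0.0]],  # weak match on both tokens
    [[0.9, 0.0], [0.0, 0.8]],  # strong match on both tokens
    [[1.0, 0.0], [0.0, 0.1]],  # strong match on the first token only
]
top_pages = rank_pages(query, pages, top_k=2)
```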

---

## Installation

### Prerequisites
- Python 3.8 or higher
- CUDA-compatible GPU for acceleration
- Milvus installed and running ([Installation Guide](https://milvus.io/docs/install_standalone.md))
- Required Python packages (see `requirements.txt`)
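
Before installing, the first and third prerequisites can be checked with a short script. This is a sketch, not part of the repository: the helper names `python_ok` and `port_open` are hypothetical, and it assumes Milvus is listening on `localhost` at its default gRPC port, 19530. The GPU check is left out because it would need a CUDA-aware library such as PyTorch.

```python
import socket
import sys


def python_ok(version=None, minimum=(3, 8)):
    """True if the interpreter version meets the README's minimum (3.8)."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def port_open(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Python OK:", python_ok())
    # Assumption: a local Milvus standalone on its default port 19530.
    print("Milvus reachable:", port_open("localhost", 19530))
```

A `False` for the Milvus check only means nothing accepted a TCP connection on that port. It does not distinguish a stopped server from one on a different host or port.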

### Steps to Run the Application Locally
1. Clone the repository.
2. Install dependencies: `pip install -r requirements.txt`
3. Set up environment variables. Add the following to your `.env` file or environment:
   `GEMINI_API_KEY=<Your_Gemini_API_Key>`
4. Launch the Gradio app: `python app.py`
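
Step 3 can be sketched in code as follows. This is an illustrative stand-in using only the standard library, not the loader the repository actually uses: `parse_env_file` and `get_gemini_key` are hypothetical names, and the parser handles only simple `KEY=VALUE` lines. The `python-dotenv` package provides `load_dotenv()` for the same purpose.

```python
import os


def parse_env_file(text):
    """Parse simple KEY=VALUE lines; skip blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(env_text="", environ=None):
    """Return GEMINI_API_KEY, preferring the process environment over .env text."""
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY") or parse_env_file(env_text).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message makes a missing key visible at startup, before the first query reaches the VLM.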

### Deploying the Gradio App on Hugging Face Spaces
1. Prepare the repository:
   - `git clone https://github.com/saumitras/colpali-milvus-rag.git`
   - `cd colpali-milvus-rag`

2. Organize the repository:
   - Ensure the app file (e.g., `app.py`) contains the Gradio application code.
   - Include the `requirements.txt` file for dependencies.

3. Create a new Space:
   - Visit Hugging Face Spaces.
   - Click New Space.
   - Fill in the details:
     - Name: Give your Space a unique name (e.g., `multimodal_rag`).
     - SDK: Select Gradio as the SDK.
     - Visibility: Choose between Public or Private.
   - Click Create Space.

4. Update the Hugging Face configuration by adding the necessary environment variables, such as `GEMINI_API_KEY` or `OPENAI_API_KEY`, to the Space's secrets:
   - Navigate to your Hugging Face Space.
   - Go to the Settings tab and add the required secrets under Repository secrets.

5. Push the code to Hugging Face. Add the Space as a Git remote and push:
   - `git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag`
   - `git push hf main`

6. Wait for the Hugging Face Space to build and deploy the application.
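
Spaces expose configured secrets to the running app as environment variables, so the app can decide at startup which VLM to call. The sketch below shows one way to do that. The function name `pick_vlm_backend` and the choice to check Gemini first are assumptions, not something this README specifies.

```python
import os


def pick_vlm_backend(environ=None):
    """Return (backend_name, api_key) for the first configured secret.

    Gemini is checked before OpenAI; that ordering is an arbitrary
    choice made for this sketch.
    """
    environ = os.environ if environ is None else environ
    for name, var in (("gemini", "GEMINI_API_KEY"), ("openai", "OPENAI_API_KEY")):
        key = environ.get(var, "").strip()
        if key:
            return name, key
    raise RuntimeError("Set GEMINI_API_KEY or OPENAI_API_KEY in the Space's secrets")
```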

The app is deployed on Hugging Face Spaces, and a demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag