ej68okap committed on
Commit b61a6a6
1 Parent(s): 241c492

new code added

Files changed (1)
  1. README.md +8 -77
README.md CHANGED
@@ -1,79 +1,10 @@
- # Multimodal RAG with Colpali, Milvus, and Visual Language Models
-
- This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application using **Colpali**, **Milvus**, and **Visual Language Models (VLMs)** like Gemini or GPT-4o. The application allows users to upload a PDF and perform Q&A queries on both textual and visual elements of the document.
-
  ---
-
- ## Features
-
- - **Multimodal Q&A**: Combines visual and textual embeddings for robust query answering.
- - **PDF as Images**: Treats PDF pages as images to preserve layout and visual context.
- - **Efficient Retrieval**: Utilizes Milvus for fast and accurate vector search.
- - **Advanced Query Processing**: Integrates Colpali and VLMs for embeddings and response generation.
-
  ---
-
- ## Architecture Overview
-
- 1. **Colpali**:
- - Generates embeddings for images (PDF pages) and text (user queries).
- - Processes visual and textual data seamlessly.
-
- 2. **Milvus**:
- - A vector database used for indexing and retrieving embeddings.
- - Supports HNSW-based indexing for efficient similarity searches.
-
- 3. **Visual Language Models**:
- - Gemini or GPT-4o performs context-aware Q&A using retrieved pages.
-
- ---
-
- ## Installation
-
- ### Prerequisites
- - Python 3.8 or higher
- - CUDA-compatible GPU for acceleration
- - Milvus installed and running ([Installation Guide](https://milvus.io/docs/install_standalone.md))
- - Required Python packages (see `requirements.txt`)
-
- ### Steps to Run the Application Locally
- 1. Clone the repository
- 2. Install dependencies as **pip install -r requirements.txt**
- 3. Set up environment variables
- Add the following variables to your .env file or environment:
- GEMINI_API_KEY=<Your_Gemini_API_Key>
- 4. Launch the Gradio App as **python app.py**
-
-
- ### Deploying the Gradio App on Hugging Face Spaces
- 1. Prepare the Repository
- git clone https://github.com/saumitras/colpali-milvus-rag.git
- cd colpali-milvus-rag
-
- 2. Organize the Repository:
- Ensure the app file (e.g., app.py) contains the Gradio application code.
- Include the requirements.txt file for dependencies.
-
- Update the Hugging Face API Configuration:
-
- 3. Add necessary environment variables like GEMINI_API_KEY or OPENAI_API_KEY to the Hugging Face Spaces Secrets:
- Navigate to your Hugging Face Space.
- Go to the Settings tab and add the required secrets under Repository secrets.
-
- 4. Create a New Space
- Visit Hugging Face Spaces.
- Click New Space.
- Fill in the details:
- Name: Give your Space a unique name (e.g., multimodal_rag).
- SDK: Select Gradio as the SDK.
- Visibility: Choose between Public or Private.
- Click Create Space.
- 5. Push Code to Hugging Face
- Initialize Git and push the code:
- git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag
- git push hf main
-
- 6. Wait for the Hugging Face Space to build and deploy the application.
-
-
- The app has been deployed on Hugging Face Spaces and Demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag
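The removed Architecture Overview says Colpali embeds PDF pages and user queries and Milvus retrieves them, but it does not say how a page is scored against a query. ColPali is a late-interaction model that emits one vector per query token and one per image patch, and the usual scoring for such models is MaxSim. The sketch below assumes that scoring. `maxsim_score` and `rank_pages` are hypothetical helpers, not functions from this repository, Milvus, or the ColPali package, and the 2-D vectors are toy values for illustration only.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction (MaxSim) score: for each query-token vector, take its
    best match among the page's patch vectors, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Hypothetical helper: order page ids from highest to lowest MaxSim score.
    In an app like this one, the candidate pages would come from a Milvus search."""
    return sorted(pages, key=lambda pid: maxsim_score(query_vecs, pages[pid]), reverse=True)


# Toy 2-D embeddings for illustration only; real ColPali vectors are much larger.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[0.9, 0.1], [0.2, 0.1]],
    "page_2": [[0.5, 0.5], [0.1, 0.8]],
}
print(rank_pages(query, pages))  # ['page_2', 'page_1']
```

Here `page_1` scores 0.9 + 0.1 = 1.0 and `page_2` scores 0.5 + 0.8 = 1.3, so `page_2` ranks first even though `page_1` holds the single best patch match: MaxSim rewards pages that cover every query token.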
 
 
 
 
 
  ---
+ title: Multimodal Rag
+ emoji: 🐨
+ colorFrom: indigo
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 5.12.0
+ app_file: app.py
+ pinned: false
  ---
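The lines added by this commit are the YAML front matter that Hugging Face Spaces reads from the top of README.md to configure the Space: here the Gradio SDK at version 5.12.0 with app.py as the entry point. To check such a header locally without third-party dependencies, a minimal sketch follows. `read_front_matter` is a hypothetical helper that handles only flat `key: value` pairs between the two `---` lines, not general YAML, and every value stays a string; for anything richer, use a real YAML parser.

```python
from typing import Dict


def read_front_matter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the opening and closing `---` lines.

    Minimal sketch: nesting, lists, quoting rules, and comments are not handled,
    and values are returned as plain strings (e.g. "false", not False).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("text does not start with a '---' front-matter line")
    config: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return config  # reached the closing delimiter
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        config[key.strip()] = value.strip()
    raise ValueError("front matter has no closing '---' line")


README = """---
title: Multimodal Rag
emoji: 🐨
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
---
"""

config = read_front_matter(README)
print(config["sdk"], config["sdk_version"], config["app_file"])  # gradio 5.12.0 app.py
```

Splitting on the first colon only keeps values such as `5.12.0` intact; a value that itself contains a colon would also survive, but anything beyond flat pairs is out of scope for this sketch.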