Atulit23 committed · verified
Commit 738e435 · Parent: 4dd036e

Upload folder using huggingface_hub
.byaldi/image_index/doc_ids_to_file_names.json.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5adce3e520525f462d8f71c09a42b3ca10cc5039b79cd1640e0c0d97acd9e17
+ size 68
.byaldi/image_index/embed_id_to_doc_id.json.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60aedd13c343e38d2cb81b0c953b2f4b3db530f44b96af3167f63ff218c831ba
+ size 79
.byaldi/image_index/embeddings/embeddings_0.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:404010331a12bd6c9dd18c87358a10c1c1ecf58e19c2dd402ae1757cded340e6
+ size 264885
.byaldi/image_index/index_config.json.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83da74cae33705bdcbdc43f436f45ec099b683245a56d7fd72336954916e9a3c
+ size 174
.byaldi/image_index/metadata.json.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a23514d5d0a1b04d797c42e596342a4b3203e7ed7886d6cad63c97ee0ae49b58
+ size 38
.github/workflows/update_space.yml ADDED
@@ -0,0 +1,28 @@
+ name: Run Python script
+
+ on:
+   push:
+     branches:
+       - main
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout
+         uses: actions/checkout@v2
+
+       - name: Set up Python
+         uses: actions/setup-python@v2
+         with:
+           python-version: '3.9'
+
+       - name: Install Gradio
+         run: python -m pip install gradio
+
+       - name: Log in to Hugging Face
+         run: python -c 'import huggingface_hub; huggingface_hub.login(token="${{ secrets.hf_token }}")'
+
+       - name: Deploy to Spaces
+         run: gradio deploy
README.md CHANGED
@@ -1,12 +1,34 @@
  ---
  title: ColPali
- emoji: 🔥
- colorFrom: gray
- colorTo: pink
- sdk: gradio
- sdk_version: 4.44.0
  app_file: app.py
- pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: ColPali
  app_file: app.py
+ sdk: gradio
+ sdk_version: 4.41.0
  ---
+ # RAG-based PDF Search and Keyword Extraction using Qwen2VL
+
+ This repository contains an implementation of a **RAG (Retrieval-Augmented Generation)** based PDF search system built with **ColPali** (via the **Byaldi** library) and **Qwen2VL**. Additionally, the repository includes a Gradio app that allows users to extract text from images and highlight searched keywords using **Qwen2VL**.
+
+ ## Table of Contents
+ - [Overview](#overview)
+ - [Installation](#installation)
+ - [Usage](#usage)
+ - [RAG PDF Search](#rag-pdf-search)
+ - [Gradio App for Keyword Extraction](#gradio-app-for-keyword-extraction)
+ - [License](#license)
+
+ ## Overview
+
+ ### RAG PDF Search
+
+ In `copali-qwen.ipynb`, you will find the complete implementation of the **RAG-based PDF search**. The pipeline is built with **ColPali** through the **Byaldi** library, along with **Qwen2VL**. By default, the code indexes and searches a single image (`image.png`), but you can easily change the path to a PDF file or any other document.
+
+ ### Gradio App for Keyword Extraction
+
+ The `app.py` file contains a **Gradio app** that uses only **Qwen2VL** to extract text from an image and highlight the keywords matching the user's search query. The app is an easy-to-use interface for keyword extraction from images.
+
+ ## Installation
+
+ To run this project, you will need to install the following dependencies:

+ ```bash
+ pip install transformers byaldi qwen-vl-utils gradio pillow torch
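As the README's Overview notes, the notebook indexes a single `image.png` by default, but the same Byaldi calls accept a PDF path. A minimal sketch of that swap, reusing the API exactly as it appears in `copali-qwen.ipynb` (`document.pdf` is a placeholder filename; rendering PDF pages additionally relies on `pdf2image` from `requirements.txt` and the `poppler-utils` package from `packages.txt`):

```python
# Sketch: index a PDF instead of image.png and retrieve the best-matching page.
# Mirrors the calls used in copali-qwen.ipynb; "document.pdf" is a placeholder path.
from byaldi import RAGMultiModalModel

RAG = RAGMultiModalModel.from_pretrained("vidore/colpali")

RAG.index(
    input_path="document.pdf",          # was "image.png" in the notebook
    index_name="pdf_index",
    store_collection_with_index=False,
    overwrite=True,
)

results = RAG.search("What is the structure of the compiler?", k=1)
print(results[0]["doc_id"], results[0]["page_num"], results[0]["score"])
```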
app.py ADDED
@@ -0,0 +1,82 @@
+ import gradio as gr
+ import torch
+ from PIL import Image
+ from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
+ from qwen_vl_utils import process_vision_info
+ import re
+
+ min_pixels = 256 * 28 * 28
+ max_pixels = 1280 * 28 * 28
+
+ def model_inference(images):
+     model = Qwen2VLForConditionalGeneration.from_pretrained(
+         "Qwen/Qwen2-VL-2B-Instruct",
+         trust_remote_code=True,
+         torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
+     )
+     processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels)
+
+     images = [{"type": "image", "image": Image.open(image[0])} for image in images]
+
+     messages = [{"role": "user", "content": images}]
+
+     text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+     image_inputs, video_inputs = process_vision_info(messages)
+     inputs = processor(
+         text=[text],
+         images=image_inputs,
+         videos=video_inputs,
+         padding=True,
+         return_tensors="pt",
+     )
+
+     device = "cuda" if torch.cuda.is_available() else "cpu"
+     inputs = inputs.to(device)
+     model = model.to(device)
+
+     generated_ids = model.generate(**inputs, max_new_tokens=512)
+     generated_ids_trimmed = [
+         out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+     ]
+
+     output_text = processor.batch_decode(
+         generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+     )
+
+     del model
+     del processor
+     return output_text[0]
+
+ def search_and_highlight(text, keywords):
+     if not keywords:
+         return text
+
+     keywords = [kw.strip().lower() for kw in keywords.split(',')]
+     highlighted_text = text
+
+     for keyword in keywords:
+         pattern = re.compile(re.escape(keyword), re.IGNORECASE)
+         highlighted_text = pattern.sub(f'**{keyword}**', highlighted_text)
+
+     return highlighted_text
+
+ def extract_and_search(images, keywords):
+     extracted_text = model_inference(images)
+     highlighted_text = search_and_highlight(extracted_text, keywords)
+     return extracted_text, highlighted_text
+
+ with gr.Blocks(theme=gr.themes.Soft()) as demo:
+     with gr.Row():
+         output_gallery = gr.Gallery(label="Image", height=300, show_label=True)
+         keywords = gr.Textbox(placeholder="Enter keywords to search (comma-separated)", label="Search Keywords")
+
+     extract_button = gr.Button("Extract Text and Search", variant="primary")
+
+     with gr.Row():
+         raw_output = gr.Textbox(label="Interpreted Text")
+         highlighted_output = gr.Markdown(label="Highlighted Search Results")
+
+     extract_button.click(extract_and_search, inputs=[output_gallery, keywords], outputs=[raw_output, highlighted_output])
+
+ if __name__ == "__main__":
+     demo.queue(max_size=10).launch(share=True)
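Two design points in `app.py` worth noting: `model_inference` loads the Qwen2-VL-2B model and processor on every call and deletes them before returning, trading extra latency per request for a smaller steady-state memory footprint; and `search_and_highlight` wraps case-insensitive matches in `**…**` so the `gr.Markdown` component renders them bold, although each match is replaced with the lowercased keyword, so the original casing is not preserved. A standalone copy of the helper, shown only to illustrate its output:

```python
# Sketch: standalone copy of search_and_highlight from app.py, for illustration.
import re

def search_and_highlight(text, keywords):
    if not keywords:
        return text
    keywords = [kw.strip().lower() for kw in keywords.split(',')]
    highlighted_text = text
    for keyword in keywords:
        pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        highlighted_text = pattern.sub(f'**{keyword}**', highlighted_text)
    return highlighted_text

print(search_and_highlight("Lexical Analysis and parsing", "lexical, parsing"))
# -> **lexical** Analysis and **parsing**   (note the lowercased match)
```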
copali-qwen.ipynb ADDED
@@ -0,0 +1,280 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Implementing Colpali with Qwen2VL"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "c:\\Users\\atuli\\AppData\\Local\\Programs\\Python\\Python310\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+ " from .autonotebook import tqdm as notebook_tqdm\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Verbosity is set to 1 (active). Pass verbose=0 to make quieter.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.\n",
+ "Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use\n",
+ "`config.hidden_activation` if you want to override this behaviour.\n",
+ "See https://github.com/huggingface/transformers/pull/29402 for more details.\n",
+ "Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 6.01it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "from byaldi import RAGMultiModalModel\n",
+ "\n",
+ "RAG = RAGMultiModalModel.from_pretrained(\"vidore/colpali\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "You are passing both `text` and `images` to `PaliGemmaProcessor`. The processor expects special image tokens in the text, as many tokens as there are images per each text. It is recommended to add `<image>` tokens in the very beginning of your text and `<bos>` token after that. For this call, we will infer how many images each text has and add special tokens.\n",
+ "Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Added page 1 of document 0 to index.\n",
+ "Index exported to .byaldi\\image_index\n",
+ "Index exported to .byaldi\\image_index\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "{0: 'image.png'}"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "RAG.index(\n",
+ " input_path=\"image.png\",\n",
+ " index_name=\"image_index\",\n",
+ " store_collection_with_index=False,\n",
+ " overwrite=True\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "You are passing both `text` and `images` to `PaliGemmaProcessor`. The processor expects special image tokens in the text, as many tokens as there are images per each text. It is recommended to add `<image>` tokens in the very beginning of your text and `<bos>` token after that. For this call, we will infer how many images each text has and add special tokens.\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "[{'doc_id': 0, 'page_num': 1, 'score': 18.75, 'metadata': {}, 'base64': None}]"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "text_query = \"What is the structure of the compiler?\"\n",
+ "results = RAG.search(text_query, k=1)\n",
+ "results"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.\n",
+ "Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}\n",
+ "Loading checkpoint shards: 100%|██████████| 2/2 [00:13<00:00, 6.88s/it]\n"
+ ]
+ }
+ ],
+ "source": [
+ "from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor\n",
+ "from qwen_vl_utils import process_vision_info\n",
+ "import torch\n",
+ "\n",
+ "model = Qwen2VLForConditionalGeneration.from_pretrained(\n",
+ " \"Qwen/Qwen2-VL-2B-Instruct\",\n",
+ " trust_remote_code=True,\n",
+ " torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "results[0][\"page_num\"] -1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from PIL import Image\n",
+ "processor = AutoProcessor.from_pretrained(\"Qwen/Qwen2-VL-2B-Instruct\", trust_remote_code=True)\n",
+ "\n",
+ "messages = [\n",
+ " {\n",
+ " \"role\": \"user\",\n",
+ " \"content\": [\n",
+ " {\n",
+ " \"type\": \"image\",\n",
+ " \"image\": Image.open(\"image.png\"),\n",
+ " },\n",
+ " {\"type\": \"text\", \"text\": text_query},\n",
+ " ],\n",
+ " }\n",
+ "]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "text = processor.apply_chat_template(\n",
+ " messages, tokenize=False, add_generation_prompt=True\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "image_inputs, video_inputs = process_vision_info(messages)\n",
+ "inputs = processor(\n",
+ " text=[text],\n",
+ " images=image_inputs,\n",
+ " videos=video_inputs,\n",
+ " padding=True,\n",
+ " return_tensors=\"pt\",\n",
+ ")\n",
+ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+ "inputs = inputs.to(device)\n",
+ "model = model.to(device)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "generated_ids = model.generate(**inputs, max_new_tokens=50)\n",
+ "generated_ids_trimmed = [\n",
+ " out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)\n",
+ "]\n",
+ "output_text = processor.batch_decode(\n",
+ " generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "['The structure of the compiler, as described in the syllabus, includes the following components:\\n\\n1. **Lexical Analysis**: This involves the role of the lexical analyzer, input buffering, and the design of lexical analyzers, specification and recognition of tokens']\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(output_text)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.11"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
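The notebook computes `results[0]["page_num"] - 1` but, because only a single image was indexed, it then opens `image.png` directly. When a PDF is indexed instead, that retrieved page number selects which rendered page to hand to Qwen2VL. A hedged sketch of that step, assuming `model`, `processor`, `text_query`, and `results` exist as defined in the notebook, with a placeholder `document.pdf` as the indexed file (page rendering uses `pdf2image` and the `poppler-utils` package listed in this commit):

```python
# Sketch: pass the page retrieved by Byaldi to Qwen2VL when a PDF (not image.png) was indexed.
# Assumes model, processor, text_query and results already exist as in the notebook;
# "document.pdf" is a placeholder for the indexed file.
from pdf2image import convert_from_path
from qwen_vl_utils import process_vision_info

pages = convert_from_path("document.pdf")            # one PIL image per page (needs poppler-utils)
retrieved_page = pages[results[0]["page_num"] - 1]   # page_num is 1-based

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": retrieved_page},
        {"type": "text", "text": text_query},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=50)
```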
image.png ADDED
packages.txt ADDED
@@ -0,0 +1 @@
+ poppler-utils
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ colpali-engine==0.2.0
+ pdf2image
+ GPUtil
+ accelerate==0.30.1
+ mteb>=1.12.22
+ git+https://github.com/huggingface/transformers
+ qwen-vl-utils
+ torchvision
+ fastapi<0.113.0
+ byaldi