uminaty committed
Commit de61fd2
1 Parent(s): 9a6ddc1

Update README.md


TODO: update the ViDoRe table when vidore/tatdqa_test is ready

Files changed (1): README.md (+69, -3)
README.md CHANGED
---
license: apache-2.0
---
# MonoQwen2-VL-2B-LoRA-Reranker

## Model Overview
The **MonoQwen2-VL-2B-LoRA-Reranker** is a LoRA fine-tune of the Qwen2-VL-2B model for scoring the relevance of a document image to a text query. Given an image and a query, it answers "True" or "False", and the probabilities of those two tokens provide a relevance score that can be used for reranking. Typical use cases are document retrieval and image-based search pipelines, where candidates from a first-stage retriever are reranked by this model.

## How to Use the Model
Below is a minimal example that scores the relevance of a single image to a user query with this model:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load the processor of the base model and the model with the LoRA adapter
# (loading the adapter repo directly requires `peft` to be installed)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model = Qwen2VLForConditionalGeneration.from_pretrained("lightonai/MonoQwen2-VL-2B-LoRA-Reranker")

# Define the query and the image
query = "What is the value of the thing in the document?"
image = Image.open("path_to_image.jpg")

# Build the prompt: the image document comes first, followed by the relevance question
prompt = (
    "Assert the relevance of the previous image document to the following query, "
    f"answer True or False. The query is: {query}"
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }
]

# Apply the chat template so the image placeholder tokens are inserted correctly
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")

# Run the model and take the logits of the next (answer) token
with torch.no_grad():
    outputs = model(**inputs)
    logits_for_last_token = outputs.logits[:, -1, :]

# Convert the logits of "True" and "False" into a relevance probability
true_token_id = processor.tokenizer.convert_tokens_to_ids("True")
false_token_id = processor.tokenizer.convert_tokens_to_ids("False")
relevance_score = torch.softmax(logits_for_last_token[:, [true_token_id, false_token_id]], dim=-1)

# Print the True/False probabilities
true_prob = relevance_score[:, 0].item()
false_prob = relevance_score[:, 1].item()

print(f"True probability: {true_prob}, False probability: {false_prob}")
```

This example demonstrates how to use the model to assess the relevance of an image with respect to a query. It outputs the probability that the image is relevant ("True") or not relevant ("False").

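In a retrieval pipeline, the same scoring is applied to every candidate returned by a first-stage retriever and the candidates are then sorted by their "True" probability. Below is a minimal sketch of this reranking step, reusing `model`, `processor`, and `query` from the example above; the `score_image` helper and the candidate file names are illustrative, not part of the model's API.

```python
def score_image(query: str, image: Image.Image) -> float:
    """Illustrative helper: probability that `image` is relevant to `query` ("True" probability)."""
    prompt = (
        "Assert the relevance of the previous image document to the following query, "
        f"answer True or False. The query is: {query}"
    )
    messages = [
        {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": prompt}]}
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=text, images=image, return_tensors="pt")
    with torch.no_grad():
        logits_last = model(**inputs).logits[:, -1, :]
    true_id = processor.tokenizer.convert_tokens_to_ids("True")
    false_id = processor.tokenizer.convert_tokens_to_ids("False")
    probs = torch.softmax(logits_last[:, [true_id, false_id]], dim=-1)
    return probs[0, 0].item()

# Rerank a handful of candidate pages (file names are placeholders)
candidates = [Image.open(path) for path in ["page_1.jpg", "page_2.jpg", "page_3.jpg"]]
reranked = sorted(candidates, key=lambda img: score_image(query, img), reverse=True)
```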

## Performance Metrics

The model has been evaluated on the [ViDoRe Benchmark](https://huggingface.co/spaces/vidore/vidore-leaderboard) by retrieving the top 10 candidates with [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1) and reranking them with this model. The table below summarizes the NDCG@5 scores (a short sketch of how NDCG@5 is computed follows the table):

| Dataset                                            | NDCG@5 Before Reranking | NDCG@5 After Reranking |
|----------------------------------------------------|-------------------------|------------------------|
| **Mean**                                           | 87.6                    | **91.8**               |
| vidore/arxivqa_test_subsampled                     | 85.6                    | 89.01                  |
| vidore/docvqa_test_subsampled                      | 57.1                    | 59.71                  |
| vidore/infovqa_test_subsampled                     | 88.1                    | 93.49                  |
| vidore/tabfquad_test_subsampled                    | 93.1                    | 95.96                  |
| vidore/shiftproject_test                           | 82.0                    | 92.98                  |
| vidore/syntheticDocQA_artificial_intelligence_test | 97.5                    | 100.00                 |
| vidore/syntheticDocQA_energy_test                  | 92.9                    | 97.65                  |
| vidore/syntheticDocQA_government_reports_test      | 96.0                    | 98.04                  |
| vidore/syntheticDocQA_healthcare_industry_test     | 96.4                    | 99.27                  |
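
For reference, NDCG@5 compares the ranking produced by the system with the ideal ranking of the relevant documents, truncated to the top 5 positions. The snippet below is a generic illustration of the metric for a single query with binary relevance labels; it is not the ViDoRe evaluation code.

```python
import math

def ndcg_at_k(relevances, k=5):
    """NDCG@k for one query, given the relevance labels of the returned
    documents in the order the system ranked them."""
    def dcg(rels):
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))

    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: the single relevant page is ranked 3rd before reranking and 1st after
print(ndcg_at_k([0, 0, 1, 0, 0]))  # 0.5
print(ndcg_at_k([1, 0, 0, 0, 0]))  # 1.0
```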

## License

This LoRA model is licensed under the Apache 2.0 license.