---
library_name: transformers
tags:
- comics
license: cc-by-sa-4.0
datasets:
- VLR-CVC/ComicsPAP
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# LoRA Fine-Tune of Qwen2.5-VL-7B-Instruct on the ComicsPAP dataset

[Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned simultaneously on all five tasks of the [ComicsPAP](https://huggingface.co/datasets/VLR-CVC/ComicsPAP) dataset.
The training was performed using a constant learning rate of 2e-4 with the AdamW optimizer. The model was trained for 5k steps using an effective batch size of 128. The LoRA configuration employed an α of 16, a dropout rate of 0.05, and a rank r = 8.
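
As a rough illustration of what these hyperparameters imply, the sketch below collects them in a small dataclass and derives two quantities. It assumes the standard LoRA scaling factor α / r (not a rank-stabilized variant) and that each step consumes exactly one effective batch. The class and field names are hypothetical and are not taken from the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters reported in this model card (names are illustrative)."""

    learning_rate: float = 2e-4      # constant schedule, AdamW
    train_steps: int = 5_000
    effective_batch_size: int = 128
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_rank: int = 8

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def samples_seen(self) -> int:
        # Assumption: one effective batch per optimizer step.
        return self.train_steps * self.effective_batch_size


setup = TrainingSetup()
print(f"LoRA scaling (alpha / r): {setup.lora_scaling}")
print(f"Training samples seen:    {setup.samples_seen}")
```

Under these assumptions the low-rank update is scaled by 2.0 and training covers 640,000 samples, counting repeats.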

## Results
|           Model            |                                        Repo                                         | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%) |
| :------------------------: | :---------------------------------------------------------------------------------: | :------------------: | :---------------------: | :----------------: | :--------------: | :-------------------: | :-------: |
|           Random           |                                                                                     |        20.22         |          50.00          |       14.41        |      25.00       |         25.00         |   24.30   |
| Qwen2.5-VL-3B (Zero-Shot)  |  [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)  |        27.48         |          48.95          |       21.33        |      27.41       |         32.82         |   29.61   |
| Qwen2.5-VL-7B (Zero-Shot)  |  [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)  |        30.53         |          54.55          |       22.00        |      37.45       |         40.84         |   34.91   |
| Qwen2.5-VL-72B (Zero-Shot) | [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) |        46.88         |          53.84          |       23.66        |      55.60       |         38.17         |   41.27   |
| Qwen2.5-VL-3B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP) | 62.21 | **93.01** | **42.33** | 63.71 | 35.49 | 55.55 |
| Qwen2.5-VL-7B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP) | **69.08** | **93.01** | 42.00 | **74.90** | **49.62** | **62.31** |
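
To make the effect of fine-tuning easier to read off the table, the following sketch computes the gain in percentage points of the fine-tuned 7B model over its zero-shot baseline. The numbers are copied from the two 7B rows above. The dictionary and variable names are illustrative only.

```python
# Accuracy (%) copied from the results table for the 7B model.
zero_shot_7b = {
    "Sequence Filling": 30.53,
    "Character Coherence": 54.55,
    "Visual Closure": 22.00,
    "Text Closure": 37.45,
    "Caption Relevance": 40.84,
    "Total": 34.91,
}
fine_tuned_7b = {
    "Sequence Filling": 69.08,
    "Character Coherence": 93.01,
    "Visual Closure": 42.00,
    "Text Closure": 74.90,
    "Caption Relevance": 49.62,
    "Total": 62.31,
}

# Gain in percentage points, rounded to match the table's two decimals.
gains = {
    task: round(fine_tuned_7b[task] - zero_shot_7b[task], 2)
    for task in zero_shot_7b
}

for task, gain in gains.items():
    print(f"{task:<20} +{gain:.2f} pp")
```

On these figures the smallest gain is on Caption Relevance (+8.78 pp) and the overall accuracy rises by 27.40 pp.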

## Citation

**BibTeX:**
```
@misc{vivoli2025comicspap,
      title={ComicsPAP: understanding comic strips by picking the correct panel}, 
      author={Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Souibgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas},
      year={2025},
      eprint={2503.08561},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.08561}, 
}

@misc{qwen2.5-VL,
    title = {Qwen2.5-VL},
    url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
    author = {Qwen Team},
    month = {January},
    year = {2025}
}
```