---
library_name: transformers
tags:
- comics
license: cc-by-sa-4.0
datasets:
- VLR-CVC/ComicsPAP
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---
# LoRA Fine-Tune of Qwen2.5-VL-7B-Instruct on the ComicsPAP dataset
[Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned simultaneously on all five tasks of the [ComicsPAP](https://huggingface.co/datasets/VLR-CVC/ComicsPAP) dataset.
The training was performed using a constant learning rate of 2e-4 with the AdamW optimizer. The model was trained for 5k steps using an effective batch size of 128. The LoRA configuration employed an α of 16, a dropout rate of 0.05, and a rank r = 8.
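
The hyperparameters above imply two derived quantities that are easy to check by hand. The following is a minimal sketch, not the authors' training script: the `TrainingSetup` class is a hypothetical container, and the scaling factor assumes the standard LoRA formulation in which the low-rank update is multiplied by α / r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the hyperparameters stated in this card."""

    learning_rate: float = 2e-4      # constant schedule, AdamW
    steps: int = 5_000               # "5k steps"
    effective_batch_size: int = 128
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_rank: int = 8

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scaling, alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def samples_seen(self) -> int:
        # Training examples processed over the whole run.
        return self.steps * self.effective_batch_size


setup = TrainingSetup()
print(setup.lora_scaling)   # 2.0
print(setup.samples_seen)   # 640000
```

With the `peft` library, the LoRA values would correspond to `LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05)`. The target modules, and how the effective batch size of 128 was split between per-device batch size and gradient accumulation, are not stated here.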
## Results
| Model | Repo | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%) |
| :------------------------: | :---------------------------------------------------------------------------------: | :------------------: | :---------------------: | :----------------: | :--------------: | :-------------------: | :-------: |
| Random | | 20.22 | 50.00 | 14.41 | 25.00 | 25.00 | 24.30 |
| Qwen2.5-VL-3B (Zero-Shot) | [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 27.48 | 48.95 | 21.33 | 27.41 | 32.82 | 29.61 |
| Qwen2.5-VL-7B (Zero-Shot) | [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 30.53 | 54.55 | 22.00 | 37.45 | 40.84 | 34.91 |
| Qwen2.5-VL-72B (Zero-Shot) | [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) | 46.88 | 53.84 | 23.66 | 55.60 | 38.17 | 41.27 |
| Qwen2.5-VL-3B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP) | 62.21 | **93.01** | **42.33** | 63.71 | 35.49 | 55.55 |
| Qwen2.5-VL-7B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP) | **69.08** | **93.01** | 42.00 | **74.90** | **49.62** | **62.31** |
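
To make the effect of fine-tuning easier to read off the table, the short sketch below computes the per-task gain of the fine-tuned 7B model over its zero-shot baseline, in percentage points. The numbers are copied from the two 7B rows above; the variable names are illustrative only.

```python
# Column order follows the results table.
TASKS = [
    "Sequence Filling",
    "Character Coherence",
    "Visual Closure",
    "Text Closure",
    "Caption Relevance",
    "Total",
]

# Accuracy (%) copied from the 7B rows of the table.
zero_shot_7b = [30.53, 54.55, 22.00, 37.45, 40.84, 34.91]
lora_7b = [69.08, 93.01, 42.00, 74.90, 49.62, 62.31]

# Gain in percentage points, rounded to the table's precision.
gains = {
    task: round(tuned - base, 2)
    for task, base, tuned in zip(TASKS, zero_shot_7b, lora_7b)
}

for task, gain in gains.items():
    print(f"{task:<20} +{gain:.2f}")
```

On these numbers, Caption Relevance shows the smallest gain (+8.78 points), and the Total column improves by 27.40 points.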
## Citation
**BibTeX:**
```
@misc{vivoli2025comicspap,
title={ComicsPAP: understanding comic strips by picking the correct panel},
author={Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Souibgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas},
year={2025},
eprint={2503.08561},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.08561},
}
@misc{qwen2.5-VL,
title = {Qwen2.5-VL},
url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
author = {Qwen Team},
month = {January},
year = {2025}
}
```