library_name: transformers
tags:
  - comics
license: cc-by-sa-4.0
datasets:
  - VLR-CVC/ComicsPAP
language:
  - en
base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct

LoRA Fine-Tune of Qwen2.5-VL-7B-Instruct on the ComicsPAP dataset

Qwen2.5-VL-7B-Instruct fine-tuned simultaneously on all five tasks of the ComicsPAP dataset. Training used the AdamW optimizer with a constant learning rate of 2e-4 for 5k steps at an effective batch size of 128. The LoRA configuration used α = 16, a dropout rate of 0.05, and rank r = 8.
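As a quick sanity check on these hyperparameters, the sketch below collects them in one place and derives two quantities they imply. It is not the training script: the class and field names are hypothetical, "5k steps" is assumed to mean exactly 5000, and the scaling factor assumes the standard LoRA convention of α / r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTrainingSetup:
    """Hyperparameters quoted in this card (names are illustrative only)."""

    learning_rate: float = 2e-4      # constant schedule
    optimizer: str = "AdamW"
    train_steps: int = 5_000         # assumes "5k" means exactly 5000
    effective_batch_size: int = 128
    lora_rank: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.05

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def examples_seen(self) -> int:
        # Training examples processed, counting repeats across epochs.
        return self.train_steps * self.effective_batch_size


setup = LoraTrainingSetup()
print(setup.lora_scaling)   # 2.0
print(setup.examples_seen)  # 640000
```

Under these assumptions the adapter update is scaled by 2.0 and training processes 640,000 examples in total.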

Results

| Model | Repo | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Random | | 20.22 | 50.00 | 14.41 | 25.00 | 25.00 | 24.30 |
| Qwen2.5-VL-3B (Zero-Shot) | Qwen/Qwen2.5-VL-3B-Instruct | 27.48 | 48.95 | 21.33 | 27.41 | 32.82 | 29.61 |
| Qwen2.5-VL-7B (Zero-Shot) | Qwen/Qwen2.5-VL-7B-Instruct | 30.53 | 54.55 | 22.00 | 37.45 | 40.84 | 34.91 |
| Qwen2.5-VL-72B (Zero-Shot) | Qwen/Qwen2.5-VL-72B-Instruct | 46.88 | 53.84 | 23.66 | 55.60 | 38.17 | 41.27 |
| Qwen2.5-VL-3B (LoRA Fine-Tuned) | VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP | 62.21 | 93.01 | 42.33 | 63.71 | 35.49 | 55.55 |
| Qwen2.5-VL-7B (LoRA Fine-Tuned) | VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP | 69.08 | 93.01 | 42.00 | 74.90 | 49.62 | 62.31 |
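To read the effect of fine-tuning off the table, the following sketch computes the per-task accuracy gain of the fine-tuned 7B model over its zero-shot baseline. The numbers are copied from the two 7B rows above; the dictionary keys and the helper name are illustrative only.

```python
# Accuracy (%) copied from the 7B rows of the results table.
zero_shot_7b = {
    "sequence_filling": 30.53,
    "character_coherence": 54.55,
    "visual_closure": 22.00,
    "text_closure": 37.45,
    "caption_relevance": 40.84,
    "total": 34.91,
}
fine_tuned_7b = {
    "sequence_filling": 69.08,
    "character_coherence": 93.01,
    "visual_closure": 42.00,
    "text_closure": 74.90,
    "caption_relevance": 49.62,
    "total": 62.31,
}


def accuracy_gains(before: dict, after: dict) -> dict:
    """Return the gain in percentage points for each task, rounded to 2 decimals."""
    return {task: round(after[task] - before[task], 2) for task in before}


gains = accuracy_gains(zero_shot_7b, fine_tuned_7b)
for task, gain in gains.items():
    print(f"{task}: +{gain:.2f} points")
```

On these figures the largest gains are on Sequence Filling, Character Coherence, and Text Closure (each roughly 37 to 39 points), while Caption Relevance improves by under 9 points.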

Citation

BibTeX:

@misc{vivoli2025comicspap,
      title={ComicsPAP: understanding comic strips by picking the correct panel}, 
      author={Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Souibgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas},
      year={2025},
      eprint={2503.08561},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.08561}, 
}

@misc{qwen2.5-VL,
    title = {Qwen2.5-VL},
    url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
    author = {Qwen Team},
    month = {January},
    year = {2025}
}