metadata

base_model: google/vit-base-patch16-224-in21k
tags:
  - image-classification

This model is trained to classify app introduction images into three categories: Surrounded Screenshot, Screenshot, and Irrelevant.

The code and dataset are available at https://github.com/Jl-wei/guing

Usage with pipeline

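from PIL import Image
from transformers import pipeline

# device=0 runs on the first GPU; pass device=-1 to run on CPU instead
classifier = pipeline("image-classification", model="Jl-wei/app-intro-img-classifier", device=0)

img_path = "path/to/image.png"  # placeholder: replace with the path to your image
image = Image.open(img_path)
result = classifier(image)  # list of {"label": ..., "score": ...} dicts, one per category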
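
The pipeline returns a list of dictionaries, each with a `label` and a `score`. The sketch below shows one way to pick the most likely category from such a list. The helper name `top_prediction`, the sample scores, and the exact label strings are assumptions for illustration; the labels the model actually emits may be spelled differently, so check them against real output.

```python
from typing import Dict, List, Tuple


def top_prediction(results: List[Dict]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score.

    `results` is expected to look like the output of an
    image-classification pipeline: a list of dicts with
    "label" and "score" keys.
    """
    if not results:
        raise ValueError("results is empty")
    best = max(results, key=lambda r: r["score"])
    return best["label"], float(best["score"])


# Illustrative data only: label strings and scores are made up for this
# example and are not real output from the model.
sample_result = [
    {"label": "Screenshot", "score": 0.75},
    {"label": "Surrounded Screenshot", "score": 0.125},
    {"label": "Irrelevant", "score": 0.125},
]

label, score = top_prediction(sample_result)
print(f"{label} ({score:.3f})")
```

With a real image, pass the `result` variable from the snippet above in place of `sample_result`.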

This model is the app introduction image classifier described in the following paper:

@misc{wei2024guing,
      title={GUing: A Mobile GUI Search Engine using a Vision-Language Model}, 
      author={Jialiang Wei and Anne-Lise Courbis and Thomas Lambolais and Binbin Xu and Pierre Louis Bernard and Gérard Dray and Walid Maalej},
      year={2024},
      eprint={2405.00145},
      archivePrefix={arXiv}
}

Please note that the model may only be used for academic purposes.