base_model:
  - Qwen/Qwen2-VL-2B-Instruct
datasets:
  - rp-yu/VPT_Datasets
language:
  - en
library_name: transformers
license: apache-2.0
metrics:
  - accuracy
pipeline_tag: image-text-to-text

Introducing Visual Perception Token into Multimodal Large Language Model

This repository contains the models from the paper "Introducing Visual Perception Token into Multimodal Large Language Model". The models use Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).

Code: https://github.com/yu-rp/VisualPerceptionToken