---
base_model:
- Qwen/Qwen2-VL-2B-Instruct
datasets:
- rp-yu/VPT_Datasets
language:
- en
library_name: transformers
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
---
# Introducing Visual Perception Token into Multimodal Large Language Model
This repository contains models presented in the paper [Introducing Visual Perception Token into Multimodal Large Language Model](https://arxiv.org/abs/2502.17425). The models use Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).

Code: https://github.com/yu-rp/VisualPerceptionToken
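
Below is a minimal usage sketch, assuming the checkpoint loads through the standard Qwen2-VL classes in `transformers` (the base model is Qwen/Qwen2-VL-2B-Instruct). The model ID and image path are placeholders, and the full Visual Perception Token inference pipeline may require the code from the repository linked above.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Placeholder: substitute the actual VPT checkpoint ID from this repository.
model_id = "rp-yu/VPT-checkpoint"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat prompt containing one image and one text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Plain generation; the VPT control flow (re-perception triggered by
# perception tokens) is handled by the repository's inference code.
output_ids = model.generate(**inputs, max_new_tokens=128)
response = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(response)
```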