---
license: llama2
language:
- en
- zh
tags:
- multimodal
datasets:
- liuhaotian/LLaVA-Pretrain
base_model:
- lmsys/vicuna-7b-v1.5
pipeline_tag: image-text-to-text
library_name: transformers
---
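## **Usage**
Since the card lists `library_name: transformers` and `pipeline_tag: image-text-to-text`, below is a minimal usage sketch, assuming a LLaVA-style checkpoint compatible with `LlavaForConditionalGeneration` (suggested by the Vicuna-7B base model and the LLaVA-Pretrain dataset). The repository id `your-org/your-model`, the example image URL, and the prompt format are placeholders, not taken from this card.
```python
# Minimal usage sketch (assumption: LLaVA-style checkpoint loadable with
# transformers' LlavaForConditionalGeneration; the repo id is a placeholder).
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "your-org/your-model"  # placeholder repo id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Example image and prompt (Vicuna/LLaVA-1.5-style chat template assumed).
image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```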
## **Citation**
If you find this model useful, please cite the following paper:
```
@article{huang2024deciphering,
  title={Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate},
  author={Huang, Qidong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Wang, Jiaqi and Lin, Dahua and Zhang, Weiming and Yu, Nenghai},
  journal={arXiv preprint arXiv:2410.07167},
  year={2024}
}
```