shikiw/LLaVA-v1.5-MoCa-7B-pretrain


shikiw committed on Oct 28

Commit f0aaa5f • 1 Parent(s): 55435ed

Update README.md

Files changed (1)

README.md +21 -1


@@ -3,4 +3,24 @@ license: llama2
 language:
 - en
 - zh
----
+tags:
+- multimodal
+datasets:
+- liuhaotian/LLaVA-Pretrain
+base_model:
+- lmsys/vicuna-7b-v1.5
+pipeline_tag: image-text-to-text
+library_name: transformers
+---
+## **Citation**
+If you find this model useful, please cite the following paper:
+```
+@article{huang2024deciphering,
+  title={Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate},
+  author={Huang, Qidong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Wang, Jiaqi and Lin, Dahua and Zhang, Weiming and Yu, Nenghai},
+  journal={arXiv preprint arXiv:2410.07167},
+  year={2024}
+}
+```