---
license: llama2
language:
- en
- zh
tags:
- multimodal
datasets:
- liuhaotian/LLaVA-Pretrain
base_model:
- lmsys/vicuna-7b-v1.5
pipeline_tag: image-text-to-text
library_name: transformers
---
|
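## **Usage**

The card declares `library_name: transformers` and `pipeline_tag: image-text-to-text`, so the checkpoint should load through the corresponding `transformers` pipeline. The snippet below is a minimal, unofficial sketch: the repo id and image URL are placeholders, and the Vicuna-style `USER: <image> ... ASSISTANT:` prompt template is an assumption based on the `lmsys/vicuna-7b-v1.5` base model.

```python
# Minimal usage sketch, not an official example. Assumptions: the checkpoint
# follows a standard LLaVA-style layout supported by transformers, and the
# Vicuna prompt template applies. Repo id and image URL are placeholders.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="<this-repo-id>")

result = pipe(
    images="https://example.com/sample.jpg",  # placeholder image
    text="USER: <image>\nDescribe this image. ASSISTANT:",
    max_new_tokens=128,
)
print(result)
```
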
## **Citation**
|
If you find this model useful, please cite the following paper:
|
```
@article{huang2024deciphering,
  title={Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate},
  author={Huang, Qidong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Wang, Jiaqi and Lin, Dahua and Zhang, Weiming and Yu, Nenghai},
  journal={arXiv preprint arXiv:2410.07167},
  year={2024}
}
```