---
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
base_model:
- internlm/internlm2-chat-1_8b
base_model_relation: merge
language:
- multilingual
tags:
- internvl
- vision
- ocr
- custom_code
- moe
---

# Mono-InternVL-2B-S1-3

This repository contains the Mono-InternVL-2B model after **S1.1 concept learning**, **S1.2 semantic learning**, and **S1.3 alignment learning**.

Please refer to our [**paper**](https://huggingface.co/papers/2410.08202), [**project page**](https://internvl.github.io/blog/2024-10-10-Mono-InternVL/) and [**GitHub repository**](https://github.com/OpenGVLab/mono-internvl) for an introduction and usage instructions.

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{luo2024mono,
  title={Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training},
  author={Luo, Gen and Yang, Xue and Dou, Wenhan and Wang, Zhaokai and Liu, Jiawen and Dai, Jifeng and Qiao, Yu and Zhu, Xizhou},
  journal={arXiv preprint arXiv:2410.08202},
  year={2024}
}
```
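## Loading sketch

Since the card tags `transformers` and `custom_code`, the checkpoint is presumably loaded with `trust_remote_code=True` like other InternVL-family models. The snippet below is a minimal, hypothetical sketch only: the repository id and the exact chat/inference API are assumptions here, so consult the [GitHub repository](https://github.com/OpenGVLab/mono-internvl) for the authoritative usage.

```python
# Hypothetical loading sketch -- repo id and downstream inference API
# are assumed from the InternVL model family, not confirmed by this card.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mono-InternVL-2B-S1-3"  # assumed repository id

# trust_remote_code=True is required because the model ships custom code
# (note the `custom_code` tag in the metadata above).
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```

After loading, inference for an `image-text-to-text` model of this family typically goes through a chat-style method defined by the custom code; see the linked repository for the exact call signature.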