---
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
base_model:
- internlm/internlm2-chat-1_8b
base_model_relation: merge
language:
- multilingual
tags:
- internvl
- vision
- ocr
- custom_code
- moe
---

# Mono-InternVL-2B-S1-3

This repository contains the Mono-InternVL-2B model after **S1.1 concept learning**, **S1.2 semantic learning**, and **S1.3 alignment learning**.

Please refer to our [**paper**](https://huggingface.co/papers/2410.08202), [**project page**](https://internvl.github.io/blog/2024-10-10-Mono-InternVL/) and [**GitHub repository**](https://github.com/OpenGVLab/mono-internvl) for an introduction and usage instructions.

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{luo2024mono,
  title={Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training},
  author={Luo, Gen and Yang, Xue and Dou, Wenhan and Wang, Zhaokai and Liu, Jiawen and Dai, Jifeng and Qiao, Yu and Zhu, Xizhou},
  journal={arXiv preprint arXiv:2410.08202},
  year={2024}
}
```
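## Loading sketch

Since the card tags `transformers` and `custom_code`, the checkpoint is presumably loaded with `trust_remote_code=True` like other InternVL-family models. The snippet below is a minimal, hypothetical sketch only: the repository id and the exact chat/inference API are assumptions here, so consult the [GitHub repository](https://github.com/OpenGVLab/mono-internvl) for the authoritative usage.

```python
# Hypothetical loading sketch -- repo id and downstream inference API
# are assumed from the InternVL model family, not confirmed by this card.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mono-InternVL-2B-S1-3"  # assumed repository id

# trust_remote_code=True is required because the model ships custom code
# (note the `custom_code` tag in the metadata above).
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```

After loading, inference for an `image-text-to-text` model of this family typically goes through a chat-style method defined by the custom code; see the linked repository for the exact call signature.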