---
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
base_model:
- internlm/internlm2-chat-1_8b
base_model_relation: merge
language:
- multilingual
tags:
- internvl
- vision
- ocr
- custom_code
- moe
---

# Mono-InternVL-2B-S1-3

This repository contains the Mono-InternVL-2B model after **S1.1 concept learning**, **S1.2 semantic learning**, and **S1.3 alignment learning**.

Please refer to our [**paper**](https://huggingface.co/papers/2410.08202), [**project page**](https://internvl.github.io/blog/2024-10-10-Mono-InternVL/), and [**GitHub repository**](https://github.com/OpenGVLab/mono-internvl) for an introduction and usage instructions.
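
For quick reference, below is a minimal inference sketch. It assumes the checkpoint exposes the standard InternVL-style `chat` interface through `trust_remote_code`, that inputs are 448×448 ImageNet-normalized pixel values, and that the repo id is `OpenGVLab/Mono-InternVL-2B-S1-3`; these are assumptions, so please treat the GitHub repository above as the authoritative reference for preprocessing and usage.

```python
# Minimal inference sketch. Assumptions: InternVL-style `chat` interface via
# trust_remote_code, 448x448 ImageNet-normalized inputs, and the repo id below.
# See the GitHub repository for the authoritative usage code.
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mono-InternVL-2B-S1-3"  # assumed repo id
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# Assumed preprocessing: resize to 448x448, normalize with ImageNet statistics.
transform = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open("example.jpg").convert("RGB")
pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

# InternVL-style chat call: the question embeds an <image> placeholder token.
question = "<image>\nDescribe the image in detail."
response = model.chat(tokenizer, pixel_values, question, dict(max_new_tokens=256))
print(response)
```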

## Citation

If you find this project useful in your research, please consider citing:

```bibtex
@article{luo2024mono,
  title={Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training},
  author={Luo, Gen and Yang, Xue and Dou, Wenhan and Wang, Zhaokai and Liu, Jiawen and Dai, Jifeng and Qiao, Yu and Zhu, Xizhou},
  journal={arXiv preprint arXiv:2410.08202},
  year={2024}
}
```