update Citation
README.md
CHANGED
@@ -10,7 +10,7 @@ datasets:
 
 ## Introduction
 
-The Imp project aims to provide a family of
+The Imp project aims to provide a family of highly capable yet lightweight LMMs. Our `Imp-v1.5-3B-Phi2` is a strong MSLM with only **3B** parameters, which is built upon a small yet powerful SLM [Phi-2](https://huggingface.co/microsoft/phi-2) (2.7B) and a powerful visual encoder [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) (0.4B), and trained on a 1M mixed dataset.
 
 As shown in the Table below, `Imp-v1.5-3B-Phi2` significantly outperforms the counterparts of similar model sizes, and even achieves slightly better performance than the strong LLaVA-7B model on various multimodal benchmarks.
 
@@ -83,7 +83,7 @@ If you use our model or refer our work in your studies, please cite:
 ```bibtex
 @article{imp2024,
 title={Imp: Highly Capable Large Multimodal Models for Mobile Devices},
-author={Shao, Zhenwei and Yu, Zhou and Yu, Jun and Ouyang, Xuecheng and
+author={Shao, Zhenwei and Yu, Zhou and Yu, Jun and Ouyang, Xuecheng and Zheng, Lihao and Gai, Zhenbiao and Wang, Mingyang and Ding, Jiajun},
 journal={arXiv preprint arXiv:2405.12107},
 year={2024}
 }