update readme
README.md
CHANGED
@@ -10,7 +10,7 @@ datasets:

## Introduction

-The Imp project aims to provide a family of
+The Imp project aims to provide a family of strong, lightweight large multimodal models (LMMs). Our `Imp-v1.5-3B-Phi2` is a strong multimodal small language model (MSLM) with only **3B** parameters, built upon a small yet powerful SLM, [Phi-2](https://huggingface.co/microsoft/phi-2) (2.7B), and a powerful visual encoder, [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) (0.4B), and trained on a mixed dataset of 1M samples.

As shown in the table below, `Imp-v1.5-3B-Phi2` significantly outperforms counterparts of similar model size, and even achieves slightly better performance than the strong LLaVA-7B model on various multimodal benchmarks.

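As a conceptual illustration of this composition (not the project's actual implementation), the sketch below wires a SigLIP vision tower to Phi-2's embedding space through a small projector. The two-layer MLP projector is an assumption following common LLaVA-1.5-style designs; in a trained model it would carry learned weights.

```Python
# Conceptual sketch only: SigLIP patch features -> MLP projector -> Phi-2 token space.
import torch
import torch.nn as nn
from transformers import AutoConfig, SiglipVisionModel

vision_tower = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")  # ~0.4B encoder
phi2_config = AutoConfig.from_pretrained("microsoft/phi-2")                           # 2.7B SLM config

# Hypothetical two-layer MLP projector (LLaVA-1.5-style): maps 1152-d SigLIP
# patch features into Phi-2's 2560-d token-embedding space.
projector = nn.Sequential(
    nn.Linear(vision_tower.config.hidden_size, phi2_config.hidden_size),
    nn.GELU(),
    nn.Linear(phi2_config.hidden_size, phi2_config.hidden_size),
)

# A 384x384 input gives 27x27 = 729 patch tokens (patch size 14) for the LLM.
pixel_values = torch.randn(1, 3, 384, 384)
visual_tokens = projector(vision_tower(pixel_values).last_hidden_state)
print(visual_tokens.shape)  # torch.Size([1, 729, 2560])
```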
@@ -26,7 +26,7 @@ pip install transformers # latest version is ok, but we recommend v4.36.0
pip install -q pillow accelerate einops
```

-You can use the following code for model inference. The format of the text instruction is similar to [LLaVA](https://github.com/haotian-liu/LLaVA).
+You can use the following code for model inference. The format of the text instruction is similar to [LLaVA](https://github.com/haotian-liu/LLaVA). Note that the example currently runs only on GPUs.

```Python
import torch
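The hunk above cuts off inside the code block right after `import torch`. For context, here is a minimal sketch of what the full LLaVA-style inference flow typically looks like for this model family. The repo id, the prompt template, and the `image_preprocess` helper and `images=` generate argument are assumptions carried over from the earlier MILVLG/imp-v1-3b model card, not something this diff confirms for v1.5.

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

torch.set_default_device("cuda")  # the README notes the example currently needs a GPU

model_id = "MILVLG/Imp-v1.5-3B-Phi2"  # assumed repo id; check the model card for the exact name

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True)  # Imp ships its multimodal code with the checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# LLaVA-style prompt: the <image> placeholder marks where visual tokens are inserted.
prompt = ("A chat between a curious user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions. "
          "USER: <image>\nWhat are the colors of the bus in the image? ASSISTANT:")
image = Image.open("bus.jpg")

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
image_tensor = model.image_preprocess(image)  # helper exposed by the remote code (assumed)

output_ids = model.generate(
    input_ids,
    max_new_tokens=100,
    images=image_tensor,
    use_cache=True)[0]
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
```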
@@ -69,8 +69,8 @@ We conduct evaluation on 9 commonly-used benchmarks, including 5 academic VQA be
| [LLaVA-Phi-3B](https://github.com/zhuyiche/llava-phi) | 3B | 71.4 | - | 68.4 | 48.6 | 85.0 | 1335.1 | 59.8 | - | 28.9 |
| [MobileVLM-3B](https://huggingface.co/mtgv/MobileVLM-3B) | 3B | - | 59.0 | 61.0 | 47.5 | 84.9 | 1288.9 | 59.6 | - | - |
| [MiniCPM-V-3B](https://huggingface.co/openbmb/MiniCPM-V) | 3B | - | - | - | - | - | 1452.0 | 67.9 | **65.3** | - |
-| [Bunny-3B](https://github.com/BAAI-DCAI/Bunny) | 3B | 79.8 | 62.5 | 70.9 | - | 86.8 | 1488.8 | 68.6 | - | - |
-| **Imp-v1.5-3B-Phi2** | 3B | **81.2** | **63.5** | **72.8** | **59.8** | **88.9** |
+| [Bunny-3B](https://github.com/BAAI-DCAI/Bunny) | 3B | 79.8 | 62.5 | 70.9 | - | 86.8 | **1488.8** | 68.6 | - | - |
+| **Imp-v1.5-3B-Phi2** | 3B | **81.2** | **63.5** | **72.8** | **59.8** | **88.9** | 1446.4 | **72.9** | 46.7 | **43.3** |

## License
This project is licensed under the Apache License 2.0 - see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for details.