Oyoy1235 committed 5f3a7bd (parent: 8526fbb): update readme

Files changed (1): README.md (+4 -4)
README.md CHANGED
@@ -10,7 +10,7 @@ datasets:
 
 ## Introduction
 
-The Imp project aims to provide a family of a strong multimodal `small` language models (MSLMs). Our `Imp-v1.5-3B-Phi2` is a strong MSLM with only **3B** parameters, which is build upon a small yet powerful SLM [Phi-2 ](https://huggingface.co/microsoft/phi-2)(2.7B) and a powerful visual encoder [SigLIP ](https://huggingface.co/google/siglip-so400m-patch14-384)(0.4B), and trained on 1M mixed dataset.
+The Imp project aims to provide a family of strong yet lightweight multimodal models. Our `Imp-v1.5-3B-Phi2` is a strong multimodal small language model (MSLM) with only **3B** parameters, built upon the small yet powerful SLM [Phi-2](https://huggingface.co/microsoft/phi-2) (2.7B) and the powerful visual encoder [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) (0.4B), and trained on a 1M-sample mixed dataset.
 
 As shown in the Table below, `Imp-v1.5-3B-Phi2` significantly outperforms the counterparts of similar model sizes, and even achieves slightly better performance than the strong LLaVA-7B model on various multimodal benchmarks.
 
@@ -26,7 +26,7 @@ pip install transformers # latest version is ok, but we recommend v4.36.0
 pip install -q pillow accelerate einops
 ```
 
-You can use the following code for model inference. The format of text instruction is similar to [LLaVA](https://github.com/haotian-liu/LLaVA). A Colab page to run this example is provided [here](https://colab.research.google.com/drive/1EBYky6xIPjnlPppo2gZaiNK6gEsjXgom?usp=drive_link#scrollTo=2-VpU6QzWCVZ). Note that the example can only be run on GPUs currently.
+You can use the following code for model inference. The format of the text instruction is similar to [LLaVA](https://github.com/haotian-liu/LLaVA). Note that the example can currently only be run on GPUs.
 
 ```Python
 import torch
@@ -69,8 +69,8 @@ We conduct evaluation on 9 commonly-used benchmarks, including 5 academic VQA be
 | [LaVA-Phi-3B](https://github.com/zhuyiche/llava-phi) | 3B | 71.4 | - | 68.4 | 48.6 | 85.0 | 1335.1 | 59.8 |-|28.9|
 | [MobileVLM-3B](https://huggingface.co/mtgv/MobileVLM-3B) | 3B | - | 59.0 | 61.0 | 47.5 | 84.9 | 1288.9 | 59.6 |- |-|
 | [MiniCPM-V-3B](https://huggingface.co/mtgv/MobileVLM-3B) | 3B | - |- | - | - | - | 1452.0 | 67.9 | **65.3**|-|
-| [Bunny-3B](https://huggingface.co/visheratin/MC-LLaVA-3b) | 3B | 79.8 | 62.5 | 70.9 | - | 86.8| 1488.8 | 68.6 |- |-|
-| **Imp-v1.5-3B-Phi2** | 3B | **81.2** | **63.5** | **72.8**| **59.8** | **88.9**| **1446.4** | **72.9**| 46.7 |**43.3**|
+| [Bunny-3B](https://huggingface.co/visheratin/MC-LLaVA-3b) | 3B | 79.8 | 62.5 | 70.9 | - | 86.8 | **1488.8** | 68.6 | - | - |
+| **Imp-v1.5-3B-Phi2** | 3B | **81.2** | **63.5** | **72.8** | **59.8** | **88.9** | 1446.4 | **72.9** | 46.7 | **43.3** |
 
 ## License
 This project is licensed under the Apache License 2.0 - see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for details.
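
The inference snippet that the second hunk introduces is truncated by the diff context (only `import torch` is visible above). As a rough illustration of what such a call sequence looks like, the sketch below follows the `trust_remote_code` interface used by the earlier Imp-v1 model card; the repo id `MILVLG/Imp-v1.5-3B-Phi2`, the `image_preprocess` helper, the `images=` keyword to `generate`, and the prompt and image paths are assumptions here, not details confirmed by this diff.

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

torch.set_default_device("cuda")  # the README notes the example currently requires a GPU

# Assumed repo id; the custom multimodal modeling code is pulled in via trust_remote_code.
model_id = "MILVLG/Imp-v1.5-3B-Phi2"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# LLaVA-style prompt: the <image> placeholder marks where image features are spliced in.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: <image>\nWhat are the colors of the bus in the image? ASSISTANT:"
)
image = Image.open("images/bus.jpg")  # example image path, adjust as needed

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# image_preprocess is assumed to be provided by the repository's custom code (as in Imp-v1).
image_tensor = model.image_preprocess(image)

output_ids = model.generate(
    input_ids,
    max_new_tokens=100,
    images=image_tensor,
    use_cache=True,
)[0]
# Decode only the newly generated tokens, dropping the prompt portion.
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
```

The authoritative prompt template and preprocessing helpers are whatever the model repository's custom code defines; treat this sketch only as the general shape of the inference call, not as the repository's exact API.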