update model card
README.md
# Model description

`XGen-MM` is a series of foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. This series advances upon the successful designs of the `BLIP` series, incorporating fundamental enhancements that ensure a more robust and superior foundation.

These models have been trained at scale on high-quality image caption datasets and interleaved image-text data. XGen-MM highlights the following features:

* The **pretrained** foundation model, `xgen-mm-phi3-mini-base-r-v1`, achieves state-of-the-art performance under 5b parameters and demonstrates strong in-context learning capabilities.
* The **instruct** fine-tuned model, `xgen-mm-phi3-mini-instruct-r-v1`, achieves state-of-the-art performance among open-source and closed-source VLMs under 5b parameters.
* `xgen-mm-phi3-mini-instruct-r-v1` supports flexible high-resolution image encoding with efficient visual token sampling.

More technical details will be provided in a forthcoming technical report.

| MM1-3B | 0 | 73.5 | 55.6 | 63.3 | 26.1 | 29.4 | 15.6 | 46.2 |
| | 4 | 112.3 | 99.7 | 84.1 | 48.6 | 45.3 | 38.0 | 57.9 |
| | 8 | 114.6 | 104.7 | 88.8 | 48.4 | 44.6 | 46.4 | 63.6 |
| **xgen-mm-phi3-mini-base-r-v1 (Ours)** | 0 | **81.7** | **80.2** | 60.7 | **26.5** | **36.0** | **21.2** | **48.1** |
| | 4 | 110.5 | **101.7** | **84.6** | **49.2** | **46.1** | **38.4** | **63.9** |
| | 8 | 112.1 | 104.4 | 87.7 | **49.1** | **46.4** | 44.3 | **63.8** |

| openbmb/MiniCPM-V-2 | 67.1 | 69.6 | 1808 | - | - | - | 38.2 | - | 38.7 | - | - | - | |
| VILA1.5-3B | 67.9 | 63.4 | - | 1442 | - | - | 33.3 | 35.4 | - | 69.0 | 85.9 | - | |
| xtuner/llava-phi-3-mini-hf | 70.0 | 69.2 | 1790 | 1477 | 313 | 43.7 | **41.4** | - | - | 73.7 | 87.3 | 69.3 | |
| **xgen-mm-phi3-mini-instruct-r-v1 (Ours)** | **72.1** | 74.1 | **1827** | 1467 | **360** | **44.6** | 39.8 | **45.1** | **39.3** | **74.2** | 87.2 | **75.8** | |

# How to use

Our code and weights are released under the Creative Commons Attribution Non Commercial license.

# Citation

```
@misc{xgen_mm_phi3_mini,
    title={xgen-mm-phi3-mini-instruct Model Card},
    url={https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-r-v1},
    author={Salesforce AI Research},
    month={May},
    year={2024}
}
```