update latest MM1 results, model path fix
README.md CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: image-text-to-text
 # Model description
 We are excited to announce the continuation and rebranding of our **BLIP series** into **XGen-MM**, aligning with Salesforce's unified XGen initiative for large foundation models! This rebranding marks a significant step in our ongoing development of cutting-edge multimodal technologies.
 
-
+`XGen-MM` is a series of the latest foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. This series advances upon the successful designs of the `BLIP` series, incorporating fundamental enhancements that ensure a more robust and superior foundation. \
 These models have been trained at scale on high-quality image caption datasets and interleaved image-text data. XGen-MM highlights a few features below,
 
 * The **pretrained** foundation model, `xgen-mm-phi3-mini-base-r-v1`, achieves state-of-the-art performance under 5b parameters and demonstrates strong in-context learning capabilities.
@@ -43,11 +43,11 @@ More technical details will come with a technical report soon.
 ### Instruct (after instruction tuning)
 | Model | SEED-IMG | MMBench(dev) | MME-total | MME-P | MME-C | MMStar | MMMU (val) | MMVet | MathVista (mini) | ScienceQA (test) | POPE | AI2D | |
 |----------------------------|----------|--------------|-----------|----------|---------|----------|------------|----------|------------------|------------------|----------|----------|---|
-| MM1-3B-Chat | 68.8 |
+| MM1-3B-Chat | 68.8 | 67.8 | 1761 | **1482** | 279 | - | 33.9 | 43.7 | - | - | **87.4** | - | |
 | openbmb/MiniCPM-V-2 | 67.1 | 69.6 | 1808 | - | - | - | 38.2 | - | 38.7 | - | - | - | |
 | VILA1.5-3B | 67.9 | 63.4 | - | 1442 | - | - | 33.3 | 35.4 | - | 69.0 | 85.9 | - | |
 | xtuner/llava-phi-3-mini-hf | 70.0 | 69.2 | 1790 | 1477 | 313 | 43.7 | **41.4** | - | - | 73.7 | 87.3 | 69.3 | |
-| **xgen-mm-phi3-mini-instruct-r-v1 (Ours)** | **72.1** | 74.1 | **1827** | 1467 | **360** | **44.6** | 39.8 | **45.1** | **39.3** | **74.2** | 87.2 | **75.8** | |
+| **xgen-mm-phi3-mini-instruct-r-v1 (Ours)** | **72.1** | **74.1** | **1827** | 1467 | **360** | **44.6** | 39.8 | **45.1** | **39.3** | **74.2** | 87.2 | **75.8** | |
 
 
 # How to use
@@ -77,7 +77,7 @@ class EosListStoppingCriteria(StoppingCriteria):
         return self.eos_sequence in last_ids
 
 # load models
-model_name_or_path = "Salesforce/
+model_name_or_path = "Salesforce/xgen-mm-phi3-mini-instruct-r-v1"
 model = AutoModelForVision2Seq.from_pretrained(model_name_or_path, trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True, use_fast=False, legacy=False)
 image_processor = AutoImageProcessor.from_pretrained(model_name_or_path, trust_remote_code=True)