Salesforce/xgen-mm-phi3-mini-instruct-r-v1

SFXX committed on May 8

Commit 2873e87 • 1 Parent(s): 92b6d07

Update README.md

Files changed (1)

README.md +2 -2

@@ -11,8 +11,8 @@ pipeline_tag: image-text-to-text
 `BLIP3` is a series of foundational vision-language models (VLMs) developed by Salesforce AI Research. \
 These models have been trained at scale on high-quality image caption datasets and interleaved image-text data. BLIP3 highlights a few features below,
-* The pretrained foundation model, `blip3-phi3-mini-base-r-v1`, achieves state-of-the-art performance under 5b parameters and demonstrates strong in-context learning capabilities.
-* The instruct fine-tuned model, `blip3-phi3-mini-instruct-r-v1`, achieves state-of-the-art performance among open-source and closed-source VLMs under 5b parameters.
+* The **pretrained** foundation model, `blip3-phi3-mini-base-r-v1`, achieves state-of-the-art performance under 5b parameters and demonstrates strong in-context learning capabilities.
+* The **instruct** fine-tuned model, `blip3-phi3-mini-instruct-r-v1`, achieves state-of-the-art performance among open-source and closed-source VLMs under 5b parameters.
 * `blip3-phi3-mini-instruct-r-v1` supports flexible high-resolution image encoding with efficient visual token sampling.
 More technical details will come with a technical report soon.