opengvlab-admin committed
Commit • 5e72178
1 Parent(s): 95d07f1
Update README.md
README.md CHANGED

@@ -10,7 +10,7 @@ datasets:
 pipeline_tag: visual-question-answering
 ---

-# Model Card for Mini-InternVL-Chat-
+# Model Card for Mini-InternVL-Chat-4B-V1-5
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/D60YzQBIzvoCvLRp2gZ0A.jpeg" alt="Image Description" width="300" height="300" />
 </p>
@@ -33,12 +33,12 @@ As shown in the figure below, we adopted the same model architecture as InternVL
 ## Model Details
 - **Model Type:** multimodal large language model (MLLM)
 - **Model Stats:**
-  - Architecture: [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) + MLP + [
+  - Architecture: [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) + MLP + [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
   - Image size: dynamic resolution, up to 40 tiles of 448 x 448 (4K resolution).
-  - Params:
+  - Params: 4.2B

 - **Training Strategy:**
-  - Learnable component in the pretraining stage:
+  - Learnable component in the pretraining stage: MLP
   - Learnable component in the finetuning stage: ViT + MLP + LLM
   - For more details on training hyperparameters, take a look at our code: [pretrain]() | [finetune]()

@@ -57,7 +57,7 @@ As shown in the figure below, we adopted the same model architecture as InternVL

 ## Model Usage

-We provide an example code to run Mini-InternVL-Chat-
+We provide example code to run Mini-InternVL-Chat-4B-V1.5 using `transformers`.

 You can also use our [online demo](https://internvl.opengvlab.com/) to get a quick experience of this model.
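The updated stats line describes dynamic-resolution input, up to 40 tiles of 448 x 448. Below is a minimal sketch of how such tiling can work; `dynamic_tile`, its defaults, and the grid-selection rule are illustrative assumptions, not the repository's actual preprocessing code.

```python
# Hypothetical sketch of dynamic-resolution tiling as described in the card:
# split an image into at most `max_num` tiles of `tile_size` x `tile_size`.
from PIL import Image

def dynamic_tile(image: Image.Image, tile_size: int = 448, max_num: int = 40):
    w, h = image.size
    aspect = w / h
    # Consider every cols x rows grid with at most max_num tiles and pick the
    # one whose aspect ratio is closest to the input image's.
    cols, rows = min(
        ((c, r) for c in range(1, max_num + 1)
                for r in range(1, max_num // c + 1)),
        key=lambda grid: abs(aspect - grid[0] / grid[1]),
    )
    # Resize to an exact multiple of the tile size, then crop out the tiles.
    resized = image.resize((cols * tile_size, rows * tile_size))
    return [
        resized.crop((c * tile_size, r * tile_size,
                      (c + 1) * tile_size, (r + 1) * tile_size))
        for r in range(rows) for c in range(cols)
    ]

# A 3840 x 2160 (4K) input maps to a grid of 448 x 448 tiles.
tiles = dynamic_tile(Image.new("RGB", (3840, 2160)))
print(len(tiles), tiles[0].size)  # e.g. 28 (448, 448)
```

The real pipeline also normalizes pixel values and may add a thumbnail tile; see the repository for the authoritative version.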
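The training-strategy bullets make only the MLP learnable during pretraining and ViT + MLP + LLM learnable during finetuning. A hypothetical sketch of enforcing that split with parameter freezing follows; the module name `mlp1` is an assumed checkpoint layout, not a verified attribute.

```python
# Stage-wise trainability as listed in the card: stage 1 (pretrain) updates
# only the MLP projector; stage 2 (finetune) unfreezes ViT + MLP + LLM.
import torch.nn as nn

def set_stage(model: nn.Module, stage: str) -> None:
    for name, param in model.named_parameters():
        if stage == "pretrain":
            # Only the projector between ViT and LLM learns in stage 1.
            param.requires_grad = name.startswith("mlp1")
        else:
            # Stage 2 trains everything: ViT + MLP + LLM.
            param.requires_grad = True
```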
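The revised usage section says example code for running Mini-InternVL-Chat-4B-V1.5 with `transformers` is provided. A minimal loading sketch under that assumption follows; the repo id and the signature of the remote-code `chat(...)` helper are assumptions here, so the model card's own example remains authoritative.

```python
# Minimal loading sketch for Mini-InternVL-Chat-4B-V1-5 with transformers.
# The chat() helper is supplied by the model's trust_remote_code implementation;
# its signature and the generation settings below are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()

# pixel_values holds preprocessed 448 x 448 tiles (see the tiling sketch above),
# shaped (num_tiles, 3, 448, 448) in the model's dtype; random data as a stand-in.
pixel_values = torch.randn(1, 3, 448, 448, dtype=torch.bfloat16).cuda()

question = "Describe this image."
response = model.chat(tokenizer, pixel_values, question,
                      generation_config=dict(max_new_tokens=256))
print(response)
```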