opengvlab-admin committed
Commit • 5e72178
1 Parent(s): 95d07f1
Update README.md
README.md CHANGED

@@ -10,7 +10,7 @@ datasets:
 pipeline_tag: visual-question-answering
 ---

-# Model Card for Mini-InternVL-Chat-
+# Model Card for Mini-InternVL-Chat-4B-V1-5
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/D60YzQBIzvoCvLRp2gZ0A.jpeg" alt="Image Description" width="300" height="300" />
 </p>
@@ -33,12 +33,12 @@ As shown in the figure below, we adopted the same model architecture as InternVL
 ## Model Details
 - **Model Type:** multimodal large language model (MLLM)
 - **Model Stats:**
-  - Architecture: [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) + MLP + [
+  - Architecture: [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) + MLP + [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
   - Image size: dynamic resolution, up to 40 tiles of 448 x 448 (4K resolution).
-  - Params:
+  - Params: 4.2B

 - **Training Strategy:**
-  - Learnable component in the pretraining stage:
+  - Learnable component in the pretraining stage: MLP
   - Learnable component in the finetuning stage: ViT + MLP + LLM
   - For more details on training hyperparameters, take a look at our code: [pretrain]() | [finetune]()

@@ -57,7 +57,7 @@ As shown in the figure below, we adopted the same model architecture as InternVL

 ## Model Usage

-We provide an example code to run Mini-InternVL-Chat-
+We provide example code to run Mini-InternVL-Chat-4B-V1.5 using `transformers`.

 You can also use our [online demo](https://internvl.opengvlab.com/) to get a quick experience of this model.
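The updated stats line describes dynamic-resolution input, up to 40 tiles of 448 x 448. Below is a minimal sketch of how such tiling can work; `dynamic_tile`, its defaults, and the grid-selection rule are illustrative assumptions, not the repository's actual preprocessing code.

```python
# Hypothetical sketch of dynamic-resolution tiling as described in the card:
# split an image into at most `max_num` tiles of `tile_size` x `tile_size`.
from PIL import Image

def dynamic_tile(image: Image.Image, tile_size: int = 448, max_num: int = 40):
    w, h = image.size
    aspect = w / h
    # Consider every cols x rows grid with at most max_num tiles and pick the
    # one whose aspect ratio is closest to the input image's.
    cols, rows = min(
        ((c, r) for c in range(1, max_num + 1)
                for r in range(1, max_num // c + 1)),
        key=lambda grid: abs(aspect - grid[0] / grid[1]),
    )
    # Resize to an exact multiple of the tile size, then crop out the tiles.
    resized = image.resize((cols * tile_size, rows * tile_size))
    return [
        resized.crop((c * tile_size, r * tile_size,
                      (c + 1) * tile_size, (r + 1) * tile_size))
        for r in range(rows) for c in range(cols)
    ]

# A 3840 x 2160 (4K) input maps to a grid of 448 x 448 tiles.
tiles = dynamic_tile(Image.new("RGB", (3840, 2160)))
print(len(tiles), tiles[0].size)  # e.g. 28 (448, 448)
```

The real pipeline also normalizes pixel values and may add a thumbnail tile; see the repository for the authoritative version.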
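The training-strategy bullets make only the MLP learnable during pretraining and ViT + MLP + LLM learnable during finetuning. A hypothetical sketch of enforcing that split with parameter freezing follows; the module name `mlp1` is an assumed checkpoint layout, not a verified attribute.

```python
# Stage-wise trainability as listed in the card: stage 1 (pretrain) updates
# only the MLP projector; stage 2 (finetune) unfreezes ViT + MLP + LLM.
import torch.nn as nn

def set_stage(model: nn.Module, stage: str) -> None:
    for name, param in model.named_parameters():
        if stage == "pretrain":
            # Only the projector between ViT and LLM learns in stage 1.
            param.requires_grad = name.startswith("mlp1")
        else:
            # Stage 2 trains everything: ViT + MLP + LLM.
            param.requires_grad = True
```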
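The revised usage section says example code for running Mini-InternVL-Chat-4B-V1.5 with `transformers` is provided. A minimal loading sketch under that assumption follows; the repo id and the signature of the remote-code `chat(...)` helper are assumptions here, so the model card's own example remains authoritative.

```python
# Minimal loading sketch for Mini-InternVL-Chat-4B-V1-5 with transformers.
# The chat() helper is supplied by the model's trust_remote_code implementation;
# its signature and the generation settings below are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()

# pixel_values holds preprocessed 448 x 448 tiles (see the tiling sketch above),
# shaped (num_tiles, 3, 448, 448) in the model's dtype; random data as a stand-in.
pixel_values = torch.randn(1, 3, 448, 448, dtype=torch.bfloat16).cuda()

question = "Describe this image."
response = model.chat(tokenizer, pixel_values, question,
                      generation_config=dict(max_new_tokens=256))
print(response)
```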