Update README.md
README.md CHANGED
@@ -10,10 +10,7 @@ datasets:
 pipeline_tag: visual-question-answering
 ---
 
-# 
-<p align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/X8AXMkOlKeUpNcoJIXKna.webp" alt="Image Description" width="300" height="300">
-</p>
+# InternVL-Chat-V1-2-Plus
 
 [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
 
@@ -40,18 +37,6 @@ InternVL-Chat-V1-2-Plus uses the same model architecture as [InternVL-Chat-V1-2]
 - Learnable Component: ViT + MLP + LLM
 - Data: 12 million SFT samples.
 
-## Released Models
-
-| Model | Vision Foundation Model | Release Date | Note |
-| :---: | :---: | :---: | :--- |
-| InternVL-Chat-V1-5 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)) | InternViT-6B-448px-V1-5 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)) | 2024.04.18 | supports 4K images; super-strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (🔥 new) |
-| InternVL-Chat-V1-2-Plus (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)) | InternViT-6B-448px-V1-2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.21 | more SFT data and stronger performance |
-| InternVL-Chat-V1-2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)) | InternViT-6B-448px-V1-2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.11 | scales the LLM up to 34B |
-| InternVL-Chat-V1-1 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)) | InternViT-6B-448px-V1-0 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)) | 2024.01.24 | supports Chinese and stronger OCR |
-
 ## Performance
 
 \* Proprietary Model † Training Set Observed
@@ -153,9 +138,3 @@ If you find this project useful in your research, please consider citing:
 ## License
 
 This project is released under the MIT license. Parts of this project contain code and models (e.g., LLaMA2) from other sources, which are subject to their respective licenses.
-
-Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
-
-## Acknowledgement
-
-InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!
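The card's front matter tags this model for visual question answering, and the second hunk lists its learnable parts (ViT + MLP + LLM). For readers arriving via this diff, below is a minimal loading sketch in the style OpenGVLab chat checkpoints typically use with `trust_remote_code`. Treat it as a sketch under assumptions: the `model.chat(...)` call and its argument order come from the repository's remote modeling code, and the 448x448 input size and the local image path are illustrative, not taken from this commit.

```python
# Minimal sketch, assuming: model.chat() as defined in the repo's remote code,
# a 448x448 input resolution, and a hypothetical local image "./example.jpg".
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

path = "OpenGVLab/InternVL-Chat-V1-2-Plus"

# trust_remote_code pulls the custom InternVL modeling code from the Hub.
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
image_processor = CLIPImageProcessor.from_pretrained(path)

# The V1-2 family pairs with InternViT-6B-448px, so 448x448 inputs are assumed.
image = Image.open("./example.jpg").convert("RGB").resize((448, 448))
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

generation_config = dict(num_beams=1, max_new_tokens=512, do_sample=False)
question = "Please describe the image in detail."
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)
```

The bfloat16 dtype keeps the 40B-parameter checkpoint within the memory of a large single GPU or a small multi-GPU setup; for the authoritative quick-start, consult the model card itself rather than this sketch.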