zwgao committed · verified
Commit 8128d01 · 1 Parent(s): e4a2d50

Update README.md

Files changed (1):
  1. README.md +24 -9
README.md CHANGED

@@ -11,18 +11,14 @@ pipeline_tag: image-feature-extraction
 ---
 
 # Model Card for InternViT-6B-448px-V1-0
-
-<img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/s0wjRQcYFdcQZa2FZ3Om7.webp" alt="Image Description" width="300" height="300">
+<p align="center">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/s0wjRQcYFdcQZa2FZ3Om7.webp" alt="Image Description" width="300" height="300">
+</p>
 
 \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[Explanation in Chinese](https://zhuanlan.zhihu.com/p/675877376)\]
 
-| Model                   | Date       | Download                                                                | Note                             |
-| ----------------------- | ---------- | ----------------------------------------------------------------------- | -------------------------------- |
-| InternViT-6B-448px-V1.5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)  | support dynamic resolution, super strong OCR (🔥new) |
-| InternViT-6B-448px-V1.2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)  | 448 resolution |
-| InternViT-6B-448px-V1.0 | 2024.01.30 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)  | 448 resolution |
-| InternViT-6B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px)       | vision foundation model |
-| InternVL-14B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px)       | vision-language foundation model |
+
+We release InternViT-6B-448px-V1-0, which is integrated into [InternVL-Chat-V1-1](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1). In this update, we explored increasing the resolution to 448x448, enhancing Optical Character Recognition (OCR) capabilities, and improving support for Chinese conversations. For examples of the enhanced capabilities, please refer to the [examples section](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1#examples) of the InternVL-Chat-V1-1 model card.
 
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone
@@ -32,6 +28,25 @@ pipeline_tag: image-feature-extraction
 - **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi, OCR-related datasets.
 - **Note:** This model has 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. Therefore, when building a VLLM with this model, **please use the features from the fourth-to-last layer.**
 
+## Released Models
+### Vision Foundation Model
+| Model                   | Date       | Download                                                                | Note                             |
+| ----------------------- | ---------- | ----------------------------------------------------------------------- | -------------------------------- |
+| InternViT-6B-448px-V1.5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)  | supports dynamic resolution, super strong OCR (🔥 new) |
+| InternViT-6B-448px-V1.2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)  | 448px resolution |
+| InternViT-6B-448px-V1.0 | 2024.01.30 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)  | 448px resolution |
+| InternViT-6B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px)       | vision foundation model |
+| InternVL-14B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px)       | vision-language foundation model |
+
+### Multimodal Large Language Model (MLLM)
+| Model                   | Date       | Download                                                                     | Note                                |
+| ----------------------- | ---------- | ----------------------------------------------------------------------------- | ----------------------------------- |
+| InternVL-Chat-V1.5      | 2024.04.18 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)            | supports 4K images; super strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (🔥 new) |
+| InternVL-Chat-V1.2-Plus | 2024.02.21 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)       | more SFT data; stronger performance |
+| InternVL-Chat-V1.2      | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)            | scales the LLM up to 34B |
+| InternVL-Chat-V1.1      | 2024.01.24 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)            | supports Chinese; stronger OCR |
+
+
 ## Model Usage (Image Embeddings)
 
 ```python
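
The usage snippet itself is cut off in this diff view. For orientation, below is a minimal sketch of image-embedding extraction with this checkpoint, assuming the standard `transformers` remote-code pattern (`AutoModel` with `trust_remote_code=True` plus a bundled `CLIPImageProcessor`); the image path is a placeholder, and the `output_hidden_states` usage only illustrates the fourth-to-last-layer note above, rather than reproducing the commit's actual snippet.

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Load the 6B vision backbone in bfloat16 (assumed remote-code entry point).
model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-6B-448px-V1-0',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

# CLIP-style preprocessor, assumed to be bundled with the checkpoint.
image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-6B-448px-V1-0')

image = Image.open('./example.jpg').convert('RGB')  # placeholder image path
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

with torch.no_grad():
    # Ask for all hidden states so intermediate blocks are accessible.
    outputs = model(pixel_values, output_hidden_states=True)

# Standard image-feature-extraction output: final-block token features.
image_embeds = outputs.last_hidden_state

# Per the model card's note, a VLLM should instead take the output after the
# fourth-to-last block: hidden_states[0] is the patch embedding, so with 48
# blocks, index -4 selects the output of block 45, the fourth-to-last block.
vllm_features = outputs.hidden_states[-4]
print(image_embeds.shape, vllm_features.shape)
```

If the remote modeling code did not accept `output_hidden_states`, a forward hook on the target block would achieve the same effect; the indexing above follows the usual `transformers` convention of one hidden-state entry per block plus the initial embedding.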