czczup committed
Commit: 1d37d35
1 Parent(s): 54b948b

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED

@@ -15,8 +15,11 @@ This repository contains the PyTorch version of the InternVL model weights.

InternVL scales up the ViT to _**6B parameters**_ and aligns it with LLM.

+ It is trained using web-scale, noisy image-text pairs. The data are all publicly available and comprise multilingual content, including LAION-en, LAION-multi, LAION-COCO, COYO, Wukong, CC12M, CC3M, and SBU.
+
It is _**the largest open-source vision/vision-language foundation model (14B)**_ to date, achieving _**32 state-of-the-art**_ performances on a wide range of tasks such as visual perception, cross-modal retrieval, multimodal dialogue, etc.

+
# Pretrained Weights

| model name | type | download | size |
@@ -24,13 +27,13 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
| InternViT-6B-224px | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px.pth) | 12 GB |
| InternVL-C-13B-224px | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/internvl_c_13b_224px.pth) | 25.4 GB |

- # Linear-Probe Image Classification
+ # Linear-Probe Image Classification (ImageNet Series)

| model name | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch | download |
| ------------------ | :---: | :-----: | :---: | :--: | :--: | :-------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternViT-6B-224px | 88.2 | 90.4 | 79.9 | 77.5 | 89.8 | 69.1 | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth) \| [log](https://github.com/OpenGVLab/InternVL/blob/main/classification/work_dirs/intern_vit_6b_1k_224/log_rank0.txt) |

- # Semantic Segmentation
+ # Semantic Segmentation (ADE20K)

| type | backbone | head | mIoU | config | download |
| --------------- | --------------------- | :-----: | :--: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
@@ -43,7 +46,6 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
| head tuning | InternViT-6B (frozen) | UperNet | 54.9 | [config](https://github.com/OpenGVLab/InternVL/blob/main/segmentation//configs/intern_vit_6b/head_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.log) |
| full tuning | InternViT-6B | UperNet | 58.9 | [config](https://github.com/OpenGVLab/InternVL/blob/main/segmentation//configs/intern_vit_6b/full_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.log) |

-
# License
This project is released under the MIT license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.
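
As a minimal sketch of how the checkpoints referenced in the Pretrained Weights table could be fetched, the snippet below uses `huggingface_hub`'s `hf_hub_download` with the `intern_vit_6b_224px.pth` filename from the table; the checkpoint layout (plain state dict vs. wrapped dict) is an assumption here and may differ from what the repo's own loading code expects.

```python
# Sketch only: download the InternViT-6B checkpoint listed in the
# "Pretrained Weights" table and inspect it. Assumes `huggingface_hub`
# and `torch` are installed; repo_id and filename come from the table links.
from huggingface_hub import hf_hub_download
import torch

ckpt_path = hf_hub_download(
    repo_id="OpenGVLab/InternVL",
    filename="intern_vit_6b_224px.pth",
)

# The exact checkpoint structure is not documented in this commit,
# so just load it on CPU and report what it contains.
checkpoint = torch.load(ckpt_path, map_location="cpu")
print(type(checkpoint))
if hasattr(checkpoint, "keys"):
    print(f"{len(checkpoint)} top-level entries")
```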