Weiyun1025 committed on
Commit d92cb9a · verified · 1 Parent(s): 0493f06

Upload README.md with huggingface_hub

Files changed (1): README.md +4 -5
README.md CHANGED
@@ -5,9 +5,8 @@ license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
 pipeline_tag: image-text-to-text
 library_name: transformers
 base_model:
-- OpenGVLab/InternViT-300M-448px-V2_5
-- Qwen/Qwen2.5-14B
-base_model_relation: merge
+- OpenGVLab/InternVL3-14B-Pretrain
+base_model_relation: finetune
 language:
 - multilingual
 tags:
@@ -15,7 +14,7 @@ tags:
 - custom_code
 ---
 
-# InternVL3-14B-Pretrain
+# InternVL3-14B-Instruct
 
 [\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[📜 InternVL 1.0\]](https://huggingface.co/papers/2312.14238) [\[📜 InternVL 1.5\]](https://huggingface.co/papers/2404.16821) [\[📜 InternVL 2.5\]](https://huggingface.co/papers/2412.05271) [\[📜 InternVL2.5-MPO\]](https://huggingface.co/papers/2411.10442) [\[📜 InternVL3\]](https://huggingface.co/papers/2504.10479)
 
@@ -27,7 +26,7 @@ tags:
 
 ## Introduction
 
-***This is the pretrained version of InternVL3-14B, which has undergone native multimodal pre-training but has not undergone post-training (i.e., SFT and MPO).***
+***This is the SFT version of InternVL3-14B, which has undergone native multimodal pre-training and SFT but has not undergone MPO. If you're unsure which version to use, please use the [InternVL3-14B](https://huggingface.co/OpenGVLab/InternVL3-14B) version.***
 
 We introduce InternVL3, an advanced multimodal large language model (MLLM) series that demonstrates superior overall performance.
 Compared to InternVL 2.5, InternVL3 exhibits superior multimodal perception and reasoning capabilities, while further extending its multimodal capabilities to encompass tool usage, GUI agents, industrial image analysis, 3D vision perception, and more.
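
For orientation, here is a minimal sketch of loading the checkpoint this card recommends, following the usage pattern documented across the InternVL model family (`AutoModel` with `trust_remote_code=True`, matching the `custom_code` tag in the metadata above); the text-only `model.chat` call and the generation settings are illustrative assumptions, not part of this commit.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint named in this commit's diff as the recommended version.
path = "OpenGVLab/InternVL3-14B"

# InternVL ships custom modeling code, hence trust_remote_code=True.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on your GPU
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# model.chat(...) is the entry point exposed by InternVL's remote code;
# passing pixel_values=None runs a pure-text turn (illustrative example).
response = model.chat(
    tokenizer, None, "Hello, who are you?",
    generation_config=dict(max_new_tokens=64),
)
print(response)
```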