czczup committed
Commit a87298e • 1 Parent(s): 1f6fd0b

Upload folder using huggingface_hub

README.md CHANGED
@@ -5,6 +5,7 @@ library_name: transformers
 base_model:
 - OpenGVLab/InternViT-6B-448px-V1-0
 - meta-llama/Llama-2-13b-hf
+new_version: OpenGVLab/InternVL2_5-8B
 base_model_relation: merge
 language:
 - multilingual
@@ -19,10 +20,14 @@ tags:
 
 # InternVL-Chat-V1-1
 
-[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
+[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 Mini-InternVL\]](https://arxiv.org/abs/2410.16261)
 
 [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
 
+<div align="center">
+<img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
+</div>
+
 ## Introduction
 
 We released [🤗 InternVL-Chat-V1-1](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1), featuring a structure similar to LLaVA, including a ViT, an MLP projector, and an LLM.
@@ -96,7 +101,7 @@ We provide an example code to run InternVL-Chat-V1-1 using `transformers`.
 
 We also welcome you to experience the InternVL2 series models in our [online demo](https://internvl.opengvlab.com/).
 
->Please use transformers==4.37.2 to ensure the model works normally.
+>Please use transformers>=4.37.2 to ensure the model works normally.
 
 ### Model Loading
 
@@ -451,7 +456,7 @@ print(f'User: {question}')
 print(f'Assistant: {response}')
 ```
 
-#### Streaming output
+#### Streaming Output
 
 Besides this method, you can also use the following code to get streamed output.
 
@@ -489,6 +494,12 @@ This project is released under the MIT license. Parts of this project contain co
 If you find this project useful in your research, please consider citing:
 
 ```BibTeX
+@article{gao2024mini,
+  title={Mini-internvl: A flexible-transfer pocket multimodal model with 5\% parameters and 90\% performance},
+  author={Gao, Zhangwei and Chen, Zhe and Cui, Erfei and Ren, Yiming and Wang, Weiyun and Zhu, Jinguo and Tian, Hao and Ye, Shenglong and He, Junjun and Zhu, Xizhou and others},
+  journal={arXiv preprint arXiv:2410.16261},
+  year={2024}
+}
 @article{chen2023internvl,
   title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
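
The third README hunk above only relaxes the pinned `transformers` requirement (`==4.37.2` to `>=4.37.2`); the loading code under "### Model Loading" is outside the diff context. For orientation, a minimal sketch of the loading pattern the model card describes (the checkpoint id and dtype here are assumptions, not taken from this commit):

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/InternVL-Chat-V1-1'  # assumed checkpoint id

# trust_remote_code is needed because the repo ships its own
# configuration/modeling files (see configuration_internvl_chat.py below).
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
```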
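Similarly, the "Streaming Output" hunk only recapitalizes the heading; the streaming code it introduces is not shown in the diff. One common way to do this with `transformers` is `TextIteratorStreamer`, sketched below under the assumption that `model`, `tokenizer`, `pixel_values`, and `question` are already prepared as in the non-streaming example, and that `model.chat` is the chat helper provided by the model's remote code:

```python
from threading import Thread
from transformers import TextIteratorStreamer

# Yield decoded text pieces as soon as they are generated.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True, timeout=10)
generation_config = dict(max_new_tokens=1024, do_sample=False, streamer=streamer)

# Run generation in a background thread so the main thread can consume the streamer.
thread = Thread(target=model.chat, kwargs=dict(
    tokenizer=tokenizer, pixel_values=pixel_values, question=question,
    history=None, return_history=False, generation_config=generation_config))
thread.start()

generated_text = ''
for new_text in streamer:
    generated_text += new_text
    print(new_text, end='', flush=True)
```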
configuration_internvl_chat.py CHANGED
@@ -38,11 +38,11 @@ class InternVLChatConfig(PretrainedConfig):
         super().__init__(**kwargs)
 
         if vision_config is None:
-            vision_config = {}
+            vision_config = {'architectures': ['InternVisionModel']}
             logger.info('vision_config is None. Initializing the InternVisionConfig with default values.')
 
         if llm_config is None:
-            llm_config = {}
+            llm_config = {'architectures': ['LlamaForCausalLM']}
             logger.info('llm_config is None. Initializing the LlamaConfig config with default values (`LlamaConfig`).')
 
         self.vision_config = InternVisionConfig(**vision_config)
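
The change above only affects the fallback used when `InternVLChatConfig` is built without explicit sub-configs: the default dicts now record which `architectures` each sub-config corresponds to. A minimal sketch of the intended effect, assuming the class is imported from this repo's `configuration_internvl_chat.py` and that a no-argument construction is valid:

```python
from configuration_internvl_chat import InternVLChatConfig

# No vision_config/llm_config supplied, so the new defaults apply.
config = InternVLChatConfig()
print(config.vision_config.architectures)  # expected: ['InternVisionModel']
print(config.llm_config.architectures)     # expected: ['LlamaForCausalLM']
```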
modeling_intern_vit.py CHANGED
@@ -3,6 +3,7 @@
 # Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
+
 from typing import Optional, Tuple, Union
 
 import torch