czczup committed
Commit 9a9ef7c • 1 Parent(s): 46bbe53

fix compatibility issue for transformers 4.46+

README.md CHANGED
@@ -5,6 +5,7 @@ library_name: transformers
 base_model:
 - OpenGVLab/InternViT-6B-448px-V1-5
 - NousResearch/Nous-Hermes-2-Yi-34B
+ new_version: OpenGVLab/InternVL2_5-38B
 base_model_relation: merge
 language:
 - multilingual
@@ -19,13 +20,13 @@ tags:
 
 # InternVL2-40B
 
- [\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
+ [\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 Mini-InternVL\]](https://arxiv.org/abs/2410.16261)
 
 [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
 
- [切换至中文版](#简介)
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/_mLpMwsav5eMeNcZdrIQl.png)
+ <div align="center">
+ <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
+ </div>
 
 ## Introduction
 
@@ -65,7 +66,7 @@ InternVL 2.0 is a multimodal large language model series, featuring models of va
 | MME<sub>sum</sub> | 2070.2 | 2110.6 | 2260.7 | 2315.0 |
 | RealWorldQA | 68.0 | 67.5 | 68.3 | 71.8 |
 | AI2D<sub>test</sub> | 89.4 | 80.3 | 84.5 | 87.1 |
- | MMMU<sub>val</sub> | 63.1 / 61.7 | 58.5 / 60.6 | 48.3 / 51.2 | 53.9 / 55.2 |
+ | MMMU<sub>val</sub> | 63.1 | 58.5 | 51.2 | 55.2 |
 | MMBench-EN<sub>test</sub> | 81.0 | 73.9 | 83.4 | 86.8 |
 | MMBench-CN<sub>test</sub> | 80.2 | 73.8 | 82.0 | 86.5 |
 | CCBench<sub>dev</sub> | 57.3 | 28.4 | 73.5 | 80.6 |
@@ -78,9 +79,7 @@ InternVL 2.0 is a multimodal large language model series, featuring models of va
 
 - For more details and evaluation reproduction, please refer to our [Evaluation Guide](https://internvl.readthedocs.io/en/latest/internvl2.0/evaluation.html).
 
- - We simultaneously use [InternVL](https://github.com/OpenGVLab/InternVL) and [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) repositories for model evaluation. Specifically, the results reported for DocVQA, ChartQA, InfoVQA, TextVQA, MME, AI2D, MMBench, CCBench, MMVet, and SEED-Image were tested using the InternVL repository. OCRBench, RealWorldQA, HallBench, and MathVista were evaluated using the VLMEvalKit.
-
- - For MMMU, we report both the original scores (left side: evaluated using the InternVL codebase for InternVL series models, and sourced from technical reports or webpages for other models) and the VLMEvalKit scores (right side: collected from the OpenCompass leaderboard).
+ - We simultaneously use [InternVL](https://github.com/OpenGVLab/InternVL) and [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) repositories for model evaluation. Specifically, the results reported for DocVQA, ChartQA, InfoVQA, TextVQA, MME, AI2D, MMBench, CCBench, MMVet (GPT-4-0613), and SEED-Image were tested using the InternVL repository. MMMU, OCRBench, RealWorldQA, HallBench, MMVet (GPT-4-Turbo), and MathVista were evaluated using the VLMEvalKit.
 
 - Please note that evaluating the same model using different testing toolkits like [InternVL](https://github.com/OpenGVLab/InternVL) and [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) can result in slight differences, which is normal. Updates to code versions and variations in environment and hardware can also cause minor discrepancies in results.
 
@@ -130,7 +129,7 @@ We provide an example code to run InternVL2-40B using `transformers`.
 
 We also welcome you to experience the InternVL2 series models in our [online demo](https://internvl.opengvlab.com/).
 
- > Please use transformers==4.37.2 to ensure the model works normally.
+ > Please use transformers>=4.37.2 to ensure the model works normally.
 
 ### Model Loading
 
@@ -462,7 +461,7 @@ response, history = model.chat(tokenizer, pixel_values, question, generation_con
 print(f'User: {question}\nAssistant: {response}')
 ```
 
- #### Streaming output
+ #### Streaming Output
 
 Besides this method, you can also use the following code to get streamed output.
 
@@ -502,12 +501,12 @@ Many repositories now support fine-tuning of the InternVL series models, includi
 LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by the MMRazor and MMDeploy teams.
 
 ```sh
- pip install lmdeploy==0.5.3
+ pip install lmdeploy>=0.5.3
 ```
 
 LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline.
 
- #### A 'Hello, world' example
+ #### A 'Hello, world' Example
 
 ```python
 from lmdeploy import pipeline, TurbomindEngineConfig
@@ -522,7 +521,7 @@ print(response.text)
 
 If `ImportError` occurs while executing this case, please install the required dependency packages as prompted.
 
- #### Multi-images inference
+ #### Multi-images Inference
 
 When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased.
 
@@ -547,7 +546,7 @@ response = pipe((f'Image-1: {IMAGE_TOKEN}\nImage-2: {IMAGE_TOKEN}\ndescribe thes
 print(response.text)
 ```
 
- #### Batch prompts inference
+ #### Batch Prompts Inference
 
 Conducting inference with batch prompts is quite straightforward; just place them within a list structure:
 
@@ -567,7 +566,7 @@ response = pipe(prompts)
 print(response)
 ```
 
- #### Multi-turn conversation
+ #### Multi-turn Conversation
 
 There are two ways to do the multi-turn conversations with the pipeline. One is to construct messages according to the format of OpenAI and use above introduced method, the other is to use the `pipeline.chat` interface.
 
@@ -637,271 +636,12 @@ This project is released under the MIT license, while InternLM2 is licensed unde
 If you find this project useful in your research, please consider citing:
 
 ```BibTeX
- @article{chen2023internvl,
-   title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
-   author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
-   journal={arXiv preprint arXiv:2312.14238},
-   year={2023}
- }
- @article{chen2024far,
-   title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
-   author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
-   journal={arXiv preprint arXiv:2404.16821},
+ @article{gao2024mini,
+   title={Mini-internvl: A flexible-transfer pocket multimodal model with 5\% parameters and 90\% performance},
+   author={Gao, Zhangwei and Chen, Zhe and Cui, Erfei and Ren, Yiming and Wang, Weiyun and Zhu, Jinguo and Tian, Hao and Ye, Shenglong and He, Junjun and Zhu, Xizhou and others},
+   journal={arXiv preprint arXiv:2410.16261},
   year={2024}
 }
- ```
-
- ## 简介
-
- 我们很高兴宣布 InternVL 2.0 的发布，这是 InternVL 系列多模态大语言模型的最新版本。InternVL 2.0 提供了多种**指令微调**的模型，参数从 10 亿到 1080 亿不等。此仓库包含经过指令微调的 InternVL2-40B 模型。
-
- 与最先进的开源多模态大语言模型相比，InternVL 2.0 超越了大多数开源模型。它在各种能力上表现出与闭源商业模型相媲美的竞争力，包括文档和图表理解、信息图表问答、场景文本理解和 OCR 任务、科学和数学问题解决，以及文化理解和综合多模态能力。
-
- InternVL 2.0 使用 8k 上下文窗口进行训练，训练数据包含长文本、多图和视频数据，与 InternVL 1.5 相比，其处理这些类型输入的能力显著提高。更多详细信息，请参阅我们的博客和 GitHub。
-
- | 模型名称 | 视觉部分 | 语言部分 | HF 链接 | MS 链接 |
- | :------------------: | :---------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------: | :--------------------------------------------------------------: | :--------------------------------------------------------------------: |
- | InternVL2-1B | [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) | [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) | [🤗 link](https://huggingface.co/OpenGVLab/InternVL2-1B) | [🤖 link](https://modelscope.cn/models/OpenGVLab/InternVL2-1B) |
- | InternVL2-2B | [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) | [internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b) | [🤗 link](https://huggingface.co/OpenGVLab/InternVL2-2B) | [🤖 link](https://modelscope.cn/models/OpenGVLab/InternVL2-2B) |
- | InternVL2-4B | [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) | [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) | [🤗 link](https://huggingface.co/OpenGVLab/InternVL2-4B) | [🤖 link](https://modelscope.cn/models/OpenGVLab/InternVL2-4B) |
- | InternVL2-8B | [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) | [internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat) | [🤗 link](https://huggingface.co/OpenGVLab/InternVL2-8B) | [🤖 link](https://modelscope.cn/models/OpenGVLab/InternVL2-8B) |
- | InternVL2-26B | [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | [internlm2-chat-20b](https://huggingface.co/internlm/internlm2-chat-20b) | [🤗 link](https://huggingface.co/OpenGVLab/InternVL2-26B) | [🤖 link](https://modelscope.cn/models/OpenGVLab/InternVL2-26B) |
- | InternVL2-40B | [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | [Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B) | [🤗 link](https://huggingface.co/OpenGVLab/InternVL2-40B) | [🤖 link](https://modelscope.cn/models/OpenGVLab/InternVL2-40B) |
- | InternVL2-Llama3-76B | [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | [Hermes-2-Theta-Llama-3-70B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-70B) | [🤗 link](https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B) | [🤖 link](https://modelscope.cn/models/OpenGVLab/InternVL2-Llama3-76B) |
-
- ## 模型细节
-
- InternVL 2.0 是一个多模态大语言模型系列，包含各种规模的模型。对于每个规模的模型，我们都会发布针对多模态任务优化的指令微调模型。InternVL2-40B 包含 [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)、一个 MLP 投影器和 [Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B)。
-
- ## 性能测试
-
- ### 图像相关评测
-
- | 评测数据集 | GPT-4T-20240409 | Gemini-1.5-Pro | InternVL2-26B | InternVL2-40B |
- | :--------------------------: | :-------------: | :------------: | :-----------: | :-----------: |
- | 模型大小 | - | - | 25.5B | 40B |
- | | | | | |
- | DocVQA<sub>test</sub> | 87.2 | 86.5 | 92.9 | 93.9 |
- | ChartQA<sub>test</sub> | 78.1 | 81.3 | 84.9 | 86.2 |
- | InfoVQA<sub>test</sub> | - | 72.7 | 75.9 | 78.7 |
- | TextVQA<sub>val</sub> | - | 73.5 | 82.3 | 83.0 |
- | OCRBench | 678 | 754 | 825 | 837 |
- | MME<sub>sum</sub> | 2070.2 | 2110.6 | 2260.7 | 2315.0 |
- | RealWorldQA | 68.0 | 67.5 | 68.3 | 71.8 |
- | AI2D<sub>test</sub> | 89.4 | 80.3 | 84.5 | 87.1 |
- | MMMU<sub>val</sub> | 63.1 / 61.7 | 58.5 / 60.6 | 48.3 / 51.2 | 53.9 / 55.2 |
- | MMBench-EN<sub>test</sub> | 81.0 | 73.9 | 83.4 | 86.8 |
- | MMBench-CN<sub>test</sub> | 80.2 | 73.8 | 82.0 | 86.5 |
- | CCBench<sub>dev</sub> | 57.3 | 28.4 | 73.5 | 80.6 |
- | MMVet<sub>GPT-4-0613</sub> | - | - | 64.2 | 68.5 |
- | MMVet<sub>GPT-4-Turbo</sub> | 67.5 | 64.0 | 62.1 | 65.5 |
- | SEED-Image | - | - | 76.8 | 78.2 |
- | HallBench<sub>avg</sub> | 43.9 | 45.6 | 50.7 | 56.9 |
- | MathVista<sub>testmini</sub> | 58.1 | 57.7 | 59.4 | 63.7 |
- | OpenCompass<sub>avg</sub> | 63.5 | 64.4 | 66.4 | 69.7 |
-
- - 关于更多的细节以及评测复现，请看我们的[评测指南](https://internvl.readthedocs.io/en/latest/internvl2.0/evaluation.html)。
-
- - 我们同时使用 InternVL 和 VLMEvalKit 仓库进行模型评估。具体来说，DocVQA、ChartQA、InfoVQA、TextVQA、MME、AI2D、MMBench、CCBench、MMVet 和 SEED-Image 的结果是使用 InternVL 仓库测试的。OCRBench、RealWorldQA、HallBench 和 MathVista 是使用 VLMEvalKit 进行评估的。
-
- - 对于MMMU，我们报告了原始分数（左侧：InternVL系列模型使用InternVL代码库评测，其他模型的分数来自其技术报告或网页）和VLMEvalKit分数（右侧：从OpenCompass排行榜收集）。
-
- - 请注意，使用不同的测试工具包（如 InternVL 和 VLMEvalKit）评估同一模型可能会导致细微差异，这是正常的。代码版本的更新、环境和硬件的变化也可能导致结果的微小差异。
-
- ### 视频相关评测
-
- | 评测数据集 | GPT-4V | VILA-1.5 | LLaVA-NeXT-Video | InternVL2-26B | InternVL2-40B |
- | :-------------------------: | :----: | :------: | :--------------: | :-----------: | :-----------: |
- | 模型大小 | - | 34B | 34B | 25.5B | 40B |
- | | | | | | |
- | MVBench | - | - | - | 67.5 | 72.5 |
- | MMBench-Video<sub>8f</sub> | 1.53 | - | - | 1.27 | 1.32 |
- | MMBench-Video<sub>16f</sub> | 1.68 | - | - | 1.41 | 1.45 |
- | Video-MME<br>w/o subs | 59.9 | 59.0 | 52.0 | 54.8 | 61.2 |
- | Video-MME<br>w subs | 63.3 | 59.4 | 54.9 | 57.1 | 62.4 |
-
- - 我们通过从每个视频中提取 16 帧来评估我们的模型在 MVBench 和 Video-MME 上的性能，每个视频帧被调整为 448x448 的图像。
-
- ### 定位相关评测
-
- | 模型 | avg. | RefCOCO<br>(val) | RefCOCO<br>(testA) | RefCOCO<br>(testB) | RefCOCO+<br>(val) | RefCOCO+<br>(testA) | RefCOCO+<br>(testB) | RefCOCO-g<br>(val) | RefCOCO-g<br>(test) |
- | :----------------------------: | :--: | :--------------: | :----------------: | :----------------: | :---------------: | :-----------------: | :-----------------: | :----------------: | :-----------------: |
- | UNINEXT-H<br>(Specialist SOTA) | 88.9 | 92.6 | 94.3 | 91.5 | 85.2 | 89.6 | 79.8 | 88.7 | 89.4 |
- | | | | | | | | | | |
- | Mini-InternVL-<br>Chat-2B-V1-5 | 75.8 | 80.7 | 86.7 | 72.9 | 72.5 | 82.3 | 60.8 | 75.6 | 74.9 |
- | Mini-InternVL-<br>Chat-4B-V1-5 | 84.4 | 88.0 | 91.4 | 83.5 | 81.5 | 87.4 | 73.8 | 84.7 | 84.6 |
- | InternVL-Chat-V1-5 | 88.8 | 91.4 | 93.7 | 87.1 | 87.0 | 92.3 | 80.9 | 88.5 | 89.3 |
- | | | | | | | | | | |
- | InternVL2-1B | 79.9 | 83.6 | 88.7 | 79.8 | 76.0 | 83.6 | 67.7 | 80.2 | 79.9 |
- | InternVL2-2B | 77.7 | 82.3 | 88.2 | 75.9 | 73.5 | 82.8 | 63.3 | 77.6 | 78.3 |
- | InternVL2-4B | 84.4 | 88.5 | 91.2 | 83.9 | 81.2 | 87.2 | 73.8 | 84.6 | 84.6 |
- | InternVL2-8B | 82.9 | 87.1 | 91.1 | 80.7 | 79.8 | 87.9 | 71.4 | 82.7 | 82.7 |
- | InternVL2-26B | 88.5 | 91.2 | 93.3 | 87.4 | 86.8 | 91.0 | 81.2 | 88.5 | 88.6 |
- | InternVL2-40B | 90.3 | 93.0 | 94.7 | 89.2 | 88.5 | 92.8 | 83.6 | 90.3 | 90.6 |
- | InternVL2-<br>Llama3-76B | 90.0 | 92.2 | 94.8 | 88.4 | 88.8 | 93.1 | 82.8 | 89.5 | 90.3 |
-
- - 我们使用以下 Prompt 来评测 InternVL 的 Grounding 能力: `Please provide the bounding box coordinates of the region this sentence describes: <ref>{}</ref>`
-
- 限制：尽管在训练过程中我们非常注重模型的安全性，尽力促使模型输出符合伦理和法律要求的文本，但受限于模型大小以及概率生成范式，模型可能会产生各种不符合预期的输出，例如回复内容包含偏见、歧视等有害内容，请勿传播这些内容。由于传播不良信息导致的任何后果，本项目不承担责任。
-
- ### 邀请评测 InternVL
-
- 我们欢迎各位 MLLM benchmark 的开发者对我们的 InternVL1.5 以及 InternVL2 系列模型进行评测。如果需要在此处添加评测结果，请与我联系（[[email protected]](mailto:[email protected])）。
-
- ## 快速启动
-
- 我们提供了一个示例代码，用于使用 `transformers` 运行 InternVL2-40B。
-
- 我们也欢迎你在我们的[在线demo](https://internvl.opengvlab.com/)中体验InternVL2的系列模型。
-
- > 请使用 transformers==4.37.2 以确保模型正常运行。
-
- 示例代码请[点击这里](#quick-start)。
-
- ## 微调
-
- 许多仓库现在都支持 InternVL 系列模型的微调，包括 [InternVL](https://github.com/OpenGVLab/InternVL)、[SWIFT](https://github.com/modelscope/ms-swift)、[XTurner](https://github.com/InternLM/xtuner) 等。请参阅它们的文档以获取更多微调细节。
-
- ## 部署
-
- ### LMDeploy
-
- LMDeploy 是由 MMRazor 和 MMDeploy 团队开发的用于压缩、部署和服务大语言模型（LLM）的工具包。
-
- ```sh
- pip install lmdeploy==0.5.3
- ```
-
- LMDeploy 将多模态视觉-语言模型（VLM）的复杂推理过程抽象为一个易于使用的管道，类似于大语言模型（LLM）的推理管道。
-
- #### 一个“你好，世界”示例
-
- ```python
- from lmdeploy import pipeline, TurbomindEngineConfig
- from lmdeploy.vl import load_image
-
- model = 'OpenGVLab/InternVL2-40B'
- image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
- pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
- response = pipe(('describe this image', image))
- print(response.text)
- ```
-
- 如果在执行此示例时出现 `ImportError`，请按照提示安装所需的依赖包。
-
- #### 多图像推理
-
- 在处理多张图像时，可以将它们全部放入一个列表中。请注意，多张图像会导致输入 token 数量增加，因此通常需要增加上下文窗口的大小。
-
- ```python
- from lmdeploy import pipeline, TurbomindEngineConfig
- from lmdeploy.vl import load_image
- from lmdeploy.vl.constants import IMAGE_TOKEN
-
- model = 'OpenGVLab/InternVL2-40B'
- pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
-
- image_urls=[
-     'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
-     'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'
- ]
-
- images = [load_image(img_url) for img_url in image_urls]
- # Numbering images improves multi-image conversations
- response = pipe((f'Image-1: {IMAGE_TOKEN}\nImage-2: {IMAGE_TOKEN}\ndescribe these two images', images))
- print(response.text)
- ```
-
- #### 批量Prompt推理
-
- 使用批量Prompt进行推理非常简单；只需将它们放在一个列表结构中：
-
- ```python
- from lmdeploy import pipeline, TurbomindEngineConfig
- from lmdeploy.vl import load_image
-
- model = 'OpenGVLab/InternVL2-40B'
- pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
-
- image_urls=[
-     "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
-     "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg"
- ]
- prompts = [('describe this image', load_image(img_url)) for img_url in image_urls]
- response = pipe(prompts)
- print(response)
- ```
-
- #### 多轮对话
-
- 使用管道进行多轮对话有两种方法。一种是根据 OpenAI 的格式构建消息并使用上述方法，另一种是使用 `pipeline.chat` 接口。
-
- ```python
- from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
- from lmdeploy.vl import load_image
-
- model = 'OpenGVLab/InternVL2-40B'
- pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
-
- image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
- gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
- sess = pipe.chat(('describe this image', image), gen_config=gen_config)
- print(sess.response.text)
- sess = pipe.chat('What is the woman doing?', session=sess, gen_config=gen_config)
- print(sess.response.text)
- ```
-
- #### API部署
-
- LMDeploy 的 `api_server` 使模型能够通过一个命令轻松打包成服务。提供的 RESTful API 与 OpenAI 的接口兼容。以下是服务启动的示例：
-
- ```shell
- lmdeploy serve api_server OpenGVLab/InternVL2-40B --backend turbomind --server-port 23333
- ```
-
- 为了使用OpenAI风格的API接口，您需要安装OpenAI:
-
- ```shell
- pip install openai
- ```
-
- 然后，使用下面的代码进行API调用:
-
- ```python
- from openai import OpenAI
-
- client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
- model_name = client.models.list().data[0].id
- response = client.chat.completions.create(
-     model=model_name,
-     messages=[{
-         'role':
-         'user',
-         'content': [{
-             'type': 'text',
-             'text': 'describe this image',
-         }, {
-             'type': 'image_url',
-             'image_url': {
-                 'url':
-                 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg',
-             },
-         }],
-     }],
-     temperature=0.8,
-     top_p=0.8)
- print(response)
- ```
-
- ## 开源许可证
-
- 该项目采用 MIT 许可证发布，而 InternLM2 则采用 Apache-2.0 许可证。
-
- ## 引用
-
- 如果您发现此项目对您的研究有用，可以考虑引用我们的论文：
-
- ```BibTeX
 @article{chen2023internvl,
   title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
   author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
 
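The README hunks above relax the version pins (`transformers==4.37.2` to `transformers>=4.37.2`, `lmdeploy==0.5.3` to `lmdeploy>=0.5.3`) to match the compatibility fixes in the code files below. A minimal load sketch consistent with the updated note, following the card's own quick-start conventions; it assumes a GPU with enough memory for the 40B weights (the card also documents 8-bit and multi-GPU loading):

```python
# Minimal sketch following the model card's quick-start.
import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/InternVL2-40B'
# bfloat16 weights + the repo's remote code, as in the card
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
```
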
config.json CHANGED
@@ -17,6 +17,7 @@
   "architectures": [
     "LlamaForCausalLM"
   ],
+  "_attn_implementation": "flash_attention_2",
   "attention_bias": false,
   "attention_dropout": 0.0,
   "bad_words_ids": null,
configuration_internvl_chat.py CHANGED
@@ -38,11 +38,11 @@ class InternVLChatConfig(PretrainedConfig):
         super().__init__(**kwargs)
 
         if vision_config is None:
-             vision_config = {}
+             vision_config = {'architectures': ['InternVisionModel']}
             logger.info('vision_config is None. Initializing the InternVisionConfig with default values.')
 
         if llm_config is None:
-             llm_config = {}
+             llm_config = {'architectures': ['LlamaForCausalLM']}
             logger.info('llm_config is None. Initializing the LlamaConfig config with default values (`LlamaConfig`).')
 
         self.vision_config = InternVisionConfig(**vision_config)
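Seeding `architectures` here is the heart of the 4.46+ fix: the repo's modeling code picks the language-model class by inspecting `config.llm_config.architectures[0]`, and under newer transformers releases the sub-configs can end up default-constructed during loading, so the old empty `{}` defaults left that entry unset. A hypothetical sanity check, assuming this repo's files are on the import path:

```python
from configuration_internvl_chat import InternVLChatConfig

cfg = InternVLChatConfig()  # no vision_config / llm_config supplied
# After this commit the defaults carry the architectures the modeling
# code dispatches on:
print(cfg.vision_config.architectures)  # expected: ['InternVisionModel']
print(cfg.llm_config.architectures)     # expected: ['LlamaForCausalLM']
```
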
modeling_intern_vit.py CHANGED
@@ -3,6 +3,7 @@
 # Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
+
 from typing import Optional, Tuple, Union
 
 import torch