Upload folder using huggingface_hub
- README.md +253 -13
- modeling_intern_vit.py +6 -13

README.md CHANGED

@@ -62,6 +62,8 @@ InternVL 2.0 is a multimodal large language model series, featuring models of va
@@ -300,7 +302,7 @@ tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast
-generation_config = dict(max_new_tokens=1024, do_sample=
@@ -452,21 +454,140 @@ for new_text in streamer:
@@ -540,6 +661,8 @@ InternVL 2.0 是一个多模态大语言模型系列,包含各种规模的模
@@ -598,21 +721,138 @@ InternVL 2.0 是一个多模态大语言模型系列,包含各种规模的模

| MathVista<sub>testmini</sub> | 28.7 | 41.1 | 46.3 | 37.7 |
| OpenCompass<sub>avg</sub> | 46.6 | 49.8 | 54.0 | 48.3 |

- For more details and evaluation reproduction, please refer to our [Evaluation Guide](https://internvl.readthedocs.io/en/latest/internvl2.0/evaluation.html).

- We use both the InternVL and VLMEvalKit repositories for model evaluation. Specifically, the results reported for DocVQA, ChartQA, InfoVQA, TextVQA, MME, AI2D, MMBench, CCBench, MMVet, and SEED-Image were tested using the InternVL repository, while OCRBench, RealWorldQA, HallBench, and MathVista were evaluated using VLMEvalKit.

- For MMMU, we report both the original scores (left side: evaluated using the InternVL codebase for InternVL series models, and sourced from technical reports or webpages for other models) and the VLMEvalKit scores (right side: collected from the OpenCompass leaderboard).

# set the max number of tiles in `max_num`
pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=True)

# pure-text conversation (纯文本对话)
question = 'Hello, who are you?'

## Finetune

Many repositories now support fine-tuning of the InternVL series models, including [InternVL](https://github.com/OpenGVLab/InternVL), [SWIFT](https://github.com/modelscope/ms-swift), [XTuner](https://github.com/InternLM/xtuner), and others. Please refer to their documentation for more details on fine-tuning.

## Deployment

### LMDeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.

```sh
pip install lmdeploy==0.5.3
```

LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLMs) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline.

#### A 'Hello, world' example

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-1B'
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
response = pipe(('describe this image', image))
print(response.text)
```

If `ImportError` occurs while executing this case, please install the required dependency packages as prompted.
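
If the issue is the TurboMind backend itself rather than a missing dependency, LMDeploy also ships a PyTorch engine. The snippet below is a minimal, hedged sketch: it assumes your installed lmdeploy exposes `PytorchEngineConfig` and that its PyTorch engine supports this model; TurboMind remains the path shown in the rest of this section.

```python
# Hedged alternative backend (assumption: the PyTorch engine in your lmdeploy
# build supports InternVL2); the TurboMind examples above are the documented path.
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline('OpenGVLab/InternVL2-1B', backend_config=PytorchEngineConfig(session_len=8192))
print(pipe(('describe this image', image)).text)
```
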
#### Multi-image inference

When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased.

> Warning: Due to the scarcity of multi-image conversation data, the performance on multi-image tasks may be unstable, and it may require multiple attempts to achieve satisfactory results.

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image
from lmdeploy.vl.constants import IMAGE_TOKEN

model = 'OpenGVLab/InternVL2-1B'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

image_urls=[
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'
]

images = [load_image(img_url) for img_url in image_urls]
# Numbering images improves multi-image conversations
response = pipe((f'Image-1: {IMAGE_TOKEN}\nImage-2: {IMAGE_TOKEN}\ndescribe these two images', images))
print(response.text)
```

#### Batch prompts inference

Conducting inference with batch prompts is quite straightforward; just place them within a list structure:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-1B'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

image_urls=[
    "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
    "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg"
]
prompts = [('describe this image', load_image(img_url)) for img_url in image_urls]
response = pipe(prompts)
print(response)
```

#### Multi-turn conversation

There are two ways to run multi-turn conversations with the pipeline. One is to construct messages in the OpenAI format and use the method introduced above; the other is to use the `pipeline.chat` interface. The example below uses `pipeline.chat`.

```python
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-1B'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
sess = pipe.chat(('describe this image', image), gen_config=gen_config)
print(sess.response.text)
sess = pipe.chat('What is the woman doing?', session=sess, gen_config=gen_config)
print(sess.response.text)
```
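
For the first, message-based route, a minimal sketch is shown below. It assumes the VLM pipeline accepts GPT-4V-style message dicts (the same schema the OpenAI-compatible server in the next section uses), so check it against the lmdeploy version you have installed.

```python
# Hedged sketch: multi-turn chat by accumulating OpenAI/GPT-4V-style messages.
# Assumption: the VLM pipeline accepts this message format directly.
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig

pipe = pipeline('OpenGVLab/InternVL2-1B', backend_config=TurbomindEngineConfig(session_len=8192))
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)

url = 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg'
messages = [dict(role='user', content=[
    dict(type='text', text='describe this image'),
    dict(type='image_url', image_url=dict(url=url))
])]
out = pipe(messages, gen_config=gen_config)
print(out.text)

# Append the assistant reply and the next question to continue the conversation.
messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content=[dict(type='text', text='What is the woman doing?')]))
out = pipe(messages, gen_config=gen_config)
print(out.text)
```

Compared with `pipeline.chat`, this route makes you manage the history yourself, which is convenient when you already store conversations in the OpenAI format.
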

#### Service

LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:

```shell
lmdeploy serve api_server OpenGVLab/InternVL2-1B --backend turbomind --server-port 23333
```

To use the OpenAI-style interface, you need to install the OpenAI Python package:

```shell
pip install openai
```

Then, use the code below to make the API call:

```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': 'describe this image',
        }, {
            'type': 'image_url',
            'image_url': {
                'url': 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg',
            },
        }],
    }],
    temperature=0.8,
    top_p=0.8)
print(response)
```
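
Because the endpoint speaks the OpenAI chat schema, streaming should also work through the same client. The variant below is a hedged sketch that uses standard `openai` client options only; whether the served model streams token by token depends on the server build.

```python
# Hedged sketch: stream the reply from the same OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id
stream = client.chat.completions.create(
    model=model_name,
    messages=[{'role': 'user', 'content': 'Describe a tiger in one sentence.'}],
    stream=True,
    temperature=0.8)
for chunk in stream:
    # Each chunk carries an incremental delta; the final chunk's content may be None.
    print(chunk.choices[0].delta.content or '', end='')
print()
```
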
## License

| MathVista<sub>testmini</sub> | 28.7 | 41.1 | 46.3 | 37.7 |
| OpenCompass<sub>avg</sub> | 46.6 | 49.8 | 54.0 | 48.3 |

- 关于更多的细节以及评测复现,请看我们的[评测指南](https://internvl.readthedocs.io/en/latest/internvl2.0/evaluation.html)。

- 我们同时使用 InternVL 和 VLMEvalKit 仓库进行模型评估。具体来说,DocVQA、ChartQA、InfoVQA、TextVQA、MME、AI2D、MMBench、CCBench、MMVet 和 SEED-Image 的结果是使用 InternVL 仓库测试的。OCRBench、RealWorldQA、HallBench 和 MathVista 是使用 VLMEvalKit 进行评估的。

- 对于MMMU,我们报告了原始分数(左侧:InternVL系列模型使用InternVL代码库评测,其他模型的分数来自其技术报告或网页)和VLMEvalKit分数(右侧:从OpenCompass排行榜收集)。

## 微调

许多仓库现在都支持 InternVL 系列模型的微调,包括 [InternVL](https://github.com/OpenGVLab/InternVL)、[SWIFT](https://github.com/modelscope/ms-swift)、[XTuner](https://github.com/InternLM/xtuner) 等。请参阅它们的文档以获取更多微调细节。

## 部署

### LMDeploy

LMDeploy 是由 MMRazor 和 MMDeploy 团队开发的用于压缩、部署和服务大语言模型(LLM)的工具包。

```sh
pip install lmdeploy==0.5.3
```

LMDeploy 将多模态视觉-语言模型(VLM)的复杂推理过程抽象为一个易于使用的管道,类似于大语言模型(LLM)的推理管道。

#### 一个“你好,世界”示例

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-1B'
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
response = pipe(('describe this image', image))
print(response.text)
```

如果在执行此示例时出现 `ImportError`,请按照提示安装所需的依赖包。

#### 多图像推理

在处理多张图像时,可以将它们全部放入一个列表中。请注意,多张图像会导致输入 token 数量增加,因此通常需要增加上下文窗口的大小。

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image
from lmdeploy.vl.constants import IMAGE_TOKEN

model = 'OpenGVLab/InternVL2-1B'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

image_urls=[
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'
]

images = [load_image(img_url) for img_url in image_urls]
# Numbering images improves multi-image conversations
response = pipe((f'Image-1: {IMAGE_TOKEN}\nImage-2: {IMAGE_TOKEN}\ndescribe these two images', images))
print(response.text)
```

#### 批量Prompt推理

使用批量Prompt进行推理非常简单;只需将它们放在一个列表结构中:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-1B'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

image_urls=[
    "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
    "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg"
]
prompts = [('describe this image', load_image(img_url)) for img_url in image_urls]
response = pipe(prompts)
print(response)
```

#### 多轮对话

使用管道进行多轮对话有两种方法。一种是根据 OpenAI 的格式构建消息并使用上述方法,另一种是使用 `pipeline.chat` 接口。

```python
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-1B'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
sess = pipe.chat(('describe this image', image), gen_config=gen_config)
print(sess.response.text)
sess = pipe.chat('What is the woman doing?', session=sess, gen_config=gen_config)
print(sess.response.text)
```

#### API部署

LMDeploy 的 `api_server` 使模型能够通过一个命令轻松打包成服务。提供的 RESTful API 与 OpenAI 的接口兼容。以下是服务启动的示例:

```shell
lmdeploy serve api_server OpenGVLab/InternVL2-1B --backend turbomind --server-port 23333
```

为了使用OpenAI风格的API接口,您需要安装OpenAI:

```shell
pip install openai
```

然后,使用下面的代码进行API调用:

```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': 'describe this image',
        }, {
            'type': 'image_url',
            'image_url': {
                'url': 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg',
            },
        }],
    }],
    temperature=0.8,
    top_p=0.8)
print(response)
```

## 开源许可证

modeling_intern_vit.py CHANGED

@@ -15,24 +15,17 @@ from transformers.activations import ACT2FN
 from transformers.modeling_outputs import (BaseModelOutput,
                                            BaseModelOutputWithPooling)
 from transformers.modeling_utils import PreTrainedModel
-from transformers.utils.import_utils import is_flash_attn_greater_or_equal
 from transformers.utils import logging

 from .configuration_intern_vit import InternVisionConfig

 try:
-    if is_flash_attn_greater_or_equal("2.0.0"):
-        from flash_attn.flash_attn_interface import \
-            flash_attn_varlen_qkvpacked_func as flash_attn_unpadded_qkvpacked_func
-    else:
-        from flash_attn.flash_attn_interface import \
-            flash_attn_unpadded_qkvpacked_func
-
     from flash_attn.bert_padding import pad_input, unpad_input
-
+    from flash_attn.flash_attn_interface import \
+        flash_attn_varlen_qkvpacked_func
     has_flash_attn = True
 except:
-    print('
+    print('FlashAttention2 is not installed.')
     has_flash_attn = False

 logger = logging.get_logger(__name__)

@@ -75,7 +68,7 @@ class FlashAttention(nn.Module):
                 max_s = seqlen
                 cu_seqlens = torch.arange(0, (batch_size + 1) * seqlen, step=seqlen, dtype=torch.int32,
                                           device=qkv.device)
-                output = flash_attn_unpadded_qkvpacked_func(
+                output = flash_attn_varlen_qkvpacked_func(
                     qkv, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
                     softmax_scale=self.softmax_scale, causal=causal
                 )

@@ -85,7 +78,7 @@ class FlashAttention(nn.Module):
                 x = rearrange(qkv, 'b s three h d -> b s (three h d)')
                 x_unpad, indices, cu_seqlens, max_s = unpad_input(x, key_padding_mask)
                 x_unpad = rearrange(x_unpad, 'nnz (three h d) -> nnz three h d', three=3, h=nheads)
-                output_unpad = flash_attn_unpadded_qkvpacked_func(
+                output_unpad = flash_attn_varlen_qkvpacked_func(
                     x_unpad, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
                     softmax_scale=self.softmax_scale, causal=causal
                 )

@@ -94,7 +87,7 @@ class FlashAttention(nn.Module):
                                    'b s (h d) -> b s h d', h=nheads)
         else:
             assert max_s is not None
-            output = flash_attn_unpadded_qkvpacked_func(
+            output = flash_attn_varlen_qkvpacked_func(
                 qkv, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
                 softmax_scale=self.softmax_scale, causal=causal
             )