ValueError in Attention Mask Size When Running Without Flash-Attn
Thank you for your excellent work! I've been using this model with great interest, but I ran into an issue when running it on a V100 GPU, which doesn't support flash-attn. To adapt to the V100, I changed `attn_implementation` from `flash_attention_2` to `eager`. However, when attempting a plain-text conversation, I hit the following error in `InternLM2FlashAttention2.forward`:

`ValueError: Attention mask should be of size (1, 1, 1, 52), but is torch.Size([1, 1, 1, 1])`
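For context, the standard transformers mechanism for this switch is the `attn_implementation` argument to `from_pretrained`. Below is a minimal sketch of loading without flash-attn; the model id and dtype are illustrative assumptions, not taken from my actual script:

```python
# Minimal sketch: selecting the eager attention backend at load time.
# The model id and dtype below are illustrative, not from my actual script.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/HoVLE"  # hypothetical model id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float16,    # V100 lacks bf16 hardware support
    attn_implementation="eager",  # no flash-attn on V100
    trust_remote_code=True,
).cuda().eval()
```

Since passing the argument at load time didn't obviously apply to this custom code, I also edited the configuration files directly, as described below.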
Here are the details:

### My experimental setup
- `transformers` version: 4.37.2
- Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.23
- Python version: 3.9.21
- Huggingface_hub version: 0.29.3
- Safetensors version: 0.5.3
- Accelerate version: 1.5.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.5.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
### Modifications Made

Since the V100 GPU doesn't support flash-attn, I made the following changes to the configuration files (a sketch of the equivalent in-code overrides follows this list):

- In `config.json`:
  - Set `use_flash_attn` to `false`;
  - Changed `attn_implementation` to `eager` in both the `embedding_config` and the `llm_config` sections.
- In `configuration_holistic_embedding.py`:
  - Changed `attn_implementation` to `eager`.
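For reference, here is a sketch of the same overrides applied in code instead of editing the files. The attribute names mirror the keys above; the sub-config layout and whether the custom modeling code actually reads these attributes are assumptions on my part:

```python
# Sketch of the config.json edits expressed as in-code overrides.
# Sub-config layout (embedding_config / llm_config) and the top-level
# placement of use_flash_attn are assumed from the keys described above.
from transformers import AutoConfig, AutoModel

path = "OpenGVLab/HoVLE"  # hypothetical model id
config = AutoConfig.from_pretrained(path, trust_remote_code=True)
config.use_flash_attn = False
config.embedding_config.attn_implementation = "eager"
config.llm_config.attn_implementation = "eager"
model = AutoModel.from_pretrained(path, config=config, trust_remote_code=True)
```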
### Error Traceback

Here's the detailed error traceback:
```
Traceback (most recent call last):
  File "/data4/cyz/mycode/hovle-run/run.py", line 104, in <module>
    response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
  File "/home/cyz/.cache/huggingface/modules/transformers_modules/modeling_internvl_chat.py", line 379, in chat
    generation_output = super().generate(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/transformers/generation/utils.py", line 1525, in generate
    return self.sample(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/transformers/generation/utils.py", line 2622, in sample
    outputs = self(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cyz/.cache/huggingface/modules/transformers_modules/modeling_internvl_chat.py", line 202, in forward
    outputs = layer_module(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cyz/.cache/huggingface/modules/transformers_modules/modeling_holistic_embedding.py", line 666, in forward
    hidden_states, self_attn_weights, present_key_value = self.attention(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cyz/.cache/huggingface/modules/transformers_modules/modeling_holistic_embedding.py", line 400, in forward
    raise ValueError(
ValueError: Attention mask should be of size (1, 1, 1, 52), but is torch.Size([1, 1, 1, 1])
```
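In case it helps with debugging: the failing check at line 400 of `modeling_holistic_embedding.py` looks like the LLaMA-style shape assertion. During incremental decoding it expects a 4D mask covering all cached keys (`kv_seq_len = 52`) even though only one new token is processed per step (`q_len = 1`), while the mask the attention layer actually receives seems to cover only the current step. Below is a minimal sketch of the expansion I believe the eager path expects; this is my assumption modeled on LLaMA-style mask preparation, not code taken from this repository:

```python
# Illustrative sketch of the mask expansion the eager attention path appears
# to expect (modeled on LLaMA-style mask preparation; not taken from
# modeling_holistic_embedding.py).
import torch

def expand_mask_sketch(attention_mask_2d: torch.Tensor, q_len: int,
                       dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Expand a (bsz, kv_seq_len) padding mask to (bsz, 1, q_len, kv_seq_len),
    the shape the failing check asserts: 0.0 for kept positions, -inf for masked."""
    bsz, kv_seq_len = attention_mask_2d.shape
    mask = attention_mask_2d[:, None, None, :].to(dtype)  # (bsz, 1, 1, kv_seq_len)
    mask = mask.expand(bsz, 1, q_len, kv_seq_len)         # (bsz, 1, q_len, kv_seq_len)
    return (1.0 - mask) * torch.finfo(dtype).min

# Decoding one new token with 51 cached positions: the 2D mask must already
# cover all 52 keys, so the 4D mask comes out as (1, 1, 1, 52) and passes the
# check; a mask built from the current step alone would yield (1, 1, 1, 1).
mask_2d = torch.ones(1, 52)
print(expand_mask_sketch(mask_2d, q_len=1).shape)  # torch.Size([1, 1, 1, 52])
```

If that reading is right, the 4D mask is being built from the current step's length instead of past + current length when the eager implementation is selected, but I may well be missing something.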
I’m currently trying to debug this issue, but I’m not entirely sure how to proceed. Could you please provide some guidance on how to resolve this error? Any suggestions or insights would be greatly appreciated!
Thank you in advance for your time and support!