ValueError in Attention Mask Size When Running Without Flash-Attn
Thank you for your excellent work! I've been using this model with great interest, but I ran into an issue when running it on a V100 GPU, which doesn't support flash-attn. To adapt to the V100, I changed `attn_implementation` from `flash_attention_2` to `eager`. However, when attempting a plain-text conversation, I hit the following error in `InternLM2FlashAttention2.forward`:

`ValueError: Attention mask should be of size (1, 1, 1, 52), but is torch.Size([1, 1, 1, 1])`
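For context, the standard transformers mechanism for this switch is the `attn_implementation` argument to `from_pretrained`. Below is a minimal sketch of loading without flash-attn; the model id and dtype are illustrative assumptions, not taken from my actual script:

```python
# Minimal sketch: selecting the eager attention backend at load time.
# The model id and dtype below are illustrative, not from my actual script.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/HoVLE"  # hypothetical model id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float16,    # V100 lacks bf16 hardware support
    attn_implementation="eager",  # no flash-attn on V100
    trust_remote_code=True,
).cuda().eval()
```

Since passing the argument at load time didn't obviously apply to this custom code, I also edited the configuration files directly, as described below.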
Here are the details:

### My experimental setup
- `transformers` version: 4.37.2
- Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.23
- Python version: 3.9.21
- Huggingface_hub version: 0.29.3
- Safetensors version: 0.5.3
- Accelerate version: 1.5.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.5.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
### Modifications Made

Since the V100 GPU doesn't support flash-attn, I made the following changes to the configuration files (a sketch of the equivalent in-code overrides follows this list):

- In `config.json`:
  - Set `use_flash_attn` to `false`;
  - Changed `attn_implementation` to `eager` in both the `embedding_config` and the `llm_config` sections.
- In `configuration_holistic_embedding.py`:
  - Changed `attn_implementation` to `eager`.
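For reference, here is a sketch of the same overrides applied in code instead of editing the files. The attribute names mirror the keys above; the sub-config layout and whether the custom modeling code actually reads these attributes are assumptions on my part:

```python
# Sketch of the config.json edits expressed as in-code overrides.
# Sub-config layout (embedding_config / llm_config) and the top-level
# placement of use_flash_attn are assumed from the keys described above.
from transformers import AutoConfig, AutoModel

path = "OpenGVLab/HoVLE"  # hypothetical model id
config = AutoConfig.from_pretrained(path, trust_remote_code=True)
config.use_flash_attn = False
config.embedding_config.attn_implementation = "eager"
config.llm_config.attn_implementation = "eager"
model = AutoModel.from_pretrained(path, config=config, trust_remote_code=True)
```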
### Error Traceback

Here's the detailed error traceback:
```
Traceback (most recent call last):
  File "/data4/cyz/mycode/hovle-run/run.py", line 104, in <module>
    response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
  File "/home/cyz/.cache/huggingface/modules/transformers_modules/modeling_internvl_chat.py", line 379, in chat
    generation_output = super().generate(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/transformers/generation/utils.py", line 1525, in generate
    return self.sample(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/transformers/generation/utils.py", line 2622, in sample
    outputs = self(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cyz/.cache/huggingface/modules/transformers_modules/modeling_internvl_chat.py", line 202, in forward
    outputs = layer_module(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cyz/.cache/huggingface/modules/transformers_modules/modeling_holistic_embedding.py", line 666, in forward
    hidden_states, self_attn_weights, present_key_value = self.attention(
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/cyz/hovle-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cyz/.cache/huggingface/modules/transformers_modules/modeling_holistic_embedding.py", line 400, in forward
    raise ValueError(
ValueError: Attention mask should be of size (1, 1, 1, 52), but is torch.Size([1, 1, 1, 1])
```
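In case it helps with debugging: the failing check at line 400 of `modeling_holistic_embedding.py` looks like the LLaMA-style shape assertion. During incremental decoding it expects a 4D mask covering all cached keys (`kv_seq_len = 52`) even though only one new token is processed per step (`q_len = 1`), while the mask the attention layer actually receives seems to cover only the current step. Below is a minimal sketch of the expansion I believe the eager path expects; this is my assumption modeled on LLaMA-style mask preparation, not code taken from this repository:

```python
# Illustrative sketch of the mask expansion the eager attention path appears
# to expect (modeled on LLaMA-style mask preparation; not taken from
# modeling_holistic_embedding.py).
import torch

def expand_mask_sketch(attention_mask_2d: torch.Tensor, q_len: int,
                       dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Expand a (bsz, kv_seq_len) padding mask to (bsz, 1, q_len, kv_seq_len),
    the shape the failing check asserts: 0.0 for kept positions, -inf for masked."""
    bsz, kv_seq_len = attention_mask_2d.shape
    mask = attention_mask_2d[:, None, None, :].to(dtype)  # (bsz, 1, 1, kv_seq_len)
    mask = mask.expand(bsz, 1, q_len, kv_seq_len)         # (bsz, 1, q_len, kv_seq_len)
    return (1.0 - mask) * torch.finfo(dtype).min

# Decoding one new token with 51 cached positions: the 2D mask must already
# cover all 52 keys, so the 4D mask comes out as (1, 1, 1, 52) and passes the
# check; a mask built from the current step alone would yield (1, 1, 1, 1).
mask_2d = torch.ones(1, 52)
print(expand_mask_sketch(mask_2d, q_len=1).shape)  # torch.Size([1, 1, 1, 52])
```

If that reading is right, the 4D mask is being built from the current step's length instead of past + current length when the eager implementation is selected, but I may well be missing something.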
I’m currently trying to debug this issue, but I’m not entirely sure how to proceed. Could you please provide some guidance on how to resolve this error? Any suggestions or insights would be greatly appreciated!
Thank you in advance for your time and support!