Bug in attention map computation
#3 opened by gionii
In the following line: https://huggingface.co/Synthyra/ESMplusplus_small/blob/main/modeling_esm_plusplus.py#L324, you are updating `attention_mask` rather than `attn_bias`, which is what is actually used to mask the attention values.
I am assuming you followed the template from the PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
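
For context, here is a minimal sketch of that docs template (the names `attn_bias` and `attn_mask` follow the PyTorch example, not the exact identifiers in `modeling_esm_plusplus.py`), with a comment marking where the buggy variant diverges:

```python
import math
import torch

def attention_with_mask(query, key, value, attn_mask=None):
    # Pattern from the torch.nn.functional.scaled_dot_product_attention docs:
    # a zero bias tensor is built, and the mask is folded into it.
    L, S = query.size(-2), key.size(-2)
    scale = 1.0 / math.sqrt(query.size(-1))
    attn_bias = torch.zeros(L, S, dtype=query.dtype)

    if attn_mask is not None:
        # Correct: fold the (boolean, True = keep) mask into attn_bias,
        # which is what gets added to the attention scores below.
        attn_bias = attn_bias.masked_fill(attn_mask.logical_not(), float("-inf"))
        # Buggy variant (as reported): updating attn_mask itself instead,
        # e.g. `attn_mask = attn_mask.masked_fill(...)`, leaves attn_bias
        # all zeros, so the scores are never actually masked.

    attn_weight = (query @ key.transpose(-2, -1)) * scale
    attn_weight = attn_weight + attn_bias  # only attn_bias is applied here
    attn_weight = torch.softmax(attn_weight, dim=-1)
    return attn_weight @ value
```

Since only `attn_bias` is ever added to the scores, writing the masked values into `attention_mask` has no effect on the output.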