Why doesn't the S1/S2 text encoder use attn_mask or key_padding_mask to deal with padding tokens?

#5
by Kinfai - opened

Why doesn't the S1/S2 text encoder use attn_mask or key_padding_mask to deal with padding tokens? Without a mask, it looks like every query attends to the padding positions as well as the valid tokens, so padding receives non-zero attention weight and contributes to the encoder output.
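To make the concern concrete, here is a minimal dependency-free sketch of what a key padding mask does to one row of attention weights. It is not taken from the S1/S2 encoder code; the logits, the sequence length, and the helper names `softmax` and `attention_weights` are made up for illustration. The masking step is meant to mirror the effect of passing `key_padding_mask` to PyTorch's `nn.MultiheadAttention`, where padded keys are excluded before the softmax.

```python
import math


def softmax(scores):
    # Numerically stable softmax; a score of -inf gets exactly zero weight.
    # Assumes at least one score is finite (i.e. not every key is padding).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, key_is_padding=None):
    # scores: raw attention logits of one query against every key position.
    # key_is_padding: optional list of bools, True where the key is padding.
    if key_is_padding is not None:
        scores = [-math.inf if pad else s
                  for s, pad in zip(scores, key_is_padding)]
    return softmax(scores)


# Hypothetical logits for a 5-position sequence; the last two are padding.
scores = [2.0, 1.0, 0.5, 0.3, 0.3]
key_is_padding = [False, False, False, True, True]

unmasked = attention_weights(scores)
masked = attention_weights(scores, key_is_padding)

print("weight on padding, unmasked:", sum(unmasked[3:]))
print("weight on padding, masked:  ", sum(masked[3:]))
```

In the unmasked case the two padding positions take a share of the attention weight; with the mask their weight is exactly zero and the valid positions are renormalised among themselves. Whether the unmasked behaviour matters in practice for this model would depend on how it was trained, which this sketch does not address.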
