Why doesn't the S1/S2 text encoder use attn_mask or key_padding_mask to deal with padding tokens? Without a mask, attention seems to be spread over the padding tokens as well, rather than only over the valid tokens.
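
To make the concern concrete, here is a minimal sketch in plain Python of what a key padding mask does to one query's attention weights. It is not the S1/S2 encoder's code: the scores, the `attention_weights` helper, and the `key_is_padding` argument are hypothetical names chosen for illustration. The only assumption is the usual convention that masked keys have their scores set to negative infinity before the softmax.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, key_is_padding=None):
    # Hypothetical helper: softmax over one query's raw scores.
    # If a mask is given, padded keys are set to -inf so they get zero weight.
    if key_is_padding is not None:
        scores = [float("-inf") if pad else s
                  for s, pad in zip(scores, key_is_padding)]
    return softmax(scores)

# Made-up raw scores for 4 keys; the last two keys are padding.
scores = [2.0, 1.0, 0.5, 0.5]
padding = [False, False, True, True]

unmasked = attention_weights(scores)          # padding keys receive some weight
masked = attention_weights(scores, padding)   # padding keys receive exactly 0

print("unmasked:", [round(w, 3) for w in unmasked])
print("masked:  ", [round(w, 3) for w in masked])
```

Without the mask, part of the probability mass lands on the two padding keys. With it, all of the weight is redistributed over the valid keys. Whether this changes the encoder's outputs in practice depends on how it was trained, which the sketch does not address.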