Padding of labels bug?
#44
opened by haukurpj
I'm currently reimplementing the audio training sample code using PyTorch Lightning, and while debugging an issue I noticed this line in the collator:
labels = pad_sequence(labels_list, padding_side='left', padding_value=0)
When batching, should the labels not be padded with _IGNORE_INDEX?
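For reference, a minimal sketch of what I had in mind, assuming _IGNORE_INDEX is -100 (the default ignore_index of CrossEntropyLoss) and a PyTorch version recent enough for pad_sequence to accept padding_side:

import torch
from torch.nn.utils.rnn import pad_sequence

# Assumed value; matches the default ignore_index of torch.nn.CrossEntropyLoss.
_IGNORE_INDEX = -100

labels_list = [torch.tensor([3, 5, 9]), torch.tensor([7])]

# Pad label positions with the ignore index so the loss skips them,
# instead of padding with 0, which is a valid token id.
labels = pad_sequence(labels_list, padding_side='left', padding_value=_IGNORE_INDEX)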
I think the attention mask will handle it.
I think it does matter when calculating the loss, but the HF Trainer is probably handling this case, i.e. converting the padding value to -100 before the loss calculation.
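A toy sketch of why the padding value matters, assuming the standard -100 ignore index and that the attention mask marks the padded positions:

import torch
import torch.nn.functional as F

# Toy example: labels left-padded with 0, as in the collator above.
logits = torch.randn(2, 4, 10)                      # (batch, seq_len, vocab)
labels = torch.tensor([[0, 0, 3, 5],
                       [0, 0, 0, 7]])
attention_mask = torch.tensor([[0, 0, 1, 1],
                               [0, 0, 0, 1]])

# Padded with 0: the padded positions are scored as if token id 0 were the target.
loss_pad_zero = F.cross_entropy(logits.reshape(-1, 10), labels.reshape(-1))

# Converted to -100 before the loss (what the Trainer or collator would need to do):
# those positions are ignored, since ignore_index defaults to -100.
labels_masked = labels.masked_fill(attention_mask == 0, -100)
loss_ignored = F.cross_entropy(logits.reshape(-1, 10), labels_masked.reshape(-1))

print(loss_pad_zero, loss_ignored)                  # the two losses differ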