Error encountered when fine-tuning
When using AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision")
to process batches for fine-tuning, I get the following error:
RuntimeError: The expanded size of the tensor (4128) must match the existing size (3096) at non-singleton dimension 3. Target sizes: [4, 16, 4128, 4128]. Tensor sizes: [4, 1, 3096, 3096]
The traceback shows that the error comes from the forward function in modeling_mllama.py:
attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
It looks like something is wrong with the attention mask, but the mask is obtained directly from the loaded processor.
Any idea what causes the error and how I can fix it?
Happy to provide more details, thanks!
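For reference, here is roughly what my preprocessing looks like (a minimal sketch; the image paths and prompts below are placeholders, not my exact code):

from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision")

# Placeholder batch: each prompt starts with the <|image|> token.
images = [Image.open(p).convert("RGB") for p in ["img0.jpg", "img1.jpg"]]
texts = ["<|image|>Describe this image.", "<|image|>What is shown here?"]

# Pad to the longest sequence; the resulting attention_mask is passed
# straight to the model during fine-tuning, which is where the error occurs.
batch = processor(images=images, text=texts, padding=True, return_tensors="pt")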
Hi @yongleyuan, excited to see what you are building with the models!
Could you kindly use the MllamaProcessor class instead? Please see the model card for reference, or here.
We also provide a fine-tuning implementation here for reference.
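Concretely, something along these lines should work (a minimal sketch; the model id is the one from your post):

from transformers import MllamaProcessor

# Load the processor through the Mllama-specific class rather than AutoProcessor.
processor = MllamaProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision")

# The call signature is otherwise unchanged, e.g.:
# batch = processor(images=images, text=texts, padding=True, return_tensors="pt")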
Hi! Has this problem been solved? I run into the same error when I try to run inference with a fine-tuned llama3.2-vision-11b model.
Using MllamaProcessor instead of AutoProcessor did resolve this issue (thanks to @Sanyam's suggestion), but I also identified some other bugs in my code, so I am not sure whether AutoProcessor was the direct cause.