Layers updated, which ones to target for LoRA fine-tuning?

#82
by lukasedv - opened

LLaMA Factory and existing fine-tuning tutorials target the "Wqkv" module for LoRA fine-tuning. That module was apparently removed a couple of days ago - which layers should be targeted now? (q_proj, k_proj, v_proj, fc1, and fc2 seem to be listed in the safetensors files.)

I am trying to train using target_modules=['q_proj', 'k_proj', 'v_proj', 'dense'].
Maybe you can give it a try!
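For reference, here is a minimal sketch of how those modules could be wired into a PEFT LoraConfig. The checkpoint id and the LoRA hyperparameters are placeholders, not an official recipe:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint - substitute the Phi revision you are fine-tuning.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

lora_config = LoraConfig(
    r=16,                     # assumed rank and scaling, adjust to taste
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['q_proj', 'k_proj', 'v_proj', 'dense'],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # reports trainable vs. total parameters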

I have been trying to fine-tune with all of the modules listed in the model. The "trainable" parameter fraction after applying LoRA shows as 1.64%, and I am not seeing any training loss at all. Any luck figuring this out?

target_modules = [
    'q_proj',
    'k_proj',
    'v_proj',
    'dense',
    'fc1',
    'fc2',
    'embed_tokens',
    'lm_head',
]

trainable params: 25313280 || all params: 1546705920 || trainable%: 1.64
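As a sanity check, that percentage is just the ratio of the two reported counts, so the numbers at least agree with each other:

trainable, total = 25_313_280, 1_546_705_920
print(f"{100 * trainable / total:.2f}%")   # -> 1.64%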

I am not seeing any loss with llama_factory either - not sure why, but something seems to have happened after the update.

I see loss when using bf16.
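In case it helps anyone reproducing this, the precision switch is the bf16/fp16 flag in transformers' TrainingArguments; the other values here are placeholders:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./phi-lora-out",        # placeholder path
    per_device_train_batch_size=4,      # placeholder hyperparameters
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,   # loss shows up with bf16; fp16=True needed the modeling fix below
)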

Microsoft org

Could you please test with the updated modeling_phi.py? We used the same fix (disabling autocast on the Attention layer) as we had in some earlier revisions.

It should now show a loss when fine-tuning with fp16.
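For anyone curious, the shape of that fix is roughly the following - a sketch only, not the actual modeling_phi.py code - where the attention math runs with autocast disabled so it stays in fp32 even under fp16 mixed precision:

import torch

def attention_forward(query, key, value):
    # Disable autocast for this block so the matmuls and softmax run in fp32,
    # avoiding fp16 numerical issues inside the attention computation.
    with torch.autocast(device_type="cuda", enabled=False):
        q, k, v = query.float(), key.float(), value.float()
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        weights = torch.softmax(scores, dim=-1)
        return weights @ v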

@gugarosa Just completed training. Looking good. Thank you for the quick fix on this!

Microsoft org

Please let me know if you see any more issues!

gugarosa changed discussion status to closed
