Layers updated, which ones to target for LoRA fine-tuning?

#82
by lukasedv - opened

LLaMA Factory and existing fine-tuning tutorials target the "Wqkv" module for LoRA fine-tuning. That module was apparently removed a couple of days ago - which layers should be targeted now? (q_proj, k_proj, v_proj, fc1, and fc2 seem to be listed in the safetensors files.)

I am trying to train using target_modules=['q_proj', 'k_proj', 'v_proj', 'dense'].
Maybe you can give it a try!
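For reference, here is a minimal sketch of how those modules could be wired into a PEFT LoraConfig. The checkpoint id and the LoRA hyperparameters are placeholders, not an official recipe:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint - substitute the Phi revision you are fine-tuning.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

lora_config = LoraConfig(
    r=16,                     # assumed rank and scaling, adjust to taste
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['q_proj', 'k_proj', 'v_proj', 'dense'],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # reports trainable vs. total parameters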

I have been trying to fine-tune with all of the modules listed in the model. The "trainable" parameter fraction after applying LoRA shows as 1.64%, and I am not seeing any training loss at all. Any luck figuring this out?

target_modules = [
    'q_proj',
    'k_proj',
    'v_proj',
    'dense',
    'fc1',
    'fc2',
    'embed_tokens',
    'lm_head',
]

trainable params: 25313280 || all params: 1546705920 || trainable%: 1.64
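As a sanity check, that percentage is just the ratio of the two reported counts, so the numbers at least agree with each other:

trainable, total = 25_313_280, 1_546_705_920
print(f"{100 * trainable / total:.2f}%")   # -> 1.64%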

I am not seeing any loss with llama_factory either - not sure why, but something seems to have happened after the update.

I see loss when using bf16.
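In case it helps anyone reproducing this, the precision switch is the bf16/fp16 flag in transformers' TrainingArguments; the other values here are placeholders:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./phi-lora-out",        # placeholder path
    per_device_train_batch_size=4,      # placeholder hyperparameters
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,   # loss shows up with bf16; fp16=True needed the modeling fix below
)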

Microsoft org

Could you please test with the updated modeling_phi.py? We used the same fix (disabling autocast on the Attention layer) as we had in some earlier revisions.

It should now show a loss when fine-tuning with fp16.
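For anyone curious, the shape of that fix is roughly the following - a sketch only, not the actual modeling_phi.py code - where the attention math runs with autocast disabled so it stays in fp32 even under fp16 mixed precision:

import torch

def attention_forward(query, key, value):
    # Disable autocast for this block so the matmuls and softmax run in fp32,
    # avoiding fp16 numerical issues inside the attention computation.
    with torch.autocast(device_type="cuda", enabled=False):
        q, k, v = query.float(), key.float(), value.float()
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        weights = torch.softmax(scores, dim=-1)
        return weights @ v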

@gugarosa Just completed training. Looking good. Thank you for the quick fix on this!

Microsoft org

Please let me know if you see any more issues!

gugarosa changed discussion status to closed
