Target module question
#1 by nicolollo - opened
Why not target "q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"?
Hi @nicolollo , I had to compromise on the layers I targeted because my system isn’t that powerful and would have taken a long time to train.
Oh, I see, thanks. May I ask what batch size you used? And why the base FT instead of the base?
I used a batch size of 8 for this. I tried a batch size of 16 as well, but it would run out of GPU memory.
As for the model selection, I chose the base FT version since it already produces great captions, so my focus was on making small quality improvements.
NikshepShetty changed discussion status to closed