Does your fine-tuning process overfit?
#15
opened by jiaxiangc
Thanks for your contribution.
After fine-tuning LLaMA-13B on OpenOrca or SlimOrca, I want to ask two questions:
- What is your training configuration? For example: number of GPUs, learning rate, fine-tuning strategy, and number of epochs.
- Does your fine-tuning process overfit? When I start the second epoch, the training loss drops significantly. Is this normal? Do you have any suggestions for avoiding this problem?
For the compute config, it is 8x A6000 GPUs rented from runpod.io. To prevent overfitting we use packing, which also speeds up training considerably. The trainer we use is called axolotl, and you can find it here: https://github.com/OpenAccess-AI-Collective/axolotl. For the learning rate and all other config options, each model repo has a YAML file in its configs folder that details all the options axolotl uses.
Hope that helps!
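In case it helps others landing on this thread, here is a rough sketch of what an axolotl YAML for a run like this might look like. The key names are standard axolotl options, but the base model, dataset path, and all values are illustrative placeholders, not the actual settings used for these models; check the YAML shipped in the model repo's configs folder for the real ones.

```yaml
# Illustrative axolotl config sketch -- placeholder values, not the authors' actual settings.
base_model: meta-llama/Llama-2-13b-hf   # assumed 13B base model, for illustration only
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

datasets:
  - path: Open-Orca/SlimOrca            # example dataset; swap in whatever you trained on
    type: sharegpt

sequence_len: 4096
sample_packing: true                    # the "packing" mentioned above: several examples per sequence
pad_to_sequence_len: true

micro_batch_size: 2                     # placeholder
gradient_accumulation_steps: 8          # placeholder
num_epochs: 3                           # placeholder; see the shipped YAML for the real value
learning_rate: 0.00002                  # placeholder
lr_scheduler: cosine
optimizer: adamw_torch

bf16: true
gradient_checkpointing: true
flash_attention: true
output_dir: ./out
```

If memory serves, a config like this is launched with something along the lines of `accelerate launch -m axolotl.cli.train config.yml`; check the axolotl README for the exact invocation on the current version.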
Thanks for including the axolotl config! I suggest you add this to the model card so that people know where to find it :] just my 2c