When will gradient checkpointing be implemented?
#68 by rishiraj - opened
Could you give an idea of when we can expect gradient checkpointing to be implemented? Without it, fine-tuning the model is very hard.
Also Flash Attention 2!
And I wonder why the code has checkpointing elements while supports_gradient_checkpointing remains False.
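For readers unfamiliar with the technique being requested, here is a minimal pure-Python sketch of what gradient checkpointing trades off. It is not the Phi-2 or transformers implementation: the toy scalar `layer`, the helper names, and the `segment` parameter are all assumptions made up for illustration. The idea it shows is that the backward pass can store only a few activations and recompute the rest, giving the same gradient with less memory at the cost of extra forward compute.

```python
import math

# Hypothetical toy "layer": a scalar tanh unit, used only for illustration.
def layer(w, x):
    return math.tanh(w * x)

def layer_grad(w, x):
    """Derivative of layer(w, x) with respect to x."""
    return w * (1.0 - math.tanh(w * x) ** 2)

def backward_full(weights, x0):
    """Baseline: keep every layer input in memory during the forward pass."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad *= layer_grad(w, x_in)  # chain rule, last layer first
    return x, grad, len(inputs)

def backward_checkpointed(weights, x0, segment=4):
    """Checkpointing: store only the input of every `segment`-th layer,
    then recompute the inputs inside each segment during the backward pass."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(w, x)
    out = x
    grad = 1.0
    peak = len(checkpoints)  # rough count of values held at once
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        # Recompute this segment's layer inputs from its checkpoint.
        seg_inputs = []
        x = checkpoints[start]
        for w in seg_weights:
            seg_inputs.append(x)
            x = layer(w, x)
        peak = max(peak, len(checkpoints) + len(seg_inputs))
        for w, x_in in zip(reversed(seg_weights), reversed(seg_inputs)):
            grad *= layer_grad(w, x_in)
    return out, grad, peak

weights = [0.5 + 0.1 * i for i in range(16)]
out_full, grad_full, stored_full = backward_full(weights, 0.3)
out_ckpt, grad_ckpt, stored_ckpt = backward_checkpointed(weights, 0.3, segment=4)
print(stored_full, stored_ckpt)
```

With 16 layers and `segment=4`, the baseline holds 16 layer inputs while the checkpointed version holds at most 8 (4 checkpoints plus one recomputed segment), and both return the same gradient. In a real model the saved values are large activation tensors, which is why the feature matters for fine-tuning.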
+1
+1
+1
It seems to be implemented in Axolotl by Winglian on GitHub; I'm not sure whether it can be reused as is here.
https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/models/phi/modeling_phi.py
I'm using this library for fine-tuning (FT) and RL of Phi-2.
Hello everyone!
We have an ongoing PR at https://github.com/huggingface/transformers/pull/28163 that will resolve this issue.
Regards,
Gustavo.
gugarosa changed discussion status to closed