Kha Vu Chan

chankhavu


chankhavu's activity

upvoted an article 3 days ago
New activity in open-r1/README about 1 month ago
reacted to Titus-von-Koeller's post with 🔥 12 months ago
🔥 Level up your model training w/ GaLore + Transformers for SOTA results on consumer-grade hardware!

⬇️ 82.5% less optimizer state memory footprint without performance degradation by expressing the gradient weight matrix as low rank.

👩🏿‍💻 Install via pip install "transformers>=4.39.0" galore-torch (the quotes keep the shell from treating >= as a redirect). #ProudlyGpuPoor
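Once both packages are installed, GaLore is selected through the optimizer settings passed to the Trainer. The sketch below is a minimal illustration, not the post author's exact setup: the output directory and the values in optim_args are assumed for the example, and the module patterns "attn" and "mlp" presume a model whose layer names contain those substrings.

```python
# Keyword arguments for transformers.TrainingArguments that switch the
# optimizer to GaLore. All concrete values are illustrative assumptions.
GALORE_TRAINING_KWARGS = {
    "output_dir": "./galore-run",              # hypothetical path
    "max_steps": 100,                          # short run, just for a smoke test
    "per_device_train_batch_size": 2,
    "optim": "galore_adamw",                   # GaLore variant of AdamW
    "optim_target_modules": ["attn", "mlp"],   # layers whose gradients get projected
    "optim_args": "rank=64, update_proj_gap=100, scale=0.10",
}


def build_training_args():
    """Build TrainingArguments lazily so this file imports without transformers."""
    from transformers import TrainingArguments  # needs transformers>=4.39.0

    return TrainingArguments(**GALORE_TRAINING_KWARGS)
```

Calling build_training_args() and handing the result to a Trainer should be enough to train with GaLore, provided galore-torch is installed in the same environment.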

The integration of GaLore into the training of large language models (LLMs) marks a significant advancement in the field of deep learning, particularly in terms of memory efficiency and the democratization of AI research. By allowing for the training of billion-parameter models on consumer-grade hardware, reducing memory footprint in optimizer states, and leveraging advanced projection matrix techniques, GaLore opens new horizons for researchers and practitioners with limited access to high-end computational resources.
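To get a feel for where the optimizer-state savings come from, here is a back-of-the-envelope sketch for a single weight matrix. It assumes Adam keeps two moment tensors per parameter, and that GaLore instead keeps one projection matrix plus two moment tensors in the rank-r projected space. The function names and the 4096 x 4096 layer size are made up for illustration, and the per-layer number it prints is not meant to reproduce the end-to-end 82.5% figure quoted above.

```python
def adam_state_elems(m: int, n: int) -> int:
    """Adam keeps first and second moments, each the shape of the m x n weight."""
    return 2 * m * n


def galore_state_elems(m: int, n: int, rank: int) -> int:
    """Assumed GaLore layout: projector on the shorter side plus two
    moment tensors for the (rank x longer side) projected gradient."""
    short, long = min(m, n), max(m, n)
    projector = short * rank
    moments = 2 * rank * long
    return projector + moments


if __name__ == "__main__":
    m = n = 4096   # hypothetical layer size
    rank = 128     # hypothetical projection rank
    full = adam_state_elems(m, n)
    low = galore_state_elems(m, n, rank)
    saving = 1 - low / full
    print(f"Adam states:   {full:,} elements")
    print(f"GaLore states: {low:,} elements")
    print(f"Per-layer saving: {saving:.1%}")  # prints 95.3% for these values
```

Whole-model savings are smaller than this per-layer number because only the targeted modules are projected and the other parameters keep full optimizer states.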

🔬 Find out more about GaLore and investigate lots of juicy technical details: https://huggingface.co/blog/galore

🤗 Huge thanks to everyone involved ❤️:

• authors: @jiaweizhao @Kyriection @beidic Zhangyang Wang @animakumar @tydsh
• community contributors: @hiyouga @mdouglas and others!
• @ybelkada for taking such swift action in composing and coordinating necessary PRs to get this live at ⚡ speed!

🏗️📈 Super rewarding to see how @timdettmers' work with optimizers is being built upon to achieve even greater heights!

🚧 Actually, there are ongoing works to integrate GaLore into bitsandbytes and optimize memory efficiency even further πŸ’ͺ. We'll keep you posted!