
yushun zhang

yushun0410

AI & ML interests

LLMs

Recent Activity

upvoted a paper 3 months ago
Qwen2.5 Technical Report
upvoted a collection 4 months ago
Qwen2.5
upvoted a collection 4 months ago
Qwen2.5-Math

Organizations

None yet

yushun0410's activity

reacted to their post with 🔥🚀👍 9 months ago
posted an update 9 months ago
Hi Huggingfacers!

Thrilled to introduce Adam-mini, an optimizer that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint. Adam-mini also achieves 49.5% higher throughput than AdamW on Llama2-7B pre-training.

The design of Adam-mini is inspired by certain Hessian structures we observed on Transformers.

Feel free to try it out! Switch to Adam-mini with the same hyperparameters as AdamW and it runs with only about half the optimizer memory. Hope Adam-mini can help save time, cost, and energy in your tasks!

Paper: "Adam-mini: Use Fewer Learning Rates To Gain More" https://arxiv.org/abs/2406.16793

Code: https://github.com/zyushun/Adam-mini
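
The switch is essentially a one-line optimizer swap. Below is a minimal, illustrative sketch of replacing AdamW with Adam-mini in a plain PyTorch training step; the `adam_mini` import, the `Adam_mini` class name, and its constructor arguments (`named_parameters`, `dim`, `n_heads`) are assumptions based on the repo's documented usage, so check the GitHub link above for the exact interface.

```python
# Sketch: swapping AdamW for Adam-mini while keeping the same hyperparameters.
# NOTE: the `adam_mini` package name and Adam_mini constructor arguments are
# assumptions; consult https://github.com/zyushun/Adam-mini for the exact API.
import torch
import torch.nn as nn
from adam_mini import Adam_mini  # assumed import

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=6,
)

# Before: standard AdamW
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4,
#                               betas=(0.9, 0.95), weight_decay=0.1)

# After: Adam-mini with the same hyperparameters as AdamW
optimizer = Adam_mini(
    named_parameters=model.named_parameters(),  # assumed: names used to group parameters
    lr=1e-4,
    betas=(0.9, 0.95),
    weight_decay=0.1,
    dim=512,    # assumed: hidden size, used to recognize attention blocks
    n_heads=8,  # assumed: number of attention heads
)

# The training step itself is unchanged
x = torch.randn(2, 16, 512)
loss = model(x).pow(2).mean()  # dummy loss for illustration
loss.backward()
optimizer.step()
optimizer.zero_grad()
```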

New activity in huggingface/HuggingDiscussions 9 months ago

[FEEDBACK] Daily Papers

#32 opened 9 months ago by kramp