
Kha Vu Chan

chankhavu

AI & ML interests

None yet


Organizations

None yet

chankhavu's activity

upvoted an article 3 days ago

Article

Open R1: Update #3

and 9 others • 3 days ago • 214

New activity in casperhansen/deepseek-r1-distill-qwen-7b-awq 3 days ago

Fail to reproduce quantization results

#1 opened 3 days ago by chankhavu

liked a model about 1 month ago

agentica-org/DeepScaleR-1.5B-Preview

Text Generation • Updated 20 days ago • 78.2k • 520

New activity in open-r1/README about 1 month ago

[Experiment] Applying GRPO to DeepSeek-R1-Distill-Qwen-1.5B with LIMO

#15 opened about 1 month ago by lewtun

liked a model 3 months ago

KirillR/QwQ-32B-Preview-AWQ

Text Generation • Updated Nov 27, 2024 • 1.85k • 24

liked a Space 3 months ago

Scaling test-time compute 📈 • 535

Enhance math problem solving by scaling test-time compute

updated a dataset 5 months ago

chankhavu/blogtruyenvn-archive

Preview • Updated Oct 8, 2024 • 45

liked a dataset 6 months ago

HuggingFaceFV/finevideo

Viewer • Updated Dec 16, 2024 • 39.5k • 9.1k • 301

reacted to Titus-von-Koeller's post with 🔥 12 months ago

Post • 2006

🔥 Level up your model training w/ GaLore + Transformers for SOTA results on consumer-grade hardware!

⬇️ 82.5% less optimizer-state memory, without performance degradation, by projecting the gradient of each weight matrix onto a low-rank subspace.
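To make "projecting the gradient onto a low-rank subspace" concrete, here is a minimal NumPy sketch of one GaLore-style Adam step. It is an illustration under stated assumptions, not the galore-torch implementation: the function names `top_r_projector` and `galore_adam_step` are hypothetical, the weight is assumed to be (m, n) with m <= n, and details such as update scaling and per-layer configuration are omitted. The key point is that Adam's moment buffers have shape (r, n) instead of (m, n).

```python
import numpy as np


def top_r_projector(grad: np.ndarray, rank: int) -> np.ndarray:
    """Return an (m, rank) matrix whose orthonormal columns are the top-`rank`
    left singular vectors of `grad` (shape (m, n), assumed m <= n)."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]


def galore_adam_step(weight, grad, state, rank, lr=1e-3,
                     beta1=0.9, beta2=0.999, eps=1e-8, refresh=False):
    """One simplified GaLore-style Adam step (hypothetical sketch only)."""
    # (Re)compute the projector when asked or on the first call;
    # GaLore refreshes the subspace periodically rather than every step.
    if refresh or "P" not in state:
        state["P"] = top_r_projector(grad, rank)
    P = state["P"]                             # (m, r)

    R = P.T @ grad                             # projected gradient, (r, n)

    # Adam moments live in the small (r, n) space: this is the memory saving.
    exp_avg = state.get("exp_avg", np.zeros_like(R))
    exp_avg_sq = state.get("exp_avg_sq", np.zeros_like(R))
    t = state.get("t", 0) + 1
    exp_avg = beta1 * exp_avg + (1 - beta1) * R
    exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * R ** 2
    m_hat = exp_avg / (1 - beta1 ** t)         # bias-corrected first moment
    v_hat = exp_avg_sq / (1 - beta2 ** t)      # bias-corrected second moment
    N = m_hat / (np.sqrt(v_hat) + eps)         # low-rank update direction, (r, n)
    state.update(exp_avg=exp_avg, exp_avg_sq=exp_avg_sq, t=t)

    # Project the update back to the full (m, n) weight shape.
    return weight - lr * (P @ N)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, r = 64, 128, 4
    W = rng.standard_normal((m, n))
    G = rng.standard_normal((m, n))
    state = {}
    W_new = galore_adam_step(W, G, state, rank=r)
    print(W_new.shape, state["exp_avg"].shape)  # (64, 128) (4, 128)
```

The full-size weight is still updated every step; only the optimizer state is kept in the compressed space, which is why the saving shows up in optimizer memory rather than in the model weights themselves.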

👩🏿‍💻 Install via `pip install "transformers>=4.39.0" galore-torch` (quote the version spec so the shell doesn't treat `>` as a redirect). #ProudlyGpuPoor

Integrating GaLore into large language model (LLM) training is a significant step for memory efficiency and for democratizing AI research. By shrinking the memory taken by optimizer states through low-rank gradient projection, it makes it possible to train billion-parameter models on consumer-grade hardware, opening new possibilities for researchers and practitioners without access to high-end compute.
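The optimizer-state saving can be estimated per weight matrix with simple arithmetic. The sketch below assumes plain Adam (two full-size moment buffers) versus a GaLore-style layout that stores an (m, r) projector plus two (r, n) moment buffers for an (m, n) weight with m <= n. The function names and the 4096 x 4096, rank-128 example are hypothetical choices for illustration; this per-matrix number is not meant to reproduce the post's whole-model 82.5% figure, which depends on which layers are projected and on the rank used.

```python
def adam_state_elems(m: int, n: int) -> int:
    """Adam keeps two full-size moment matrices for an (m, n) weight."""
    return 2 * m * n


def galore_state_elems(m: int, n: int, r: int) -> int:
    """GaLore-style state for an (m, n) weight at rank r: an (m, r) projector
    plus two (r, n) moment matrices, projecting along the smaller dimension."""
    if m > n:
        m, n = n, m  # assumption: always project the smaller side
    return m * r + 2 * r * n


if __name__ == "__main__":
    m = n = 4096  # hypothetical square weight matrix
    r = 128       # hypothetical projection rank
    full = adam_state_elems(m, n)
    low = galore_state_elems(m, n, r)
    print(full, low, f"{1 - low / full:.1%}")  # 33554432 1572864 95.3%
```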

🔬 Find out more about GaLore and investigate lots of juicy technical details: https://huggingface.co/blog/galore

🤗 Huge thanks to everyone involved ❤️:

• authors: @jiaweizhao @Kyriection @beidic Zhangyang Wang @animakumar @tydsh
• community contributors: @hiyouga @mdouglas and others!
• @ybelkada for acting so swiftly in writing and coordinating the necessary PRs to get this live at ⚡ speed!

🏗️📈 Super rewarding to see @timdettmers's work on optimizers being built upon to reach even greater heights!

🚧 Work is also underway to integrate GaLore into bitsandbytes and push memory efficiency even further 💪. We'll keep you posted!