752 60 279

Younes Belkada

ybelkada

AI & ML interests

Large Language Models, Quantization, Vision, Multimodality, Diffusion models

Articles

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

May 24, 2023

• 100

Introducing RWKV — An RNN with the advantages of a transformer

May 15, 2023

• 14

StackLLaMA: A hands-on guide to train LLaMA with RLHF

Apr 5, 2023

• 21

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Mar 9, 2023

• 34

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Aug 17, 2022

• 66

Organizations

ybelkada's activity

posted an update 4 months ago

Post

2870

Falcon Mamba now available now in llama.cpp !
Check out GGUF files uploaded here: tiiuae/falconmamba-7b-66b9a580324dd1598b0f6d4a

2 replies

posted an update 4 months ago

Post

3678

FalconMamba 7B - a new model from TII (Technology Innovation Institute) is out !

- Blogpost: https://huggingface.co/blog/falconmamba
- Link to collection: tiiuae/falconmamba-7b-66b9a580324dd1598b0f6d4a
- Link to playground: tiiuae/falcon-mamba-playground

reacted to dvilasuero's post with 🤗❤️🚀🔥 6 months ago

Post

8071

Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: launching partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, the Data is Better Together initiative with hundreds of community contributors, or releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!

28 replies

reacted to mayank-mishra's post with ❤️🚀 9 months ago

Post

2560

Thrilled to unveil DS-MoE: a dense training and sparse inference scheme for enhanced computational and memory efficiency in your MoE models! 🚀🚀🚀

Discover more in our blog: https://huggingface.co/blog/bpan/ds-moe and dive into the details with our paper: Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models (2404.05567)

1 reply

reacted to Titus-von-Koeller's post with 🔥🤝 9 months ago

Post

1962

🔥 Level up your model training w/ GaLore + Transformers for SOTA results on consumer-grade hardware!

⬇️ 82.5% less optimizer state memory footprint without performance degradation by expressing the gradient weight matrix as low rank.

👩🏿‍💻 Install via pip install transformers>=4.39.0 galore-torch. #ProudlyGpuPoor

The integration of GaLore into the training of large language models (LLMs) marks a significant advancement in the field of deep learning, particularly in terms of memory efficiency and the democratization of AI research. By allowing for the training of billion-parameter models on consumer-grade hardware, reducing memory footprint in optimizer states, and leveraging advanced projection matrix techniques, GaLore opens new horizons for researchers and practitioners with limited access to high-end computational resources.

🔬 Find out more about GaLore and investigate lots of juicy technical details: https://huggingface.co/blog/galore

🤗 Huge thanks to everyone involved ❤️:

• authors: @jiaweizhao @Kyriection @beidic Zhangyang Wang @animakumar @tydsh
• community contributors: @hiyouga @mdouglas and others!
• @ybelkada for taking such swift action in composing and coordinating necessary PRs to get this live at ⚡ speed!

🏗️📈 Super rewarding to see how @timdettmers work with optimizers is being built upon to achieve even greater heights!

🚧 Actually, there are ongoing works to integrate GaLore into bitsandbytes and optimize memory efficiency even further 💪. We'll keep you posted!

1 reply

reacted to Titus-von-Koeller's post with ❤️🤗 10 months ago

Post

We just released bitsandbytes==0.43.0 📦 , with these significant new additions:

‣ 🛫 FSDP+QLoRA support (alpha release)
◦ now anyone with 2 powerful gaming GPUs can fine-tune 70B param models at home!
◦ in collab with Jeremy Howard + team @ answer.ai
◦ answer.ai blogpost: https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html
◦ example repo: https://github.com/AnswerDotAI/fsdp_qlora/

‣ 🌈⊞ Official Windows support
◦ now via simple pip install bitsandbytes>=0.43.0

‣ 📄 Huge docs update:
◦ https://huggingface.co/docs/bitsandbytes/main
◦ Be sure to check out the optimizers and the API docs
◦ ... even more upcoming ...

Under the hood there we have many other improvements, due to extensive maintenance activity, community contributions by super active + knowledgable volunteers ✨ 🚀 and the official sponsorship by Hugging Face that makes all this possible 🤗 ❤️ 🌍

We would greatly appreciate any further community contributions, be it to help with refactorings, exterminating flaky tests, writing doc-strings, tutorials, new features. Don't be shy, just contact us and we see where this leads us:
https://github.com/TimDettmers/bitsandbytes/discussions

Have a great weekend everyone!

1 reply

posted an update 10 months ago

Post

Check out quantized weights from ISTA-DAS Lab directly in their organisation page: https://huggingface.co/ISTA-DASLab ! With official weights of AQLM (for 2bit quantization) & QMoE (1-bit MoE quantization)

Read more about these techniques below:

AQLM paper: Extreme Compression of Large Language Models via Additive Quantization (2401.06118)
QMoE: QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (2310.16795)

Some useful links below:

AQLM repo: https://github.com/Vahe1994/AQLM
How to use AQLM & transformers: https://huggingface.co/docs/transformers/quantization#aqlm
How to use AQLM & PEFT: https://huggingface.co/docs/peft/developer_guides/quantization#aqlm-quantizaion

Great work from @BlackSamorez and team !

reacted to andrewyng's post with 🤯❤️👍 10 months ago

Post

DeepLearning.AI just announced a new short course: Open Source Models with Hugging Face 🤗, taught by Hugging Face's own Maria Khalusova, Marc Sun and Younes Belkada!

As many of you already know, Hugging Face has been a game changer by letting developers quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models.

You'll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you’ll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces.

Thank you very much to Hugging Face's wonderful team for working with us on this.

You can sign up for the course here: https://www.deeplearning.ai/short-courses/open-source-models-hugging-face/

1 reply

replied to bstadt's post 10 months ago

Exciting release !

reacted to bstadt's post with ❤️👍 10 months ago

Post

You can now use gpt4all.io to instantly search, download, and chat with models hosted on huggingface!

2 replies

reacted to smangrul's post with 👍 10 months ago

Post

🚨 New Release of 🤗PEFT!

1. New methods for merging LoRA weights. Refer this HF Post for more details: https://huggingface.co/posts/smangrul/850816632583824

2. AWQ and AQLM support for LoRA. You can now:
- Train adapters on top of 2-bit quantized models with AQLM
- Train adapters on top of powerful AWQ quantized models
Note for inference you can't merge the LoRA weights into the base model!

3. DoRA support: Enabling DoRA is as easy as adding use_dora=True to your LoraConfig. Find out more about this method here: https://arxiv.org/abs/2402.09353

4. Improved documentation, particularly docs regarding PEFT LoRA+DeepSpeed and PEFT LoRA+FSDP! 📄 Check out the docs at https://huggingface.co/docs/peft/index.

5. Full Release Notes: https://github.com/huggingface/peft/releases/tag/v0.9.0

5 replies

Younes Belkada

AI & ML interests

Articles

Welcome FalconMamba: The first strong attention-free 7B model

Welcome Llama 3 - Meta's new open LLM

GaLore: Advancing Large Model Training on Consumer-grade Hardware

quanto: a pytorch quantization toolkit

Fine-Tuning Gemma Models in Hugging Face

Mixture of Experts Explained

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Overview of natively supported quantization schemes in 🤗 Transformers

Making LLMs lighter with AutoGPTQ and transformers

Fine-tune Llama 2 with DPO

The Falcon has landed in the Hugging Face ecosystem

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

Introducing RWKV — An RNN with the advantages of a transformer

StackLLaMA: A hands-on guide to train LLaMA with RLHF

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Organizations

ybelkada's activity