Open Source AI Research Community

community

AI & ML interests

None defined yet.

Recent Activity

OSAIResearchCommunity's activity

burtenshaw
posted an update about 18 hours ago
The Open LLM Leaderboard is completed, retired, dead, 'ascended to a higher plane'. And in its shadow we have an amazing range of leaderboards built and maintained by the community.

In this post, I just want to list some of those great leaderboards that you should bookmark for staying up to date:

- The Chatbot Arena LLM Leaderboard is the first port of call for checking out the best models. It's not the fastest, because humans need to use the models to produce scores, but it's worth the wait. lmarena-ai/chatbot-arena-leaderboard

- The OpenVLM Leaderboard is great for getting scores on vision language models. opencompass/open_vlm_leaderboard

- Ai2 are doing a great job on RewardBench and I hope they keep it up because reward models are the unsexy workhorse of the field. allenai/reward-bench

- The GAIA leaderboard is great for evaluating agent applications. gaia-benchmark/leaderboard

🤩 This seems like such a sustainable way of building for the long term: rather than leaning on a single company to evaluate all LLMs, we share the load.
burtenshaw
posted an update 1 day ago
Still speedrunning getting Gemma 3 to think. Today I focused on setting up GPU-poor hardware to run GRPO.

This is a plain TRL and PEFT notebook which works on Apple silicon or a Colab T4. It uses the 1B variant of Gemma 3 and a reasoning version of the GSM8K dataset.

🧑‍🍳 There's more still in the oven, like releasing models, an Unsloth version, and deeper tutorials, but hopefully this should bootstrap your projects.

Here's a link to the 1B notebook: https://colab.research.google.com/drive/1mwCy5GQb9xJFSuwt2L_We3eKkVbx2qSt?usp=sharing
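
If you want to adapt the notebook to your own data, the core wiring looks roughly like this. A minimal sketch, not the notebook itself: the reward function is a placeholder, and the dataset transform and the "google/gemma-3-1b-it" checkpoint are my assumptions about the setup.

# Hedged sketch: GRPO with a LoRA adapter on the 1B Gemma 3 instruct model.
# GSM8K is loaded as a plain-text prompt dataset; the reward is a placeholder.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def format_reward(completions, **kwargs):
    # Placeholder: reward completions that contain a "####" final-answer line.
    return [1.0 if "####" in completion else 0.0 for completion in completions]

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=format_reward,
    args=GRPOConfig(
        output_dir="gemma3-1b-grpo",
        per_device_train_batch_size=2,
        num_generations=2,  # must divide the effective batch size
        max_steps=250,
    ),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()

The notebook itself is the source of truth for the exact reward functions and hyperparameters.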
burtenshaw
posted an update 1 day ago
Everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I've found. Let's go!

1. You have to install everything from main or nightly. This is what I'm working with to get Unsloth and TRL running:

git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft


plus these, installed with --no-deps:

git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly


2. Will Brown's code to turn GSM8K into a reasoning dataset is a nice toy experiment; the core idea is sketched below. https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb
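
For context, the idea is roughly the following (my own hedged sketch, not the gist verbatim): GSM8K answers end with a "#### <number>" marker, so you can extract a gold answer, prompt the model to reason and answer inside tags, and reward completions whose extracted answer matches.

import re

from datasets import load_dataset

SYSTEM_PROMPT = "Reason step by step, then give the final answer inside <answer></answer> tags."

def extract_gold(answer_text):
    # GSM8K stores the final answer after a "####" marker.
    return answer_text.split("####")[-1].strip()

def to_reasoning_example(example):
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["question"]},
        ],
        "gold": extract_gold(example["answer"]),
    }

dataset = load_dataset("openai/gsm8k", "main", split="train").map(to_reasoning_example)

def correctness_reward(completions, gold, **kwargs):
    # With conversational prompts, each completion is a list of chat messages;
    # extra dataset columns (here "gold") reach reward functions as lists.
    texts = [completion[0]["content"] for completion in completions]
    matches = [re.search(r"<answer>(.*?)</answer>", text, re.DOTALL) for text in texts]
    return [1.0 if m and m.group(1).strip() == g else 0.0 for m, g in zip(matches, gold)]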

3. With a learning rate of 5e-6, rewards and loss stayed flat for the first 100 or so steps.

4. So far none of my runs have degraded the outputs after 1 epoch, so I'm mainly experimenting with bigger LoRA adapters.

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,  # completions sampled per prompt for each GRPO group
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,  # keep prompt + completion within a 1024-token budget
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)


5. Vision fine-tuning isn't available in TRL's GRPOTrainer yet, so stick to text datasets. But there's no need to load the model differently in transformers or Unsloth:

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")


If you want an introduction to GRPO, check out the reasoning course. It walks you through the algorithm, theory, and implementation in a smooth way.

https://huggingface.co/reasoning-course
burtenshaw
posted an update 2 days ago
Here's a notebook to make Gemma reason with GRPO & TRL. I made this whilst prepping the next unit of the reasoning course:

In this notebook I combine Google's model with some community tooling:

- First, I load the model from the Hugging Face Hub with transformers' latest release for Gemma 3
- I use PEFT and bitsandbytes to get it running on Colab (sketched below)
- Then, I took Will Brown's processing and reward functions to make reasoning chains from GSM8K
- Finally, I used TRL's GRPOTrainer to train the model
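
For the loading step, here's a rough sketch of the PEFT + bitsandbytes setup (the checkpoint name and LoRA values are my assumptions; the notebook is the source of truth):

# Load Gemma 3 quantised to 4-bit so it fits on a free Colab GPU, then wrap it
# with a small LoRA adapter so only a tiny fraction of weights are trained.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-3-1b-it"  # assumed variant

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()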

Next step is to bring Unsloth AI in, then ship it in the reasoning course. Links to notebook below.

https://colab.research.google.com/drive/1Vkl69ytCS3bvOtV9_stRETMthlQXR4wX?usp=sharing
burtenshaw
posted an update 9 days ago
I'm super excited to work with @mlabonne to build the first practical example in the reasoning course.

🔗 https://huggingface.co/reasoning-course

Here's a quick walk through of the first drop of material that works toward the use case:

- a fundamental introduction to reinforcement learning, answering questions like 'what is a reward?' and 'how do we create an environment for a language model?'

- Then it focuses on DeepSeek R1 by walking through the paper and highlighting key aspects. This is an old-school way to learn ML topics, but it always works.

- Next, it takes you to Transformers Reinforcement Learning (TRL) and demonstrates potential reward functions you could use. This is cool because it uses Marimo notebooks to visualise the reward.

- Finally, Maxime walks us through a real training notebook that uses GRPO to reduce generation length. I'm really into this because it works, and Maxime took the time to validate it and share assets and logs from his own runs for you to compare with.

Maxime's work and notebooks have been a major part of the open source community over the last few years. I, like everyone, have learnt so much from them.
burtenshaw
posted an update 16 days ago
I made a real-time voice agent with FastRTC, smolagents, and Hugging Face Inference Providers. Check it out in this Space:

🔗 burtenshaw/coworking_agent
burtenshaw
posted an update 17 days ago
Now the Hugging Face agents course is getting real, with frameworks like smolagents, LlamaIndex, and LangChain.

🔗 Follow the org for updates: https://huggingface.co/agents-course

This week we are releasing the first framework unit in the course, and it's on smolagents. This is what the unit covers:

- why you should use smolagents vs. another library
- how to build agents that use code (see the sketch after this list)
- how to build multi-agent systems
- how to use vision language models for browser use
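
If you haven't tried smolagents yet, the code-agent part looks roughly like this. A minimal sketch assuming smolagents' CodeAgent, DuckDuckGoSearchTool, and HfApiModel APIs; the model and tool choices are placeholders:

# A CodeAgent writes and executes Python to answer the query, calling the
# search tool when it needs outside information.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel(),  # defaults to a hosted model; pass model_id=... to override
)

agent.run("How many seconds are there in a leap year?")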

The team has been working flat out on this for a few weeks, led by @sergiopaniego and supported by smolagents author @m-ric.
burtenshaw
posted an update 24 days ago
AGENTS + FINE-TUNING! This week Hugging Face Learn has a whole pathway on fine-tuning for agentic applications. You can follow these two courses to level up your agent game beyond prompts:

1ļøāƒ£ New Supervised Fine-tuning unit in the NLP Course https://huggingface.co/learn/nlp-course/en/chapter11/1
2ļøāƒ£New Finetuning for agents bonus module in the Agents Course https://huggingface.co/learn/agents-course/bonus-unit1/introduction

Fine-tuning will squeeze everything out of your model for how you're using it, more than any prompt.
burtenshaw
posted an update 26 days ago
NEW COURSE! We're cooking hard on Hugging Face courses, and it's not just agents. The NLP course is getting the same treatment with a new chapter on Supervised Fine-Tuning!

👉 Follow to get more updates: https://huggingface.co/nlp-course

The new SFT chapter will guide you through these topics:

1ļøāƒ£ Chat Templates: Master the art of structuring AI conversations for consistent and helpful responses.

2ļøāƒ£ Supervised Fine-Tuning (SFT): Learn the core techniques to adapt pre-trained models to your specific outputs.

3ļøāƒ£ Low Rank Adaptation (LoRA): Discover efficient fine-tuning methods that save memory and resources.

4ļøāƒ£ Evaluation: Measure your model's performance and ensure top-notch results.

This is the first update in a series, so follow along if you're upskilling in AI.
burtenshaw
posted an update 29 days ago
Hey, I'm Ben and I work at Hugging Face.

Right now, I'm focusing on educational stuff and getting loads of new people to build open AI models using free and open source tools.

I've made a collection of some of the tools I'm building and using for teaching. Stuff like quizzes, code challenges, and certificates.

burtenshaw/tools-for-learning-ai-6797453caae193052d3638e2
burtenshaw
posted an update about 1 month ago
The Hugging Face agents course is finally out!

👉 https://huggingface.co/agents-course

This first unit of the course sets you up with all the fundamentals to become a pro in agents.

- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Thought: Internal Reasoning and the ReAct Approach
- Actions: Enabling the Agent to Engage with Its Environment
- Observe: Integrating Feedback to Reflect and Adapt
burtenshaw
posted an update about 1 month ago
SmolLM2 paper is out! 😊

šŸ˜ Why do I love it? Because it facilitates teaching and learning!

Over the past few months I've engaged with (no joke) thousands of students working with SmolLM.

- People have run inference on, fine-tuned, aligned, and evaluated this smol model.
- People have used their own machines, and they've used free tools like Colab, Kaggle, and Spaces.
- People have tackled use cases in their jobs, for fun, in their own languages, and with their friends.

Upvote the paper: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2502.02737)
burtenshaw
posted an update about 2 months ago
Manic few days in open source AI, with game-changing developments all over the place. Here's a round-up of the resources:

- The science team at @huggingface reproduced and open-sourced DeepSeek R1 as Open R1. https://github.com/huggingface/open-r1
- @qwen released a series of models with 1 million token context! https://qwenlm.github.io/blog/qwen2.5-1m/
- SmolVLM got even smaller, with completely new variants at 256M and 500M parameters. https://huggingface.co/blog/smolervlm

There's so much you could do with these developments, especially combining them into agentic applications or fine-tuning them on your use case.
burtenshaw
posted an update about 2 months ago
Hey 👋

I'm helping out on some community research to learn about the AI community. If you want to join in the conversation, head over to the community discussion I started on the most influential model since BERT:

OSAIResearchCommunity/README#2
burtenshaw
posted an update about 2 months ago
📣 Teachers and students! Here's a handy quiz app if you're preparing your own study material.

TL;DR: it's a quiz app that uses a dataset to make questions and saves your answers.

Here's how it works:

- make a dataset of multiple-choice questions (a sketch follows after this list)
- duplicate the space and set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset
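
To get the first step going, building the question dataset can be as simple as this sketch (the column names are my guess at the schema, so mirror burtenshaw/exam_questions for the real one):

# Build a tiny multiple-choice dataset and push it to the Hub; the quiz space
# then reads it to generate questions. Column names and repo id are assumptions.
from datasets import Dataset

questions = [
    {
        "question": "Which library provides the GRPOTrainer?",
        "choices": ["transformers", "trl", "peft", "datasets"],
        "answer": "trl",
    },
]

Dataset.from_list(questions).push_to_hub("your-username/exam_questions")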

I made this to get ready for the agents course, but I hope it's useful for your projects too!

quiz app burtenshaw/dataset_quiz

dataset with questions burtenshaw/exam_questions

agents course we're working on https://huggingface.co/agents-course
burtenshaw
posted an update about 2 months ago
AI was built on side projects!
burtenshaw
posted an update about 2 months ago
🚧 Work in Progress! 🚧

👷‍♀️ We're working hard on getting the official agents course ready for the 50,000 students who have signed up.

If you want to contribute to the discussion, I started these community posts. Looking forward to hearing from you:

- smolagents unit in the agents course - agents-course/README#7
- LlamaIndex Unit in the agents course - agents-course/README#6
- LangChain and LangGraph unit in the agents course - agents-course/README#5
- Real world use cases in the agents course - agents-course/README#8


burtenshaw
posted an update about 2 months ago