@burtenshaw on Hugging Face: "Here’s a notebook to make Gemma reason with GRPO & TRL. I made this whilst…"

burtenshaw

posted an update 1 day ago

Here’s a notebook to make Gemma reason with GRPO & TRL. I made this whilst prepping the next unit of the reasoning course:

In this notebook, I combine Google’s model with some community tooling:

- First, I load the model from the Hugging Face Hub with the latest transformers release, which supports Gemma 3
- I use PEFT and bitsandbytes to get it running on Colab
- Then, I take Will Brown’s processing and reward functions to build reasoning chains from GSM8K
- Finally, I use TRL’s GRPOTrainer to train the model
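
For anyone who wants a feel for the reward side before opening the notebook, here is a minimal sketch of GSM8K-style reward functions and how they could be handed to GRPOTrainer. This is not the notebook's code or Will Brown's exact functions: the `<reasoning>`/`<answer>` tag layout, the reward values (2.0 and 0.5), the LoRA settings, and every function name below are assumptions made for illustration. What it does rely on is that GSM8K reference solutions end with `#### <number>`, and that TRL calls each reward function with `completions` plus the dataset columns as keyword arguments and expects a list of floats back.

```python
import re
from typing import Optional

# Assumed output layout: <reasoning>...</reasoning> followed by <answer>...</answer>
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
FORMAT_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)


def extract_gsm8k_answer(solution: str) -> str:
    """Return the final number of a GSM8K reference solution ('... #### 1,234')."""
    return solution.split("####")[-1].strip().replace(",", "")


def extract_model_answer(text: str) -> Optional[str]:
    """Return the text inside the first <answer> tag, or None if there is none."""
    match = ANSWER_RE.search(text)
    return match.group(1).strip() if match else None


def _texts(completions):
    # Conversational datasets yield a list of message dicts per completion,
    # plain-text datasets yield strings; handle both.
    return [c[0]["content"] if isinstance(c, list) else c for c in completions]


def correctness_reward(completions, answer, **kwargs):
    """2.0 when the extracted answer equals the reference, else 0.0."""
    return [
        2.0 if extract_model_answer(text) == ref else 0.0
        for text, ref in zip(_texts(completions), answer)
    ]


def format_reward(completions, **kwargs):
    """0.5 when the completion follows the assumed tag layout, else 0.0."""
    return [0.5 if FORMAT_RE.search(text) else 0.0 for text in _texts(completions)]


def build_trainer(model, train_dataset):
    """Hypothetical wiring; needs `trl` and `peft` installed and is not run here.

    `train_dataset` is assumed to have a `prompt` column and an `answer` column
    already reduced with extract_gsm8k_answer.
    """
    from peft import LoraConfig
    from trl import GRPOConfig, GRPOTrainer

    return GRPOTrainer(
        model=model,
        reward_funcs=[correctness_reward, format_reward],
        args=GRPOConfig(output_dir="gemma-grpo", num_generations=4, max_completion_length=256),
        train_dataset=train_dataset,
        peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear", task_type="CAUSAL_LM"),
    )
```

Keeping correctness and format as separate functions means the model still gets a small signal for producing the right structure even when the final number is wrong.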

The next step is to bring in Unsloth AI, then ship it in the reasoning course. The notebook link is below.

https://colab.research.google.com/drive/1Vkl69ytCS3bvOtV9_stRETMthlQXR4wX?usp=sharing

AtAndDev

1 day ago

Bruh, it’s been 8 hours since the announcement. Chill, you guys.

Akhil-Theerthala

1 day ago

Thanks, I needed this.
