Post
911
Here’s a notebook to make Gemma reason with GRPO & TRL. I made this whilst prepping the next unit of the reasoning course:
In this notebooks I combine together google’s model with some community tooling
- First, I load the model from the Hugging Face hub with transformers’s latest release for Gemma 3
- I use PEFT and bitsandbytes to get it running on Colab
- Then, I took Will Browns processing and reward functions to make reasoning chains from GSM8k
- Finally, I used TRL’s GRPOTrainer to train the model
Next step is to bring Unsloth AI in, then ship it in the reasoning course. Links to notebook below.
https://colab.research.google.com/drive/1Vkl69ytCS3bvOtV9_stRETMthlQXR4wX?usp=sharing
In this notebooks I combine together google’s model with some community tooling
- First, I load the model from the Hugging Face hub with transformers’s latest release for Gemma 3
- I use PEFT and bitsandbytes to get it running on Colab
- Then, I took Will Browns processing and reward functions to make reasoning chains from GSM8k
- Finally, I used TRL’s GRPOTrainer to train the model
Next step is to bring Unsloth AI in, then ship it in the reasoning course. Links to notebook below.
https://colab.research.google.com/drive/1Vkl69ytCS3bvOtV9_stRETMthlQXR4wX?usp=sharing