A.Genchev
AGenchev
AI & ML interests
None yet
Recent Activity
upvoted a collection about 10 hours ago
olmOCR
reacted to burtenshaw's post with 👍 2 days ago
Here’s a notebook to make Gemma reason with GRPO & TRL. I made this whilst prepping the next unit of the reasoning course:
In this notebook I combine Google’s model with some community tooling:
- First, I load the model from the Hugging Face Hub using the latest transformers release for Gemma 3
- I use PEFT and bitsandbytes to get it running on Colab
- Then, I take Will Brown’s processing and reward functions to make reasoning chains from GSM8K
- Finally, I use TRL’s GRPOTrainer to train the model
The next step is to bring in Unsloth AI, then ship it in the reasoning course. Link to the notebook below.
https://colab.research.google.com/drive/1Vkl69ytCS3bvOtV9_stRETMthlQXR4wX?usp=sharing
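For readers who want a feel for the reward-function step mentioned in the post, here is a minimal plain-Python sketch. It is not the notebook's code: the `<answer>...</answer>` tag format, the function names, and the 1.0/0.0 reward values are all assumptions made for illustration. The only detail taken as given is that GSM8K reference solutions end with `#### <final answer>`. Wiring this into a trainer's reward-function interface is left out on purpose.

```python
import re
from typing import List, Optional

# Assumed (hypothetical) output format: the model wraps its final answer
# in <answer>...</answer> tags.
ANSWER_TAG = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_gsm8k_answer(solution: str) -> Optional[str]:
    """Return the final answer that follows '####' in a GSM8K solution."""
    if "####" not in solution:
        return None
    # Strip whitespace and thousands separators so "1,200" compares as "1200".
    return solution.split("####")[-1].strip().replace(",", "")


def extract_tagged_answer(completion: str) -> Optional[str]:
    """Return the text inside the first <answer> tag, or None if absent."""
    match = ANSWER_TAG.search(completion)
    return match.group(1).strip() if match else None


def correctness_reward(completions: List[str], answers: List[str]) -> List[float]:
    """Score each completion 1.0 if its tagged answer matches the reference, else 0.0."""
    rewards = []
    for completion, answer in zip(completions, answers):
        predicted = extract_tagged_answer(completion)
        rewards.append(1.0 if predicted is not None and predicted == answer else 0.0)
    return rewards


if __name__ == "__main__":
    reference = extract_gsm8k_answer("She has 3 + 4 = 7 apples.\n#### 7")
    samples = [
        "<reasoning>3 + 4 = 7</reasoning><answer>7</answer>",
        "<answer>8</answer>",
    ]
    print(correctness_reward(samples, [reference, reference]))
```

An exact string match keeps the sketch simple; a real setup would likely also reward partially correct formatting so the model gets a learning signal before it produces right answers.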
liked a Space 2 days ago
Qwen/QwQ-32B-Demo
Organizations
None yet
models
None public yet
datasets
None public yet