Qwen2.5-3B-GRPO
Collection
Trained with Unsloth for just 250 steps on GSM8K (due to resource constraints) to add reasoning abilities to Qwen2.5-3B. The smaller model was chosen for the same reason.
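The collection does not include the reward functions used for GRPO (Group Relative Policy Optimization). The sketch below is a rough illustration of the idea only. It assumes a simple correctness reward: a completion scores 1.0 if the last number it contains matches the GSM8K reference answer, which GSM8K stores after `####`. Each reward is then normalized against the other completions sampled for the same prompt. All function names are hypothetical, and the last-number parsing rule is an assumption, not the method used for these checkpoints.

```python
import re
import statistics
from typing import List, Optional


def extract_gsm8k_gold(answer_field: str) -> str:
    """GSM8K reference solutions end with '#### <number>'; return that number."""
    return answer_field.split("####")[-1].strip().replace(",", "")


def extract_last_number(completion: str) -> Optional[str]:
    """Hypothetical parser (assumption): treat the last number in the completion as its answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return matches[-1] if matches else None


def correctness_reward(completions: List[str], gold: str) -> List[float]:
    """Return 1.0 for each completion whose final number equals the gold answer, else 0.0."""
    rewards = []
    for completion in completions:
        pred = extract_last_number(completion)
        rewards.append(1.0 if pred is not None and float(pred) == float(gold) else 0.0)
    return rewards


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantage: (reward - group mean) / (group sample std + eps).

    Requires at least two completions per group (statistics.stdev needs n >= 2).
    """
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative group of three sampled completions for one prompt.
gold = extract_gsm8k_gold("She sold 48 + 24 = 72 clips.\n#### 72")
rewards = correctness_reward(["so the answer is 72", "The answer is 70.", "72 clips"], gold)
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

The two correct completions get a positive advantage and the wrong one a negative advantage. No separate value model is needed. If every completion in a group earns the same reward, all advantages are zero and that prompt gives no learning signal. This is one reason a very short run (250 steps here) may only partly shift the model's behavior.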
3 items
This Qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.