I've been working on something cool: a GRPO implementation with an LLM evaluator that can also, optionally, perform SFT on the feedback data. Check it out 😊
Any 🌟 are more than welcome 🤗
https://github.com/mkurman/grpo-llm-evaluator