@qq8933 on Hugging Face: "LLaMA-O1-PRM and LLaMA-O1-Reinforcement will release in this weekend. We have…"

qq8933

posted an update 14 days ago

LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend.
We have implemented a novel reinforcement fine-tuning (RFT) pipeline that teaches models to reason and to label rewards without human annotation.

AlexLINB

14 days ago

Looking forward to it

qq8933

14 days ago

Not perfect, but it works :)

Teera

13 days ago

In this post

qq8933 Di Zhang
AlexLINB AlexLI
Teera Narak A'