RLHFlow/ArmoRM-Llama3-8B-v0.1
Text Classification • Updated • 19.1k • 162
Reward models trained with the RLHFlow codebase (https://github.com/RLHFlow/RLHF-Reward-Modeling/)
Note: Bradley-Terry reward model trained with the RLHFlow codebase
Note: Tech report covering the pairwise preference model
Note: Tech report for ArmoRM