More details on training data for reward model
#2 opened by reign12
Many thanks for your great effort in open-sourcing this reward model! I am curious about the details of its training data.
What is the oasst_export exactly?
What does the fraction mean in the Datasets part?
And how can we use hellaswag as a comparison dataset?
Thanks in advance for any discussion!