Update README.md
README.md
CHANGED
@@ -1,5 +1,5 @@
 Process Reward Model trained by OpenRLHF
-
-
-
-
+
+- Dataset: Math-Shepherd (https://huggingface.co/datasets/peiyi9979/Math-Shepherd)
+- Learning Rate: 1e-6
+- Training Accuracy: 0.922