Process Reward Model trained with OpenRLHF
```
Dataset: Math-Shepherd
Training accuracy: 0.922
```
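
The card does not say how the training accuracy was measured. As a rough illustration only, the sketch below assumes it is the fraction of reasoning steps whose predicted label matches the reference label, using `+` for a good step and `-` for a bad one. The function name `step_accuracy` and the sample labels are hypothetical and are not taken from OpenRLHF or from the actual training run.

```python
from typing import Sequence


def step_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Return the fraction of steps whose predicted label equals the gold label.

    Assumption: one label per reasoning step, "+" for a good step and "-" for
    a bad one. This is an illustrative metric, not the card's exact definition.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("no steps to score")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical labels for a four-step solution (not real model output).
gold = ["+", "+", "-", "-"]
predicted = ["+", "+", "-", "+"]
accuracy = step_accuracy(predicted, gold)
print(accuracy)
```

Under this assumed definition, a value of 0.922 would mean that roughly 92 of every 100 training steps received the correct label.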