Transformers
PyTorch
English
llama
reward model
RLHF
RLAIF
text-generation-inference
Starling-RM-7B-alpha / trainer_state.json

Commit History

Duplicate from banghua/n_rm
6f8f5dc

Banghua Zhu committed on