Great reward model, what dataset did you use to train?

#1
by zolicsaki - opened

Specifically, I was wondering whether you trained it on the LMSYS Chatbot Arena conversations dataset, since your model performs so well when evaluated on those preferences. Thanks for the help!

https://huggingface.co/datasets/lmsys/chatbot_arena_conversations

InternLM org

Sorry for the late reply. We did use a portion of this dataset. We performed data cleaning and filtering, including removing toxic and unsafe data, to ensure quality and safety.
