[Help needed] Re-labelling models to separate different kinds of fine-tuning

#160
by clefourrier - opened
Open LLM Leaderboard org

@jaspercatapang suggested we should separate instruction-tuned from (vanilla) fine-tuned, and I agree!

If you want to give a hand, please open a PR and change the information in the TYPE_METADATA dict in this file, and I'll merge it asap!
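
If you're unsure what that edit looks like, here's a rough sketch of the kind of mapping involved (the dict layout and model ids below are illustrative placeholders, not the file's actual contents):

```python
# Illustrative only: the actual TYPE_METADATA layout in the repo may differ.
# Each entry maps a model id on the Hub to its training-type label.
TYPE_METADATA = {
    "some-org/base-model-7b": "pretrained",
    "some-org/model-7b-instruct": "instruction-tuned",  # was: "fine-tuned"
    "some-org/model-7b-sft": "fine-tuned",
}
```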

I've re-labelled the models in this PR here. It might need some reformatting.

For consistency, I followed a simple guide (sketched in code after the list):

  1. If the model type is either pre-trained or RL-tuned, retain it.
  2. If the model card mentions that the model follows instructions, the new model type is instruction-tuned.
  3. If the model card makes no reference to instruction-following, the new model type is fine-tuned.
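
In code, the guide above might look something like this (a minimal sketch; the `relabel` helper, its arguments, and the label strings are assumptions for illustration, not part of the leaderboard codebase):

```python
# Minimal sketch of the three re-labelling rules above. The function name,
# signature, and label strings are hypothetical, not the leaderboard's API.
def relabel(current_type: str, model_card: str) -> str:
    # Rule 1: keep pre-trained and RL model types as they are.
    if current_type in ("pretrained", "RL-tuned"):
        return current_type
    # Rule 2: if the card mentions instruction-following, it's instruction-tuned.
    if "instruction" in model_card.lower():
        return "instruction-tuned"
    # Rule 3: no mention of instruction-following means plain fine-tuned.
    return "fine-tuned"
```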

If there are errors in my re-labelling, please open a PR to modify it. Thank you.

Open LLM Leaderboard org

That's amazing, thank you!
I'll leave your PR open for the week in case the community wants to comment on it or suggest adjustments, and merge it on Friday!

clefourrier pinned discussion

My only suggestion is that maybe there should be a "dialog-tuned" category. Instruction tuning does not imply tuning for multi-turn dialog (aka "chat"), whereas RLHF almost always means dialog-tuned: I am not aware of anyone doing RLHF for something that isn't a chat model. Essentially, instruction tuning alone implies a single-turn exchange: one instruction, one response. If a model card says "we made a chat model" or "we tuned for dialog", that implies more than just instruction tuning.
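
For illustration, the label set with that extra category might look like this (names are placeholders, not a proposal for the leaderboard's actual strings):

```python
# Hypothetical taxonomy with a separate chat category, as suggested above.
MODEL_TYPES = [
    "pretrained",         # base models, no tuning
    "fine-tuned",         # generic supervised fine-tuning
    "instruction-tuned",  # single-turn instruction following
    "dialog-tuned",       # multi-turn chat, typically via RLHF
]
```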

clefourrier changed discussion status to closed
clefourrier unpinned discussion