BramVanroy 
posted an update Jun 1
The InstructGPT paper mentions that they insert 10% pretraining data during SFT, which they find improves the effect of PPO (IIUC). Has anyone else done later ablations on this? I've only seen the inverse suggested, mixing in SFT data during pretraining.
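For concreteness, the mixing described above can be sketched as follows. This is a minimal illustration, not the InstructGPT recipe itself (the paper applies the pretraining mix as an auxiliary loss term during RLHF rather than by literal dataset concatenation); the helper name and the example-tuple format are made up for the sketch.

```python
import random

def mix_sft_with_pretraining(sft_examples, pretrain_examples,
                             pretrain_fraction=0.10, seed=0):
    """Build a mixed SFT corpus where roughly `pretrain_fraction` of the
    final examples come from the pretraining corpus.

    Hypothetical helper illustrating a 10% pretraining-data mix; not an
    actual API from any library.
    """
    rng = random.Random(seed)
    # Number of pretraining examples needed so they make up
    # `pretrain_fraction` of the combined dataset.
    n_pretrain = round(len(sft_examples) * pretrain_fraction
                       / (1.0 - pretrain_fraction))
    sampled = [rng.choice(pretrain_examples) for _ in range(n_pretrain)]
    mixed = list(sft_examples) + sampled
    rng.shuffle(mixed)
    return mixed

# Toy usage: 90 SFT examples plus a 10% pretraining mix -> 100 total.
sft = [("sft", i) for i in range(90)]
pretrain = [("pretrain", i) for i in range(5)]
mixed = mix_sft_with_pretraining(sft, pretrain)
```

With real data one would more likely interleave the two streams on the fly (e.g. by sampling each batch element from the pretraining corpus with probability 0.1) rather than materializing a concatenated dataset, but the proportion arithmetic is the same.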

@lewtun or @lvwerra might know

I am not aware of any public ablations which validate this, but I suspect it has become less important for chat models, where one is more interested in performance via human evaluation than in academic benchmarks like MMLU (which are OK for selecting base models, but less so for chat/instruct ones).