BramVanroy 
posted an update Jun 1
The InstructGPT paper mentions that they insert 10% pretraining data during SFT, which they find improves the effect of PPO (IIUC). Has anyone else done later ablations on this? I've only seen the inverse suggested, mixing in SFT data during pretraining.
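For concreteness, the mixing described above can be sketched as follows. This is a minimal illustration, not the InstructGPT recipe itself (the paper applies the pretraining mix as an auxiliary loss term during RLHF rather than by literal dataset concatenation); the helper name and the example-tuple format are made up for the sketch.

```python
import random

def mix_sft_with_pretraining(sft_examples, pretrain_examples,
                             pretrain_fraction=0.10, seed=0):
    """Build a mixed SFT corpus where roughly `pretrain_fraction` of the
    final examples come from the pretraining corpus.

    Hypothetical helper illustrating a 10% pretraining-data mix; not an
    actual API from any library.
    """
    rng = random.Random(seed)
    # Number of pretraining examples needed so they make up
    # `pretrain_fraction` of the combined dataset.
    n_pretrain = round(len(sft_examples) * pretrain_fraction
                       / (1.0 - pretrain_fraction))
    sampled = [rng.choice(pretrain_examples) for _ in range(n_pretrain)]
    mixed = list(sft_examples) + sampled
    rng.shuffle(mixed)
    return mixed

# Toy usage: 90 SFT examples plus a 10% pretraining mix -> 100 total.
sft = [("sft", i) for i in range(90)]
pretrain = [("pretrain", i) for i in range(5)]
mixed = mix_sft_with_pretraining(sft, pretrain)
```

With real data one would more likely interleave the two streams on the fly (e.g. by sampling each batch element from the pretraining corpus with probability 0.1) rather than materializing a concatenated dataset, but the proportion arithmetic is the same.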

@lewtun or @lvwerra might know

I am not aware of any public ablations which validate this, but I suspect it has become less important for chat models, where one is more interested in performance via human evaluation than in academic benchmarks like MMLU (which are OK for selecting base models, but less so for chat/instruct ones).