jiminmun/llama-3.2-3b_ppo_lr5e-07_rm_avg_no_sys_msg_filtered Text Generation • 3B • Updated Feb 13 • 3
jiminmun/llama-3.2-3b_ppo_lr5e-07_rm_data-mix_no_sys_msg_filtered Text Generation • 3B • Updated Feb 13 • 4
jiminmun/llama-3.2-3b_reward_model_data_mix_lr9e-6_no_sys_msg_filtered Text Classification • 3B • Updated Feb 12 • 6
jiminmun/llama-3.2-3b_ppo_lr5e-07_rm_avg_w_sys_msg_unfiltered Text Classification • 3B • Updated Feb 10 • 5
jiminmun/llama-3.2-3b_ppo_lr5e-07_rm_data-mix_no_sys_msg_unfiltered Text Classification • 3B • Updated Feb 10 • 5
jiminmun/llama-3.2-3b_ppo_lr5e-07_rm_data-mix_w_sys_msg_unfiltered Text Classification • 3B • Updated Feb 10 • 4
jiminmun/llama-3.2-3b_reward_model_clarity_lr9e-6_no_sys_msg_filtered Text Classification • 3B • Updated Feb 9 • 4
jiminmun/llama-3.2-3b_reward_model_focus_lr9e-6_no_sys_msg_filtered Text Classification • 3B • Updated Feb 9 • 4
jiminmun/llama-3.2-3b_reward_model_relevance_lr9e-6_no_sys_msg_filtered Text Classification • 3B • Updated Feb 9 • 4
jiminmun/llama-3.2-3b_reward_model_avoidbias_lr9e-6_no_sys_msg_filtered Text Classification • 3B • Updated Feb 9 • 4