arxiv:2407.15762
Kaiwen Wang
kaiwenw
AI & ML interests: Reinforcement Learning
Recent Activity
Updated a dataset 16 days ago: kaiwenw/aft_after_jaft_test
Updated a dataset about 2 months ago: kaiwenw/dec9_sp1_repeat_5_pref_jdpo_75_chosen_25_reject
Updated a dataset about 2 months ago: kaiwenw/dec9_sp1_repeat_5_pref_jdpo_25_chosen_75_reject
Organizations: None yet
Papers: 3
Models: 7
kaiwenw/nov11_oasst_aft_llama_lr_3e-5_rerun • Text Generation • 8
kaiwenw/nov22_lr_3e-6_lora_32_dropout_0.1_all_reject_first_ep_4 • Text Generation • 2
kaiwenw/nov22_lr_3e-6_lora_32_dropout_0.1_all_reject_first_ep_3 • Text Generation • 2
kaiwenw/nov22_lr_3e-6_lora_32_dropout_0.1_all_reject_first_ep_2 • Text Generation • 2
kaiwenw/nov22_lr_3e-6_lora_32_dropout_0.1_all_reject_first_ep_1 • Text Generation • 2
kaiwenw/nov2_oasst_aft_llama_lr_3e-5 • Text Generation • 5
kaiwenw/oct31_oasst_llama70b_jft • Text Generation • 61
Datasets: 82
kaiwenw/aft_after_jaft_test • 1.41k • 38
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_75_chosen_25_reject • 14.1k • 42
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_25_chosen_75_reject • 18.6k • 40
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_50_chosen_50_reject • 37.9k • 41
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_all_reject_first • 26.7k • 41
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_all_chosen_first • 20.1k • 43
kaiwenw/dec9_sp1_repeat_5_pref_jdpo • 44.5k • 40
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_n_7_temp_0.9 • 36.4k • 43
kaiwenw/dec9_sp1_repeat_5 • 18.2k • 40
kaiwenw/dec9_sp1_pref_jdpo_75_chosen_25_reject • 2.39k • 42