Models trained with the code in https://github.com/YSLIU627/Regularized-Preference-Optimization.
- Paper: Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer (arXiv:2405.16436)
- ZHLiu627/zephyr-7b-gemma-rpo-avg
- ZHLiu627/zephyr-gemma-rpo (text generation)
- ZHLiu627/beta_ultra_rdpo_full_eta0.005_beta0.01_no_decay_new
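
As the paper title suggests, the method (Regularized Preference Optimization) mitigates reward overoptimization by adding the SFT loss on the preferred response to the DPO objective as a regularizer. Below is a minimal PyTorch sketch of such a combined loss; the function name `rpo_loss`, the SFT weight `eta`, and the DPO temperature `beta` are illustrative assumptions (the default values echo the hyperparameters in the model names above), not the repository's actual API.

```python
import torch
import torch.nn.functional as F

def rpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.01, eta=0.005):
    """Sketch: DPO loss plus an SFT regularizer on the chosen response.

    Inputs are summed log-probabilities of full responses under the policy
    and the frozen reference model, one entry per preference pair. `beta`
    and `eta` are hypothetical defaults taken from the model names above.
    """
    # Standard DPO term: negative log-sigmoid of the scaled margin between
    # the implicit rewards of the chosen and rejected responses.
    margin = (policy_chosen_logps - ref_chosen_logps) - (
        policy_rejected_logps - ref_rejected_logps
    )
    dpo_term = -F.logsigmoid(beta * margin)

    # SFT regularizer: negative log-likelihood of the preferred response,
    # which the paper argues acts as an implicit adversarial regularizer.
    sft_term = -policy_chosen_logps

    return (dpo_term + eta * sft_term).mean()

# Toy usage with random stand-ins for a batch of 4 preference pairs.
lp = lambda: torch.randn(4)
loss = rpo_loss(lp(), lp(), lp(), lp())
```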