
zhu

xuekai

AI & ML interests

None yet



xuekai's activity

upvoted an article 1 day ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

3 days ago • 10

upvoted a paper 10 days ago

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Paper • 2501.18362 • Published 11 days ago • 19

upvoted an article 25 days ago

Article

Putting RL back in RLHF

Jun 12, 2024 • 75

upvoted an article about 1 month ago

Article

Process Reinforcement through Implicit Rewards

and 1 other • Jan 3 • 23

upvoted a paper about 1 month ago

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 32

upvoted an article about 1 month ago

Article

Understanding InstaFlow/Rectified Flow

Oct 6, 2023 • 25

upvoted a paper about 2 months ago

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 40

commented on a paper about 2 months ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 49

authored a paper about 2 months ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 49

upvoted a paper about 2 months ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 49

upvoted 2 papers 2 months ago

On Domain-Specific Post-Training for Multimodal Large Language Models

Paper • 2411.19930 • Published Nov 29, 2024 • 26

Yi-Lightning Technical Report

Paper • 2412.01253 • Published Dec 2, 2024 • 27

upvoted a paper 5 months ago

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Paper • 2409.01071 • Published Sep 2, 2024 • 27

upvoted a collection 6 months ago

Awesome SFT datasets

Collection

A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12, 2024 • 128

New activity in bigcode/the-stack-v2-dedup 6 months ago

Is there any sampled version of bigcode/the-stack-v2-dedup?

#9 opened 6 months ago by xuekai

upvoted a paper 6 months ago

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 88

liked a dataset 8 months ago

instruction-pretrain/ft-instruction-synthesizer-collection

Viewer • Updated Dec 2, 2024 • 249k • 1.32k • 60

upvoted a paper 8 months ago

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published Jun 24, 2024 • 26

updated a dataset 11 months ago

xuekai/pad_train

Viewer • Updated Mar 21, 2024 • 184k • 12

New activity in allenai/dolma 11 months ago

JSON ERROR in loading files of v1_6-sample using load_dataset

#22 opened 12 months ago by sakurapeng