arxiv:2412.05718
Harshit Sikchi
hsikchi
AI & ML interests
None yet
Recent Activity
Commented on a paper about 1 month ago:
RL Zero: Zero-Shot Language to Behaviors without any Supervision
Upvoted a paper 7 months ago:
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Authored a paper 7 months ago:
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Models (36)
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0175-alpha-0-step-79872 • Text Generation • Updated • 23
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0175-alpha-0-step-59904 • Text Generation • Updated • 21
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0175-alpha-0-step-19968 • Text Generation • Updated • 23
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0375-alpha-0-step-59904 • Text Generation • Updated • 26
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0375-alpha-0-step-39936 • Text Generation • Updated • 23
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0375-alpha-0-step-79872 • Text Generation • Updated • 21
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0175-alpha-0-step-39936 • Text Generation • Updated • 24
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0175-alpha-0-LATEST • Text Generation • Updated • 22
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.0375-alpha-0-step-19968 • Text Generation • Updated • 23
hsikchi/pythia-6.9b-goldrm_tldr-dpo-beta-0.025-alpha-0-step-59904 • Text Generation • Updated • 27