WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild • Paper 2406.04770 • Published Jun 7, 2024
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism • Paper 2407.10457 • Published Jul 15, 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs • Paper 2406.18495 • Published Jun 26, 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences • Paper 2406.11069 • Published Jun 16, 2024
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing • Paper 2406.08464 • Published Jun 12, 2024
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning • Paper 2312.01552 • Published Dec 4, 2023
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition • Paper 2307.13269 • Published Jul 25, 2023
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion • Paper 2306.02561 • Published Jun 5, 2023