ACECODER: Acing Coder RL via Automated Test-Case Synthesis Paper • 2502.01718 • Published 5 days ago • 22
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published 5 days ago • 14
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning Paper • 2305.15065 • Published May 24, 2023 • 1
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation Paper • 2211.00295 • Published Nov 1, 2022
The Art of Saying No: Contextual Noncompliance in Language Models Paper • 2407.12043 • Published Jul 2, 2024 • 4
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries Paper • 2407.17468 • Published Jul 24, 2024
Question Answering for Privacy Policies: Combining Computational and Legal Perspectives Paper • 1911.00841 • Published Nov 3, 2019
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Paper • 2501.08292 • Published 25 days ago • 17
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 59
SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation Paper • 2410.16665 • Published Oct 22, 2024
On Memorization of Large Language Models in Logical Reasoning Paper • 2410.23123 • Published Oct 30, 2024 • 18
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11