view article Article Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs Apr 16, 2024 • 15
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers Paper • 2310.15164 • Published Oct 23, 2023 • 1
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code Paper • 2403.07974 • Published Mar 12, 2024 • 2
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 138
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution Paper • 2401.03065 • Published Jan 5, 2024 • 11