Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction Paper • 2508.03613 • Published 12 days ago • 11
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors Paper • 2502.11167 • Published Feb 16 • 10
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving Paper • 2502.07640 • Published Feb 11 • 9
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation Paper • 2411.00412 • Published Nov 1, 2024 • 10
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 39