SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 4 days ago • 91
FoNE: Precise Single-Token Number Embeddings via Fourier Features Paper • 2502.09741 • Published 11 days ago • 11
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published Jan 22 • 24
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16 • 37
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published Dec 30, 2024 • 39
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published Dec 16, 2024 • 41
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 92
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 94
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Paper • 2411.06469 • Published Nov 10, 2024 • 17
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 72
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Paper • 2410.05080 • Published Oct 7, 2024 • 21