Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model Paper • 2404.04167 • Published Apr 5, 2024
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models Paper • 2502.13059 • Published 16 days ago
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 14 days ago
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published 8 days ago
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published 11 days ago