TextQuests: How Good are LLMs at Text-Based Video Games? By justinphan3110 and 1 other • 6 days ago • 20
🇵🇭 FilBench - Can LLMs Understand and Generate Filipino? By ljvmiranda921 and 8 others • 6 days ago • 8
Welcome GPT OSS, the new open-source model family from OpenAI! By reach-vb and 11 others • 13 days ago • 459
Back to The Future: Evaluating AI Agents on Predicting Future Events By vinid and 6 others • Jul 17 • 36
SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 627
Fixing Open LLM Leaderboard with Math-Verify By hynky and 3 others • Feb 14 • 30
Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.28k
CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard By alozowski and 3 others • Jan 9 • 21
Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard By alielfilali01 and 4 others • Dec 4, 2024 • 37
Letting Large Models Debate: The First Multilingual LLM Debate Competition By xuanricheng and 11 others • Nov 20, 2024 • 32
Judge Arena: Benchmarking LLMs as Evaluators By kaikaidai and 7 others • Nov 19, 2024 • 58
Introducing the Open FinLLM Leaderboard By QianqianXie1994 and 12 others • Oct 4, 2024 • 79
BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks By terryyz and 8 others • Jun 18, 2024 • 52
Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages By Quent-01 and 9 others • May 24, 2024 • 27
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models By r34p3r1321 and 15 others • May 24, 2024 • 22
Introducing the Open Arabic LLM Leaderboard By alielfilali01 and 4 others • May 14, 2024 • 97
Introducing the Open Leaderboard for Hebrew LLMs! By Shaltiel and 3 others • May 5, 2024 • 48