BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18 • 36
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Paper • 2410.07137 • Published 9 days ago • 6
ProX Dataset Collection a collection of pre-training corpora refined by ProX • 5 items • Updated 8 days ago • 5
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published 23 days ago • 59
MagpieLM Collection Aligning LMs with Fully Open Recipe (data+training configs+logs) • 9 items • Updated 25 days ago • 14
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild Paper • 2409.03753 • Published Sep 5 • 18
OLMoE Collection Artifacts for open mixture-of-experts language models. • 13 items • Updated 23 days ago • 24
LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models Paper • 2409.00509 • Published Aug 31 • 38
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 85
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm Paper • 2408.08072 • Published Aug 15 • 31
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 67
view article Article ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models By yuchenlin • Jul 27 • 22
📈 Scaling Laws with Vocabulary Collection Increase your vocabulary size when you scale up your language model • 5 items • Updated Aug 11 • 3
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
view article Article RegMix: Data Mixture as Regression for Language Model Pre-training By SivilTaram • Jul 11 • 10
view article Article MInference 1.0: 10x Faster Million Context Inference with a Single GPU By liyucheng • Jul 11 • 11
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1 • 33
🧬 RegMix: Data Mixture as Regression Collection Automatic data mixture method for large language model pre-training • 10 items • Updated Jul 26 • 5
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22 • 45
🌸BigCodeBench Collection Benchmarking Code Generation with Diverse Function Calls and Complex Instructions https://bigcode-bench.github.io/ • 7 items • Updated Jul 18 • 2
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Apr 29 • 71
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Paper • 2406.11896 • Published Jun 14 • 18
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17 • 57
view article Article BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18 • 36
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17 • 48
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28 • 12
view article Article Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs Apr 16 • 14
view article Article Efficient Table Pre-training without Real Data: An Introduction to TAPEX May 23, 2022 • 1
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 90
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 60
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14 • 54
MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing Paper • 2212.13492 • Published Dec 27, 2022 • 2
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding Paper • 2402.16671 • Published Feb 26 • 26
⚓️ Sailor Language Models Collection Sailor: Open Language Models tailored for South-East Asia (SEA) released by Sea AI Lab. • 18 items • Updated Jul 26 • 16
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 596
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15 • 21