Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024 • 12
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 46
Investigating Regularization of Self-Play Language Models Paper • 2404.04291 • Published Apr 4, 2024 • 1
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 42
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning Paper • 2402.06619 • Published Feb 9, 2024 • 55
Solving The Travelling Salesmen Problem using HNN and HNN-SA algorithms Paper • 2202.13746 • Published Feb 8, 2022 • 1
Robustness and risk management via distributional dynamic programming Paper • 2112.15430 • Published Dec 28, 2021
Beyond Log-Concavity: Theory and Algorithm for Sum-Log-Concave Optimization Paper • 2309.15298 • Published Sep 26, 2023
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 4
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 31
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Paper • 2303.03915 • Published Mar 7, 2023 • 7