TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation Paper • 2003.11963 • Published Mar 26, 2020
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model Paper • 2212.04960 • Published Dec 9, 2022 • 1
Continuous Learning in a Hierarchical Multiscale Neural Network Paper • 1805.05758 • Published May 15, 2018 • 1
HuggingFace's Transformers: State-of-the-art Natural Language Processing Paper • 1910.03771 • Published Oct 9, 2019 • 16
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Paper • 2210.01970 • Published Sep 30, 2022 • 11
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents Paper • 1901.08149 • Published Jan 23, 2019 • 3
Datasets: A Community Library for Natural Language Processing Paper • 2109.02846 • Published Sep 7, 2021 • 10
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published 4 days ago • 44