Exposing Attention Glitches with Flip-Flop Language Modeling Paper • 2306.00946 • Published Jun 1, 2023
TinyGSM: achieving >80% on GSM8k with small language models Paper • 2312.09241 • Published Dec 14, 2023
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression Paper • 2306.00788 • Published Jun 1, 2023
Repeat After Me: Transformers are Better than State Space Models at Copying Paper • 2402.01032 • Published Feb 1, 2024
Task-Specific Skill Localization in Fine-tuned Language Models Paper • 2302.06600 • Published Feb 13, 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding Paper • 2303.04245 • Published Mar 7, 2023