Memory-Efficient LLM Training with Online Subspace Descent • Paper 2408.12857 • Published Aug 23
LongNet: Scaling Transformers to 1,000,000,000 Tokens • Paper 2307.02486 • Published Jul 5, 2023