Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training Paper • 2503.02844 • Published 7 days ago • 1
μLO: Compute-Efficient Meta-Generalization of Learned Optimizers Paper • 2406.00153 • Published May 31, 2024 • 12