Forgetting Transformer: Softmax Attention with a Forget Gate — Paper • 2503.02130 • Published 10 days ago
The Ultra-Scale Playbook 🌌 — Space • The ultimate guide to training LLMs on large GPU clusters
nyu-dice-lab/allenai_WildChat-1M-Full-Qwen_Qwen2.5-72B-Instruct-lc — Dataset • Viewer • Updated Jan 2 • 806k rows
WildChat-50m — Collection • All model responses associated with the WildChat-50m paper • 55 items • Updated Jan 29