Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts Paper • 2501.14334 • Published Jan 24 • 20
We Can't Understand AI Using our Existing Vocabulary Paper • 2502.07586 • Published Feb 2025 • 10
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Paper • 2501.18427 • Published Jan 30 • 17
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 2025 • 142
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting Paper • 2502.05176 • Published Feb 2025 • 32
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 200
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets Paper • 2502.01506 • Published Feb 3 • 33
You Do Not Fully Utilize Transformer's Representation Capacity Paper • 2502.09245 • Published Feb 2025 • 34
MoM: Linear Sequence Modeling with Mixture-of-Memories Paper • 2502.13685 • Published Feb 2025 • 33
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30 • 27
The Ultra-Scale Playbook 🌌 Space • Running • 2.16k • The ultimate guide to training LLMs on large GPU clusters
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28 • 36
DeepFlow: Serverless Large Language Model Serving at Scale Paper • 2501.14417 • Published Jan 24 • 3