The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training • Paper • 2501.18965 • Published 5 days ago • 5
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer • Paper • 2501.18427 • Published 6 days ago • 14
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch • Paper • 2501.18512 • Published 6 days ago • 24
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text • Paper • 2501.15654 • Published 10 days ago • 9
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate • Paper • 2501.17703 • Published 7 days ago • 49
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling • Paper • 2501.16975 • Published 8 days ago • 21
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation • Paper • 2501.16764 • Published 8 days ago • 21
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation • Paper • 2501.15907 • Published 9 days ago • 15
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer • Paper • 2501.15570 • Published 10 days ago • 23
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step • Paper • 2501.13926 • Published 13 days ago • 33
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding • Paper • 2501.13200 • Published 13 days ago • 61
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos • Paper • 2501.12375 • Published 15 days ago • 22
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation • Paper • 2501.12202 • Published 15 days ago • 32
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model • Paper • 2501.12368 • Published 15 days ago • 39