Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing Paper • 2501.00658 • Published 13 days ago
Nested Attention: Semantic-aware Attention Values for Concept Personalization Paper • 2501.01407 • Published 11 days ago
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024