Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published 4 days ago • 72
Position Information Emerges in Causal Transformers Without Positional Encodings via Similarity of Nearby Embeddings Paper • 2501.00073 • Published 14 days ago • 1
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models Paper • 2412.07171 • Published Dec 10, 2024 • 1
Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding Paper • 2501.00712 • Published 12 days ago • 5
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing Paper • 2501.00658 • Published 12 days ago • 7
Revisiting In-Context Learning with Long Context Language Models Paper • 2412.16926 • Published 22 days ago • 28
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published 20 days ago • 29
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published 20 days ago • 39
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 48
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference Paper • 2403.09636 • Published Mar 14, 2024 • 2
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024 • 49
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study Paper • 2401.17981 • Published Jan 31, 2024 • 1
What Algorithms can Transformers Learn? A Study in Length Generalization Paper • 2310.16028 • Published Oct 24, 2023 • 2
Empower Your Model with Longer and Better Context Comprehension Paper • 2307.13365 • Published Jul 25, 2023 • 1