- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings — arXiv:2305.13571, published May 23, 2023
- Reasoning in Large Language Models: A Geometric Perspective — arXiv:2407.02678, published Jul 2, 2024