jmagder's Collections: Finished Reading
• Self-Play Preference Optimization for Language Model Alignment (arXiv:2405.00675)
• FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv:2205.14135)
• Attention Is All You Need (arXiv:1706.03762)
• FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (arXiv:2307.08691)
• FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (arXiv:2407.08608)
• Efficient Transformers: A Survey (arXiv:2009.06732)
• Linformer: Self-Attention with Linear Complexity (arXiv:2006.04768)
• LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
• YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
• RoFormer: Enhanced Transformer with Rotary Position Embedding (arXiv:2104.09864)
• BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
• The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
• LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
• Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
• Training Compute-Optimal Large Language Models (arXiv:2203.15556)
• GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (arXiv:2305.13245)