yicui's Collections
Mechanistic
Coding
Benchmark
Training
ICL
Architecture
RL
TDD
Theory
Instructions
Mechanistic (updated Nov 22)
Massive Activations in Large Language Models • Paper 2402.17762 • Published Feb 27 • 1
What Matters in Transformers? Not All Attention is Needed • Paper 2406.15786 • Published Jun 22 • 29
The Super Weight in Large Language Models • Paper 2411.07191 • Published Nov 11 • 4
Top-nσ: Not All Logits Are You Need • Paper 2411.07641 • Published Nov 12 • 18