- Massive Activations in Large Language Models
Paper • 2402.17762 • Published • 1
- What Matters in Transformers? Not All Attention is Needed
Paper • 2406.15786 • Published • 29
- The Super Weight in Large Language Models
Paper • 2411.07191 • Published • 4
- Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 18
Yi Cui
yicui
AI & ML interests
None yet
Recent Activity
updated a collection about 1 month ago: Mechanistic
updated a collection about 1 month ago: Training
updated a collection about 1 month ago: Training
Organizations
None yet
Collections
10
- glaiveai/glaive-coder-7b
Text Generation • Updated • 818 • 54
- glaiveai/glaive-code-assistant-v3
Viewer • Updated • 950k • 426 • 46
- ibm-granite/granite-3b-code-base-128k
Text Generation • Updated • 1.29k • 4
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Paper • 2405.04324 • Published • 22