Update README.md
README.md
CHANGED
@@ -21,3 +21,5 @@ With 4 experts activated, it's... far less coherent.
I am interested in the prospect of continuing to train this in such a way where it can naturally handle variable expert counts, and learn to balance the features.
If this works, we can potentially teach the behavior of using less computation for tokens that are trivial to predict, while using more when necessary.
+
+# Also thanks StefanGliga for giving me the idea while we were discussing [this paper](https://arxiv.org/abs/2303.01610) :3