Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ Trained with compute from [Backyard.ai](https://backyard.ai/) | Thanks to them a
 Fimbulvetr-v2 but extended to 16K with PoSE. A sane context value would be ~12K before it degrades.
 
 Note:
-<br> \- I left Rope Theta at 10K for this train, instead of expanding it like with Stheno 3.3. Solar did not play
+<br> \- I left Rope Theta at 10K for this train, instead of expanding it like with Stheno 3.3. Solar did not play well with extended theta, grad norm / loss values went parabolic or plunged from 10000+ down. Unreliable pretty much, unlike Stheno 3.3's training run.
 
 ---
 
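
For readers unfamiliar with what "Rope Theta" controls, here is a minimal sketch of how the RoPE base sets the rotary wavelengths, using the standard formula `inv_freq_i = theta ** (-2i / head_dim)`. It is not the author's training code. `HEAD_DIM = 128` and the larger theta value are assumptions chosen only for illustration; the note above states only that theta stayed at 10K.

```python
import math


def rope_wavelengths(head_dim: int, theta: float) -> list[float]:
    """Wavelength, in tokens, of each rotary frequency pair under standard RoPE."""
    # inv_freq_i = theta ** (-2i / head_dim), so wavelength_i = 2*pi / inv_freq_i
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 128  # assumption: per-head dimension, not stated in the README

# 10_000 is the value kept for this train; 1_000_000 is a hypothetical
# "expanded" theta used purely for comparison.
for theta in (10_000.0, 1_000_000.0):
    longest = rope_wavelengths(HEAD_DIM, theta)[-1]
    print(f"theta={theta:>9.0f}  longest wavelength ~ {longest:,.0f} tokens")
```

A larger theta stretches the slowest rotary frequencies over more tokens, which is why it is often raised for long-context training. The note reports that doing so was unstable for this particular base model, so the default was kept and PoSE alone was used for the extension.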