Update README.md
README.md
CHANGED
@@ -1,18 +1,33 @@
---
license: cc-by-nc-4.0
language:
- en
---

Trained with compute from [Backyard.ai](https://backyard.ai/) | Thanks to them and @dynafire for helping me out.

---

Training Details:
<br>Trained at 8K Context -> Expanded to 32K Context via PoSE context-extension training.

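For readers unfamiliar with PoSE (Positional Skip-wisE) training: each training step still only attends over 8K real tokens, but the position ids are split into chunks and shifted by random skips so that values across the full 32K target range get exercised. A rough sketch of that manipulation in Python follows; the chunking scheme and parameter names are illustrative, not Axolotl's internals:

```
import random

def pose_position_ids(seq_len=8192, target_len=32768, num_chunks=2):
    """Illustrative PoSE-style position ids: split the real 8K window into
    chunks and shift later chunks by random skips so the ids span ~32K."""
    # Random chunk boundaries inside the real sequence.
    bounds = sorted(random.sample(range(1, seq_len), num_chunks - 1))
    chunks = list(zip([0] + bounds, bounds + [seq_len]))

    # Distribute the "missing" positions (32K - 8K) randomly across the gaps.
    slack = target_len - seq_len
    cuts = sorted(random.sample(range(slack + 1), num_chunks - 1))
    skips = [b - a for a, b in zip([0] + cuts, cuts + [slack])]

    position_ids, offset = [], 0
    for (start, end), skip in zip(chunks, skips):
        offset += skip                        # jump forward in position space
        position_ids.extend(range(start + offset, end + offset))
    return position_ids                       # len == seq_len, max < target_len
```

Note that PoSE only changes how position ids are sampled during training; at inference the model is simply run at the longer context, so the `use_pose` flag in the config further down matters for reproducing the training run rather than for serving the model.
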
Dataset Modifications:
<br>\- Further Cleaned up Roleplaying Samples -> Quality Check
<br>\- Removed Low Quality Samples from Manual Check
<br>\- More Creative Writing Samples -> 2x
<br>\- Remade and Refined Detailed Instruct Data

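The card doesn't spell out what the quality check involved, so purely as an illustration, a manual-check pass over roleplay samples is usually backed by a crude automatic gate like the one below; every field name and threshold here is an assumption, not the actual pipeline:

```
def keep_sample(sample):
    """Illustrative quality gate; sample["turns"] is assumed to be a
    list of message strings making up one roleplay conversation."""
    turns = sample.get("turns", [])
    if len(turns) < 4:                        # too short to be a useful roleplay
        return False
    if len(" ".join(turns)) < 200:            # trivially little content
        return False
    if len(set(turns)) / len(turns) < 0.5:    # heavy turn-for-turn repetition
        return False
    return True

dataset = [{"turns": ["Hi.", "Hello!", "Hi.", "Hello!"]}]   # toy example
cleaned = [s for s in dataset if keep_sample(s)]            # -> [] (filtered out)
```
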
Needle in a Haystack Results:
![Results](Linkhere)

Coherent at 32K Context. Not as good as a natively trained 32K model, but much better than regular rope scaling.

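For context on what the plot measures: a needle-in-a-haystack run buries a single retrievable fact at varying depths inside a long filler document and checks whether the model can quote it back. A minimal sketch of how such a prompt is built; the filler text, character budget, and question are illustrative, not the exact harness behind the plot above:

```
def build_niah_prompt(needle, depth, total_chars=120_000):
    """Bury `needle` at a relative `depth` (0.0 = start, 1.0 = end) inside
    filler text roughly sized to fill a 32K-token context."""
    filler = "The grass is green. The sky is blue. The sun is warm. "
    haystack = (filler * (total_chars // len(filler) + 1))[:total_chars]
    cut = int(len(haystack) * depth)
    doc = haystack[:cut] + " " + needle + " " + haystack[cut:]
    return doc + "\n\nQuestion: What is the secret passphrase mentioned above?\nAnswer:"

# e.g. place the needle halfway into a ~32K-token context
prompt = build_niah_prompt("The secret passphrase is 'harbor-lantern-42'.", depth=0.5)
```

Sweeping `depth` and the context length, then scoring whether the passphrase comes back, is what produces the usual needle-in-a-haystack heatmap.
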
---

Relevant Axolotl Configurations:
<br>-> Taken from [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE)
<br>\- I tried to find my own configs, but his worked best. A RoPE theta of 2M gave the best training loss compared to other values.

31 |
```
|
32 |
sequence_len: 8192
|
33 |
use_pose: true
|