Sao10K committed (verified)
Commit bd34d85 · 1 Parent(s): 1aad7aa

Update README.md

Files changed (1)
  1. README.md +20 -5
README.md CHANGED
@@ -1,18 +1,33 @@
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - en
+ ---
+
+ Trained with compute from [Backyard.ai](https://backyard.ai/) | Thanks to them and @dynafire for helping me out.
+
+ ---
 
  Training Details:
- Trained at 8K Context -> Expanded to 32K Context due to context extension with PoSE training.
+ <br>\Trained at 8K Context -> Expanded to 32K Context due to context extension with PoSE training.
 
  Dataset Modifications:
- - Further Cleaned up Roleplaying Samples -> Quality Check
- - Removed Low Quality Samples from Manual Check
- - More Creative Writing Samples -> 2x
- - Remade and Refined Detailed Instruct Data
+ <br>\- Further Cleaned up Roleplaying Samples -> Quality Check
+ <br>\- Removed Low Quality Samples from Manual Check
+ <br>\- More Creative Writing Samples -> 2x
+ <br>\- Remade and Refined Detailed Instruct Data
 
  Needle in a Haystack Results:
  ![Results](Linkhere)
 
  Coherent at 32K Context. Not as good as a natively trained 32K model, but much better than regular rope scaling.
 
+ ---
+
+ Relevant Axolotl Configurations:
+ <br>-> Taken from [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE)
+ <br>\- I tried to find my own configs, but his worked best. 2M Theta had the best loss results during training compared to other values.
+
  ```
  sequence_len: 8192
  use_pose: true
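
The PoSE context extension named in the card (trained at 8K, usable at 32K) works by manipulating position IDs rather than attention length: each 8K training window is split into pieces whose position IDs are shifted by a random offset, so the model learns position values up to 32K while only ever attending over 8K tokens. The sketch below is a minimal illustration of that idea under stated assumptions, not the training code behind this commit; the function name, the two-chunk split, and the lengths are placeholders.

```python
# Minimal sketch of the PoSE (Positional Skip-wisE) idea: train on 8K-token
# windows while exposing the model to position IDs up to the 32K target by
# splitting each window in two and shifting the second piece by a random
# offset. Illustration only -- not the training code used for this model.
import torch

def pose_position_ids(seq_len: int = 8192, target_len: int = 32768) -> torch.Tensor:
    """Return `seq_len` position IDs whose values may range up to `target_len` - 1."""
    split = int(torch.randint(1, seq_len, (1,)))       # random split point
    max_skip = target_len - seq_len                    # room left for the shift
    skip = int(torch.randint(0, max_skip + 1, (1,)))   # random positional jump
    first = torch.arange(0, split)                     # 0 .. split-1, unchanged
    second = torch.arange(split, seq_len) + skip       # shifted by `skip`
    return torch.cat([first, second])                  # shape: (seq_len,)

# These IDs would replace the default arange(0, seq_len) passed to the model
# as `position_ids` during each training step.
print(pose_position_ids().max())  # always < 32768
```

At inference time no position trick is needed; the model simply runs with the extended context, which is why the card can report coherence at 32K despite 8K training windows.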
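
The "2M Theta" remark refers to the RoPE base frequency (rope_theta). A larger base lowers the rotary frequencies, so positional phases advance more slowly and far-apart tokens remain distinguishable at 32K. The comparison below is a rough illustration, not taken from this repo's config: the 500K baseline (Llama 3's stock value) and a head dimension of 128 are stated assumptions.

```python
# Rough comparison (not from this repo's config) of RoPE phase advance at long
# range for two base frequencies. A larger base ("theta") rotates the
# lowest-frequency dimension more slowly, helping distant positions stay
# separable. head_dim = 128 matches Llama-3-8B; 500K is Llama 3's stock base.
import torch

def rope_inv_freq(theta: float, head_dim: int = 128) -> torch.Tensor:
    # Standard RoPE inverse frequencies: theta ** (-2i / head_dim), i = 0 .. head_dim/2 - 1
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

position = 32768 - 1
for theta in (500_000.0, 2_000_000.0):
    slowest = rope_inv_freq(theta)[-1]  # lowest-frequency rotary pair
    print(f"theta={theta:>11,.0f}  phase at pos {position}: {float(slowest * position):.3f} rad")
```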