Update README.md
README.md
CHANGED
@@ -1,18 +1,33 @@
---
license: cc-by-nc-4.0
language:
- en
---

Trained with compute from [Backyard.ai](https://backyard.ai/) | Thanks to them and @dynafire for helping me out.

---

Training Details:
<br>Trained at 8K Context -> Expanded to 32K Context via PoSE context-extension training.

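For readers unfamiliar with PoSE (Positional Skip-wisE) training: each training step still only attends over 8K real tokens, but the position ids are split into chunks and shifted by random skips so that values across the full 32K target range get exercised. A rough sketch of that manipulation in Python follows; the chunking scheme and parameter names are illustrative, not Axolotl's internals:

```
import random

def pose_position_ids(seq_len=8192, target_len=32768, num_chunks=2):
    """Illustrative PoSE-style position ids: split the real 8K window into
    chunks and shift later chunks by random skips so the ids span ~32K."""
    # Random chunk boundaries inside the real sequence.
    bounds = sorted(random.sample(range(1, seq_len), num_chunks - 1))
    chunks = list(zip([0] + bounds, bounds + [seq_len]))

    # Distribute the "missing" positions (32K - 8K) randomly across the gaps.
    slack = target_len - seq_len
    cuts = sorted(random.sample(range(slack + 1), num_chunks - 1))
    skips = [b - a for a, b in zip([0] + cuts, cuts + [slack])]

    position_ids, offset = [], 0
    for (start, end), skip in zip(chunks, skips):
        offset += skip                        # jump forward in position space
        position_ids.extend(range(start + offset, end + offset))
    return position_ids                       # len == seq_len, max < target_len
```

Note that PoSE only changes how position ids are sampled during training; at inference the model is simply run at the longer context, so the `use_pose` flag in the config further down matters for reproducing the training run rather than for serving the model.
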
Dataset Modifications:
<br>\- Further Cleaned up Roleplaying Samples -> Quality Check
<br>\- Removed Low Quality Samples from Manual Check
<br>\- More Creative Writing Samples -> 2x
<br>\- Remade and Refined Detailed Instruct Data

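The card doesn't spell out what the quality check involved, so purely as an illustration, a manual-check pass over roleplay samples is usually backed by a crude automatic gate like the one below; every field name and threshold here is an assumption, not the actual pipeline:

```
def keep_sample(sample):
    """Illustrative quality gate; sample["turns"] is assumed to be a
    list of message strings making up one roleplay conversation."""
    turns = sample.get("turns", [])
    if len(turns) < 4:                        # too short to be a useful roleplay
        return False
    if len(" ".join(turns)) < 200:            # trivially little content
        return False
    if len(set(turns)) / len(turns) < 0.5:    # heavy turn-for-turn repetition
        return False
    return True

dataset = [{"turns": ["Hi.", "Hello!", "Hi.", "Hello!"]}]   # toy example
cleaned = [s for s in dataset if keep_sample(s)]            # -> [] (filtered out)
```
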
Needle in a Haystack Results:
![Results](Linkhere)

Coherent at 32K Context. Not as good as a natively trained 32K model, but much better than regular rope scaling.

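For context on what the plot measures: a needle-in-a-haystack run buries a single retrievable fact at varying depths inside a long filler document and checks whether the model can quote it back. A minimal sketch of how such a prompt is built; the filler text, character budget, and question are illustrative, not the exact harness behind the plot above:

```
def build_niah_prompt(needle, depth, total_chars=120_000):
    """Bury `needle` at a relative `depth` (0.0 = start, 1.0 = end) inside
    filler text roughly sized to fill a 32K-token context."""
    filler = "The grass is green. The sky is blue. The sun is warm. "
    haystack = (filler * (total_chars // len(filler) + 1))[:total_chars]
    cut = int(len(haystack) * depth)
    doc = haystack[:cut] + " " + needle + " " + haystack[cut:]
    return doc + "\n\nQuestion: What is the secret passphrase mentioned above?\nAnswer:"

# e.g. place the needle halfway into a ~32K-token context
prompt = build_niah_prompt("The secret passphrase is 'harbor-lantern-42'.", depth=0.5)
```

Sweeping `depth` and the context length, then scoring whether the passphrase comes back, is what produces the usual needle-in-a-haystack heatmap.
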
---

Relevant Axolotl Configurations:
<br>-> Taken from [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE)
<br>\- I tried to find my own configs, but his worked best. A RoPE theta of 2M gave the best training loss compared to other values.

31 |
```
|
32 |
sequence_len: 8192
|
33 |
use_pose: true
|