Update readme: improvements (#18)
Commit: 78ffd748ac052c1791af0152ade1bf040d1156f5
README.md
CHANGED
@@ -13,7 +13,10 @@ license: llama3
 Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. To learn more or collaborate on a custom model, drop us a message at [email protected].
 
 This model extends Llama-3 8B's context length from 8k to > 160K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
-
+
+**Update (5/3): We further fine-tuned our model to strengthen its assistant-like chat ability as well. The NIAH result is updated.**
+
+[NIAH result image]
 
 **Approach:**
 
@@ -38,7 +41,7 @@ Exl2 is available on Bullerwins's huggingface account. Check it out here:
 
 **Data:**
 
-For training data, we generate long contexts by augmenting [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
+For training data, we generate long contexts by augmenting [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). We also fine-tune on a chat dataset based on UltraChat [4], following a data-augmentation recipe similar to [2].
 
 **Progressive Training Details:**
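The card's central technical claim is that long-context capability comes from "appropriately adjusting RoPE theta". A minimal sketch of why raising the rotary-embedding base helps, assuming Llama-3 8B's published `rope_theta = 500000` and `head_dim = 128`; the larger theta value below is hypothetical, chosen only for illustration — Gradient's actual theta schedule is not stated in this excerpt:

```python
import torch

def rope_frequencies(head_dim: int, theta: float) -> torch.Tensor:
    """Per-dimension-pair rotation frequencies for rotary position embeddings:
    freq_i = theta ** (-2i / head_dim), for i = 0, 1, ..., head_dim/2 - 1."""
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

# Llama-3 8B ships with rope_theta = 500_000 and an 8k training context.
base = rope_frequencies(128, 500_000.0)
# Hypothetical larger theta: every frequency rotates more slowly, so positions
# far beyond the original window stay distinguishable instead of aliasing.
scaled = rope_frequencies(128, 8_000_000.0)

# Larger theta never speeds up any frequency (the i = 0 pair is unchanged at 1.0).
assert torch.all(scaled <= base)
```

Because only the base constant changes, the model's weights are reusable as-is, which is consistent with the card's point that < 200M training tokens suffice to adapt.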
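The data sentence is terse about what "generate long contexts by augmenting" means. One common recipe for turning a short-document corpus like SlimPajama into long-context training examples is to sample and concatenate documents until a target length is reached; the sketch below assumes that recipe and is illustrative only, not Gradient's actual pipeline (function name and parameters are hypothetical, and it counts characters rather than tokens for simplicity):

```python
import random

def make_long_context(docs: list[str], target_chars: int, sep: str = "\n\n") -> str:
    """Sample documents with replacement and concatenate them until the
    combined length reaches target_chars. A generic long-context
    augmentation sketch, not the recipe actually used in this model card."""
    pieces: list[str] = []
    total = 0
    while total < target_chars:
        doc = random.choice(docs)
        pieces.append(doc)
        total += len(doc) + len(sep)
    return sep.join(pieces)

sample = make_long_context(["alpha " * 20, "beta " * 30], 1_000)
```

A real pipeline would count tokens with the model's tokenizer and truncate to the context window, but the structure — many short documents packed into one long sequence — is the same.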