---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
# smol_llama-220M-GQA-32k-theta

An experimental model intended to serve as a long-context draft model for speculative decoding.

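As a usage sketch (not part of the original card), the snippet below shows how a small model like this can act as the draft model in transformers' assisted generation, which implements speculative decoding. The target model id and this model's repo id are assumptions; the draft must share the target's tokenizer/vocabulary for this to work.

```python
# Hedged sketch: speculative decoding via transformers' assisted generation.
# The target model id is an assumption; use any larger Llama-family model
# whose tokenizer/vocabulary matches this draft model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"                     # assumed target
draft_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Summarize the following document:", return_tensors="pt").to(target.device)

# Passing assistant_model turns on assisted generation: the draft proposes
# tokens cheaply and the target verifies them in a single forward pass.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
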
Created from [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) by further pretraining at a 32768-token context length on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).

This variant uses the RoPE theta (rope frequency base) method for context extension, with theta raised to 1000000.0.

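For illustration only (not from the original card), this is roughly what the change looks like in a transformers-style Llama config; the attribute names follow transformers' `LlamaConfig`, and a finished checkpoint already carries these values in its `config.json`.

```python
# Hedged sketch: the rope-theta context extension expressed as a Llama
# config change, using transformers' standard LlamaConfig attributes.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("BEE-spoke-data/smol_llama-220M-GQA")
config.rope_theta = 1000000.0           # raise the RoPE frequency base (Llama default: 10000.0)
config.max_position_embeddings = 32768  # extended context length

print(config.rope_theta, config.max_position_embeddings)
```
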
WikiText perplexity (64 rows) at each context length, as evaluated by [exllamav2](https://github.com/turboderp/exllamav2):
```
Base Model
2048: 20.2193
4096: 102.6928
8192: 235.5210
16384: 390.7198
32768: 515.8053

32k - Linear Rope Scale 16.0
2048: 25.7148
4096: 23.4461
8192: 22.3326
16384: 21.6744
32768: 21.4317

32k - Rope Theta 1000000.0
2048: 20.2158
4096: 18.3868
8192: 17.5976
16384: 17.1462
32768: 16.6989
```
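For readers who want to sanity-check numbers like these without exllamav2, here is a rough chunked-perplexity sketch using plain transformers and datasets. It is not the exllamav2 evaluation the figures above came from (tokenization and chunking differ, so exact values will not match), and the repo id is an assumption.

```python
# Hedged sketch: chunked WikiText perplexity at several context lengths
# using plain transformers/datasets. Not the exllamav2 evaluation used
# above; the repo id below is an assumption.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

# Concatenate the first 64 test rows, mirroring the "64 rows" note above.
rows = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:64]
ids = tokenizer("\n\n".join(rows), return_tensors="pt").input_ids.to(model.device)

for ctx in (2048, 4096, 8192, 16384, 32768):
    nlls = []
    # Evaluate non-overlapping windows of length ctx.
    for start in range(0, ids.size(1) - ctx + 1, ctx):
        chunk = ids[:, start : start + ctx]
        with torch.no_grad():
            # With labels=chunk, the model returns mean next-token NLL.
            nlls.append(model(chunk, labels=chunk).loss.float())
    if nlls:
        print(f"{ctx}: {torch.exp(torch.stack(nlls).mean()).item():.4f}")
```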