---
license: llama3.1
base_model: [mattshumer/Reflection-Llama-3.1-70B]
---

# High-precision quantization of [Reflection-Llama-3.1-70B](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B)

This quant retains ~99.9% of the float16 model's perplexity accuracy at a 50GB file size, whereas fp8 (not tested on this model) has been shown to reach 97-98.8%.

> To download faster on Linux: `sudo apt install -y aria2`
> On Mac: `brew install aria2`
>
> These links will download up to 9x faster; paste them in all at once or one at a time.

```bash
aria2c -x 9 -o reflection-70b-precisequant-6bpw-00001-of-00002.gguf https://huggingface.co/nisten/Reflection-70b-PreciseQuant-6bpw-gguf/resolve/main/reflection-70b-precisequant-6bpw-00001-of-00002.gguf
aria2c -x 9 -o reflection-70b-precisequant-6bpw-00002-of-00002.gguf https://huggingface.co/nisten/Reflection-70b-PreciseQuant-6bpw-gguf/resolve/main/reflection-70b-precisequant-6bpw-00002-of-00002.gguf
```

### Prompt file with the correct template

> Make a file called `reflectionprompt.txt`, copy the template below into it, and change it as needed. `{prompt}` is a placeholder for your own question.

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
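One way to create the prompt file without shell-quoting trouble is a quoted heredoc, sketched below; the quoted `'EOF'` delimiter keeps the shell from interpreting anything inside the template.

```shell
# Write the chat template to reflectionprompt.txt; the quoted 'EOF'
# delimiter prevents the shell from expanding the special tokens.
cat > reflectionprompt.txt <<'EOF'
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
EOF
```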

### Running the model in a terminal with multiline input

Find the location of the first `00001` gguf file, then run:

```bash
./llama-cli -m reflection-70b-precisequant-6bpw-00001-of-00002.gguf -f reflectionprompt.txt --prompt-cache random.cache --keep -1 -fa -cnv -c 32000 -co -e -mli --temp 0 -ngl 99
```
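llama.cpp only needs the path to the first shard; it discovers the remaining `-of-00002` part on its own, so both files just have to sit in the same directory. A minimal pre-flight check (a sketch, using the filenames above):

```shell
# Verify both shards are present before launching llama-cli; llama.cpp
# finds the 00002 shard automatically from the 00001 filename, but only
# if it is in the same directory.
for part in reflection-70b-precisequant-6bpw-00001-of-00002.gguf \
            reflection-70b-precisequant-6bpw-00002-of-00002.gguf; do
  if [ -f "$part" ]; then echo "found: $part"; else echo "missing: $part"; fi
done
```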

## Perplexity benchmarks

Accuracy of the quant relative to float16: 5.2416 / 5.2468 ≈ 99.90%.

```
Float16 - 143GB - perplexity: calculating perplexity over 64 chunks, n_ctx=512, batch_size=2048, n_seq=4
16.92 seconds per pass - ETA 4.50 minutes
[1]4.0486,[2]4.6471,[3]3.9394,[4]3.4698,[5]3.2290,[6]3.0391,[7]3.1640,[8]3.1819,[9]3.2073,[10]3.3374,[11]3.5247,[12]3.7371,[13]3.9944,[14]4.0065,[15]4.1234,[16]4.1503,[17]4.2893,[18]4.4968,[19]4.4347,[20]4.4439,[21]4.5403,[22]4.4419,[23]4.2888,[24]4.2224,[25]4.1259,[26]4.0495,[27]4.0324,[28]4.0221,[29]4.0838,[30]4.1170,[31]4.1588,[32]4.1664,[33]4.2095,[34]4.2723,[35]4.3194,[36]4.4006,[37]4.4192,[38]4.4598,[39]4.4861,[40]4.5294,[41]4.5674,[42]4.5571,[43]4.6098,[44]4.6025,[45]4.7148,[46]4.7590,[47]4.7303,[48]4.6854,[49]4.6778,[50]4.7118,[51]4.7762,[52]4.7682,[53]4.8604,[54]4.8778,[55]4.9023,[56]4.9398,[57]4.9594,[58]4.9813,[59]4.9653,[60]5.0095,[61]5.0626,[62]5.1179,[63]5.1774,[64]5.2416,
Final estimate: PPL = 5.2416 +/- 0.09238

6bpw - 50GB - perplexity: calculating perplexity over 64 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 23.59 seconds per pass - ETA 6.28 minutes
[1]4.0767,[2]4.6657,[3]3.9513,[4]3.4823,[5]3.2487,[6]3.0724,[7]3.1902,[8]3.2125,[9]3.2384,[10]3.3744,[11]3.5567,[12]3.7686,[13]4.0223,[14]4.0309,[15]4.1456,[16]4.1740,[17]4.3123,[18]4.5194,[19]4.4535,[20]4.4623,[21]4.5580,[22]4.4580,[23]4.3051,[24]4.2390,[25]4.1393,[26]4.0586,[27]4.0414,[28]4.0307,[29]4.0909,[30]4.1243,[31]4.1653,[32]4.1725,[33]4.2153,[34]4.2791,[35]4.3258,[36]4.4072,[37]4.4263,[38]4.4676,[39]4.4944,[40]4.5377,[41]4.5755,[42]4.5648,[43]4.6176,[44]4.6105,[45]4.7227,[46]4.7669,[47]4.7393,[48]4.6918,[49]4.6836,[50]4.7175,[51]4.7818,[52]4.7738,[53]4.8659,[54]4.8834,[55]4.9086,[56]4.9452,[57]4.9649,[58]4.9874,[59]4.9718,[60]5.0159,[61]5.0686,[62]5.1238,[63]5.1833,[64]5.2468,
Final estimate: PPL = 5.2468 +/- 0.09258
```
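The accuracy figure is just the ratio of the two final perplexity estimates (lower PPL is better, so the larger quant PPL goes in the denominator); a one-liner to reproduce it:

```shell
# Quant accuracy = float16 PPL / 6bpw PPL, from the two final
# estimates in the logs above.
awk 'BEGIN { printf "%.2f%%\n", 5.2416 / 5.2468 * 100 }'
```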