nisten committed on
Commit 80a56ca • 1 Parent(s): 82f83e5

Update README.md

  1. README.md +54 -3
README.md CHANGED
@@ -1,3 +1,54 @@
- ---
- license: llama3.1
- ---
+ ---
+ license: llama3.1
+ base_model: [mattshumer/Reflection-Llama-3.1-70B]
+ ---
+
+ # High Precision quantization of 🚀 [Reflection-Llama-3.1-70B](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B) 🖥️
+ # This retains 99.90% of the Float16 perplexity (5.2416/5.2468) at a 50GB file size, whereas fp8 (not tested on this model) has been shown to retain 97-98.8%
+
+
+
+
+ >🐧 To download faster on Linux, first run `sudo apt install -y aria2`
+ >🍎 On Mac, run `brew install aria2`
+ >
+ >The commands below let aria2 open up to 9 connections per server (`-x 9`), which usually makes the download much faster. Feel free to paste them all in or run them one at a time
+
+
+ ```bash
+ aria2c -x 9 -o reflection-70b-precisequant-6bpw-00001-of-00002.gguf https://huggingface.co/nisten/Reflection-70b-PreciseQuant-6bpw-gguf/resolve/main/reflection-70b-precisequant-6bpw-00001-of-00002.gguf
+
+ aria2c -x 9 -o reflection-70b-precisequant-6bpw-00002-of-00002.gguf https://huggingface.co/nisten/Reflection-70b-PreciseQuant-6bpw-gguf/resolve/main/reflection-70b-precisequant-6bpw-00002-of-00002.gguf
+ ```
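
The two download commands follow a regular shard-naming pattern, so they can also be generated. The following is a minimal Python sketch under that assumption: the helper name `shard_commands` and its parameters are hypothetical, and only the `-x` and `-o` flags already used above are emitted.

```python
def shard_commands(repo: str, stem: str, total: int, connections: int = 9) -> list[str]:
    """Build one aria2c command per GGUF shard.

    Assumes shards are named <stem>-0000N-of-0000M.gguf and live on the
    repository's main branch, as in the two commands shown above.
    """
    commands = []
    for index in range(1, total + 1):
        name = f"{stem}-{index:05d}-of-{total:05d}.gguf"
        url = f"https://huggingface.co/{repo}/resolve/main/{name}"
        commands.append(f"aria2c -x {connections} -o {name} {url}")
    return commands


if __name__ == "__main__":
    for command in shard_commands(
        repo="nisten/Reflection-70b-PreciseQuant-6bpw-gguf",
        stem="reflection-70b-precisequant-6bpw",
        total=2,
    ):
        print(command)
```

Printing the commands rather than executing them keeps the sketch side-effect free; paste the output into a terminal to start the downloads.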
+
+ ### Prompt file with correct template
+ >🐧 Make a file called `reflectionprompt.txt`, paste this in, and change it as needed
+ >
+
+
+ ```bash
+ <|begin_of_text|><|start_header_id|>system<|end_header_id|>
+ You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.<|eot_id|><|start_header_id|>user<|end_header_id|>
+ {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+ ```
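
If you prefer to generate the prompt file instead of editing it by hand, here is a minimal Python sketch. It assumes the template is meant to contain exactly one system turn followed by one user turn; the function names `build_prompt` and `write_prompt_file` are hypothetical helpers, not part of any library.

```python
from pathlib import Path

# System prompt text copied from the template above.
SYSTEM_PROMPT = (
    "You are a world-class AI system, capable of complex reasoning and reflection. "
    "Reason through the query inside <thinking> tags, and then provide your final "
    "response inside <output> tags. If you detect that you made a mistake in your "
    "reasoning at any point, correct yourself inside <reflection> tags."
)


def build_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Fill the chat template with one system turn and one user turn."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    )


def write_prompt_file(user_prompt: str, path: str = "reflectionprompt.txt") -> None:
    """Write the filled template to the file later passed to llama-cli with -f."""
    Path(path).write_text(build_prompt(user_prompt), encoding="utf-8")
```

For example, `write_prompt_file("What is 2+2?")` creates `reflectionprompt.txt` in the current directory.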
+
+ ### To run the model in a terminal with multiline input, find the location of the first (00001) gguf file, then run
+
+ ```bash
+ ./llama-cli -m reflection-70b-precisequant-6bpw-00001-of-00002.gguf -f reflectionprompt.txt --prompt-cache random.cache --keep -1 -fa -cnv -c 32000 -co -e -mli --temp 0 -ngl 99
+ ```
+
+ ## Perplexity benchmarks: the quant's perplexity ratio is 5.2416/5.2468 = 99.90% +-0.02%
+
+ ```text
+ Float16 - 143GB - perplexity: calculating perplexity over 64 chunks, n_ctx=512, batch_size=2048, n_seq=4
+ 16.92 seconds per pass - ETA 4.50 minutes
+ [1]4.0486,[2]4.6471,[3]3.9394,[4]3.4698,[5]3.2290,[6]3.0391,[7]3.1640,[8]3.1819,[9]3.2073,[10]3.3374,[11]3.5247,[12]3.7371,[13]3.9944,[14]4.0065,[15]4.1234,[16]4.1503,[17]4.2893,[18]4.4968,[19]4.4347,[20]4.4439,[21]4.5403,[22]4.4419,[23]4.2888,[24]4.2224,[25]4.1259,[26]4.0495,[27]4.0324,[28]4.0221,[29]4.0838,[30]4.1170,[31]4.1588,[32]4.1664,[33]4.2095,[34]4.2723,[35]4.3194,[36]4.4006,[37]4.4192,[38]4.4598,[39]4.4861,[40]4.5294,[41]4.5674,[42]4.5571,[43]4.6098,[44]4.6025,[45]4.7148,[46]4.7590,[47]4.7303,[48]4.6854,[49]4.6778,[50]4.7118,[51]4.7762,[52]4.7682,[53]4.8604,[54]4.8778,[55]4.9023,[56]4.9398,[57]4.9594,[58]4.9813,[59]4.9653,[60]5.0095,[61]5.0626,[62]5.1179,[63]5.1774,[64]5.2416,
+ Final estimate: PPL = 5.2416 +/- 0.09238
+
+ 6bpw - 50GB - perplexity: calculating perplexity over 64 chunks, n_ctx=512, batch_size=2048, n_seq=4
+ perplexity: 23.59 seconds per pass - ETA 6.28 minutes
+ [1]4.0767,[2]4.6657,[3]3.9513,[4]3.4823,[5]3.2487,[6]3.0724,[7]3.1902,[8]3.2125,[9]3.2384,[10]3.3744,[11]3.5567,[12]3.7686,[13]4.0223,[14]4.0309,[15]4.1456,[16]4.1740,[17]4.3123,[18]4.5194,[19]4.4535,[20]4.4623,[21]4.5580,[22]4.4580,[23]4.3051,[24]4.2390,[25]4.1393,[26]4.0586,[27]4.0414,[28]4.0307,[29]4.0909,[30]4.1243,[31]4.1653,[32]4.1725,[33]4.2153,[34]4.2791,[35]4.3258,[36]4.4072,[37]4.4263,[38]4.4676,[39]4.4944,[40]4.5377,[41]4.5755,[42]4.5648,[43]4.6176,[44]4.6105,[45]4.7227,[46]4.7669,[47]4.7393,[48]4.6918,[49]4.6836,[50]4.7175,[51]4.7818,[52]4.7738,[53]4.8659,[54]4.8834,[55]4.9086,[56]4.9452,[57]4.9649,[58]4.9874,[59]4.9718,[60]5.0159,[61]5.0686,[62]5.1238,[63]5.1833,[64]5.2468,
+ Final estimate: PPL = 5.2468 +/- 0.09258
+ ```
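
To check the comparison yourself, the ratio of the two final estimates can be recomputed directly. This is a small Python sketch using only the two `Final estimate` values from the logs above; the variable and function names are illustrative.

```python
# Final perplexity estimates copied from the two runs above.
FLOAT16_PPL = 5.2416
QUANT_6BPW_PPL = 5.2468


def ppl_ratio(reference_ppl: float, quant_ppl: float) -> float:
    """Reference perplexity divided by quantized perplexity (1.0 means no loss)."""
    return reference_ppl / quant_ppl


ratio = ppl_ratio(FLOAT16_PPL, QUANT_6BPW_PPL)
increase = QUANT_6BPW_PPL / FLOAT16_PPL - 1.0

print(f"ratio: {ratio:.2%}")                              # ratio: 99.90%
print(f"relative perplexity increase: {increase:.3%}")    # relative perplexity increase: 0.099%
```

Lower perplexity is better, so a ratio close to 100% means the 6bpw quant loses very little relative to Float16 on this 64-chunk test.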