Update README.md
README.md
CHANGED
@@ -17,7 +17,14 @@ cat mix4ns-0000* > mix4ns.gguf
```
careful this can take 5 minutes or up to 10-15 on slow instances, check progress with ls -la


# Perplexity benchmarks

Command I used to run these on 48 core CPU only machine, you can add -ngl 16 to offload 16 layers or more to gpu on your own.
@@ -54,13 +61,13 @@ perplexity regular q4_km (no imatrix): 108.59 seconds per pass
[1]2.6100,[2]3.1304,[3]3.6897,[4]3.3500,[5]2.8118,[6]2.5992,[7]2.4349,[8]2.3816,[9]2.4174,[10]2.3959,[11]2.3988,[12]2.3976,
Final estimate: PPL = 2.3976 +/- 0.07111

-perplexity EdgeQuant iq4-ns
[1]2.7195,[2]3.1821,[3]3.7177,[4]3.3017,[5]2.8012,[6]2.6034,[7]2.4318,[8]2.3747,[9]2.4160,[10]2.3931,[11]2.4023,[12]2.4013,
Final estimate: PPL = 2.4013 +/- 0.07116

-perplexity EdgeQuant iq4-ns
-[1]2.
-Final estimate: PPL = 2.

perplexity 2K (no imatrix) 207.70 seconds per pass - FILESIZE 47564MB (mix2k-noimatrix-but-usable-reference.gguf)
[1]2.9401,[2]3.4224,[3]4.0174,[4]3.8503,[5]3.5607,[6]3.4449,[7]3[9]3.5589,[10]3.6546,[11]3.7810,[12]3.7733,
@@ -72,5 +79,6 @@ Final estimate: PPL = 3.7733 +/- 0.13299
command to run these was:
```
./main -m mix4ns.gguf -n 256 -t 48 --temp 0.5 --color -p "How to build a city on mars via shipping through aldrin cycler orbits?"
```
|
```
careful this can take 5 minutes or up to 10-15 on slow instances, check progress with ls -la

+# Run with llama.cpp

+```
+git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp/ && make -j
+
+./main -m ~/mix4ns-00001-of-00005.gguf -n 256 -t 64 --temp 0.2 --color -p "How to build a city on mars via aldrin cycler orbits?"
+
+```
# Perplexity benchmarks

Command I used to run these on 48 core CPU only machine, you can add -ngl 16 to offload 16 layers or more to gpu on your own.
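The note about checking progress with `ls -la` refers to the `cat mix4ns-0000* > mix4ns.gguf` merge step. A minimal sketch of automating that check, assuming a POSIX-ish shell such as bash: it runs the same `cat` in the background and lists the output file until `cat` exits. The function name `merge_with_progress` and the `MERGE_POLL_SECONDS` variable are hypothetical, not part of this repo.

```shell
# Hypothetical helper: concatenate the split parts in the background and list
# the output file every few seconds, so you can watch it grow until cat is done.
merge_with_progress() {
  local out=$1
  shift
  cat "$@" > "$out" &
  local pid=$!
  # kill -0 sends no signal; it only checks whether the cat process is still alive.
  while kill -0 "$pid" 2>/dev/null; do
    ls -la "$out" 2>/dev/null || true
    sleep "${MERGE_POLL_SECONDS:-10}"
  done
  wait "$pid"  # return cat's exit status
}
```

Usage: `merge_with_progress mix4ns.gguf mix4ns-0000*`. The shell expands the glob in sorted order, so the parts are concatenated in the same order as with the plain `cat` command.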
[1]2.6100,[2]3.1304,[3]3.6897,[4]3.3500,[5]2.8118,[6]2.5992,[7]2.4349,[8]2.3816,[9]2.4174,[10]2.3959,[11]2.3988,[12]2.3976,
Final estimate: PPL = 2.3976 +/- 0.07111

+perplexity EdgeQuant iq4-ns (no imatrix) 84.45 seconds per pass - FILESIZE 77258 MB
[1]2.7195,[2]3.1821,[3]3.7177,[4]3.3017,[5]2.8012,[6]2.6034,[7]2.4318,[8]2.3747,[9]2.4160,[10]2.3931,[11]2.4023,[12]2.4013,
Final estimate: PPL = 2.4013 +/- 0.07116

+perplexity EdgeQuant iq4-ns (WITH imatrix) 82.76 seconds per pass - FILESIZE 73636 MB ( mix4ns.gguf ) //BEST ONE FOR 80GB CARD
+[1]2.7166,[2]3.1720,[3]3.6988,[4]3.3195,[5]2.7949,[6]2.5862,[7]2.4186,[8]2.3621,[9]2.3981,[10]2.3876,[11]2.3971,[12]2.3973,
+Final estimate: PPL = 2.3973 +/- 0.07080

perplexity 2K (no imatrix) 207.70 seconds per pass - FILESIZE 47564MB (mix2k-noimatrix-but-usable-reference.gguf)
[1]2.9401,[2]3.4224,[3]4.0174,[4]3.8503,[5]3.5607,[6]3.4449,[7]3[9]3.5589,[10]3.6546,[11]3.7810,[12]3.7733,
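Each run above ends with a line of the form `Final estimate: PPL = X +/- Y`. If the perplexity output is saved to a file (for example by appending `2>&1 | tee run.log`, where the file name is only an illustration), a small helper can pull out just the value so different quants are easier to compare. This is a sketch; `final_ppl` is a hypothetical name and it relies only on the line format shown in these results.

```shell
# Hypothetical helper: print the final perplexity value from a saved llama.cpp
# perplexity log, i.e. the X in a line like "Final estimate: PPL = X +/- Y".
final_ppl() {
  # -n suppresses normal output; the trailing p prints only lines where the substitution matched.
  sed -n 's/.*Final estimate: PPL = \([0-9.]*\).*/\1/p' "$1"
}
```

Usage: `final_ppl run.log`. For the imatrix run above, the matching line would yield `2.3973`.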
command to run these was:
```
./main -m mix4ns.gguf -n 256 -t 48 --temp 0.5 --color -p "How to build a city on mars via shipping through aldrin cycler orbits?"
+
```
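The README notes that `-ngl 16` offloads 16 layers (or more) to the GPU. A sketch of the `./main` command above with that flag added, assuming llama.cpp was built with GPU support; a CPU-only build ignores `-ngl` with a warning. `NGL` is a hypothetical variable name, and the guard only prints a hint when `./main` has not been built yet.

```shell
# Number of layers to offload to the GPU (the README's example value); raise it if VRAM allows.
NGL=16

if [ -x ./main ]; then
  ./main -m mix4ns.gguf -n 256 -t 48 --temp 0.5 --color -ngl "$NGL" \
    -p "How to build a city on mars via shipping through aldrin cycler orbits?"
else
  # Not built yet: run the git clone / make -j step first.
  echo "./main not found; build llama.cpp first" >&2
fi
```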