Update README.md
README.md
@@ -9,7 +9,7 @@ inference: false
 - Based on version 1.1
 - Used PR "More accurate Q4_0 and Q4_1 quantizations #896" (should be closer in quality to unquantized)
 - Choosing between q4_0 and q4_1, the logic of higher number \= better does not apply. If you are confused, stick with q4_0.
-- If you performance to spare, it might be worth getting the q4_1. It's ~20% slower and requires 1GB more RAM, but has a ~5% lower perplexity, which is good for generation quality. You're not gonna notice it though.
+- If you have performance to spare, it might be worth getting the q4_1. It's ~20% slower and requires 1GB more RAM, but has a ~5% lower perplexity, which is good for generation quality. You're not gonna notice it though.
 - If you have *lots* of performance to spare, [TheBloke's conversion](https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g-GGML) is maybe ~7% better in perplexity but ~50% slower and requires 2GB more RAM.
 
 
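Some context on why "higher number = better" does not apply: q4_1 is not a newer revision of q4_0. It is a different 4-bit block format that stores an extra per-block value. The sketch below is a simplified illustration and makes several assumptions. It uses plain round-to-nearest, not the improved quantizer from PR #896. The block size of 32 and 32-bit per-block floats are chosen for illustration. All function names are hypothetical. It shows q4_0 as a symmetric scale-only scheme and q4_1 as scale plus minimum. This is not ggml's actual code or on-disk layout.

```python
# Simplified sketch of 4-bit block quantization in the spirit of the q4_0 and
# q4_1 formats. Plain round-to-nearest only (NOT the PR #896 quantizer), and
# the real binary layout is ignored. All names here are hypothetical.
from typing import List, Tuple

BLOCK = 32  # weights per block (assumed for this illustration)


def quantize_q4_0(block: List[float]) -> Tuple[float, List[int]]:
    """Symmetric: one scale per block; levels 0..15 centred on 8."""
    amax = max(abs(x) for x in block)
    d = amax / 7 if amax > 0 else 1.0
    qs = [min(15, max(0, round(x / d) + 8)) for x in block]
    return d, qs


def dequantize_q4_0(d: float, qs: List[int]) -> List[float]:
    return [(q - 8) * d for q in qs]


def quantize_q4_1(block: List[float]) -> Tuple[float, float, List[int]]:
    """Asymmetric: a scale AND a minimum per block (one extra stored value)."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15 if hi > lo else 1.0
    qs = [min(15, max(0, round((x - lo) / d))) for x in block]
    return d, lo, qs


def dequantize_q4_1(d: float, m: float, qs: List[int]) -> List[float]:
    return [q * d + m for q in qs]


def rmse(a: List[float], b: List[float]) -> float:
    """Root-mean-square reconstruction error."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def bits_per_weight(floats_per_block: int, float_bits: int = 32) -> float:
    """Storage cost: 4-bit codes plus the per-block float(s), per weight."""
    return (BLOCK * 4 + floats_per_block * float_bits) / BLOCK


if __name__ == "__main__":
    w = [0.1 + 0.01 * i for i in range(BLOCK)]  # an all-positive block
    d0, q0 = quantize_q4_0(w)
    d1, m1, q1 = quantize_q4_1(w)
    print("q4_0 rmse:", rmse(w, dequantize_q4_0(d0, q0)), "bits/w:", bits_per_weight(1))
    print("q4_1 rmse:", rmse(w, dequantize_q4_1(d1, m1, q1)), "bits/w:", bits_per_weight(2))
```

The example block contains only positive weights. Here q4_0 spends half of its 16 levels on negative values that never occur, while q4_1's stored minimum lets all 16 levels cover the actual range. That yields a finer step and a lower error. The price is the extra value stored per block: under the 32-bit assumption, 6 bits per weight instead of 5. This is the kind of memory/quality trade-off the README describes, though the exact figures there come from the real formats, not from this sketch.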