Update README.md
README.md
CHANGED
@@ -98,8 +98,8 @@ think correctly. A highly degraded quant like `Q2_K` may not make a
 great encyclopedia, but it's still capable of logical reasoning and
 the emergent capabilities LLMs exhibit.
 
-Good quants for reading (evaluation speed) are BF16, F16,
-
+Good quants for reading (evaluation speed) are BF16, F16, Q8\_0, and
+Q4\_0 (ordered from fastest to slowest). Prompt evaluation is bounded by
 computation speed (flops) which means performance can be improved by
 software engineering, e.g. BLAS algorithms, in which case quantization
 starts hurting more than it helps, since it competes for CPU resources