jartine committed on
Commit
5ad53be
1 Parent(s): d648638

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -100,8 +100,8 @@ the emergent capabilities LLMs exhibit.
 
  Good quants for reading (evaluation speed) are BF16, F16, Q8\_0, and
  Q4\_0 (ordered from fastest to slowest). Prompt evaluation is bounded by
- computation speed (flops) which means performance can be improved by
- software engineering, e.g. BLAS algorithms, in which case quantization
+ flop count, which means perf can be improved through software
+ engineering alone, e.g. BLAS algorithms, in which case quantization
  starts hurting more than it helps, since it competes for CPU resources
  and makes it harder for the compiler to parallelize instructions. You
  want to ideally use the simplest smallest floating point format that's
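
The claim in the hunk above, that prompt evaluation is bounded by flop count, can be made concrete with a rough back-of-the-envelope sketch. It assumes the common rule of thumb of about two flops (one multiply, one add) per model parameter per prompt token for a dense model, and it ignores attention cost. The function names and the example figures are hypothetical and are not taken from the README or from any measurement.

```python
def prompt_eval_flops(n_params: int, n_tokens: int) -> int:
    """Approximate flops to evaluate a prompt on a dense model.

    Assumption: each parameter contributes one multiply and one add
    per prompt token, so roughly 2 * n_params * n_tokens flops.
    """
    return 2 * n_params * n_tokens


def min_eval_seconds(n_params: int, n_tokens: int, peak_flops: float) -> float:
    """Lower bound on prompt evaluation time if compute is the only limit.

    peak_flops is the sustained flop/s the hardware and kernels reach;
    better kernels raise this number, which is the 'software engineering'
    lever the README text refers to.
    """
    return prompt_eval_flops(n_params, n_tokens) / peak_flops


if __name__ == "__main__":
    # Hypothetical figures: 7e9 parameters, 512-token prompt, 1e12 flop/s.
    seconds = min_eval_seconds(7_000_000_000, 512, 1e12)
    print(f"compute-bound lower limit: {seconds:.3f} s")
```

Under this simplified model, the flop count is fixed by the model and the prompt, so the only way to shorten prompt evaluation is to raise the achieved flop/s. That is consistent with the diff's point that faster kernels help here while quantization does not reduce the work.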