DavidAU committed
Commit 836d908 · verified · 1 Parent(s): 7aafd93

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -94,7 +94,7 @@ Example prompts and outputs below, including examples using a pre prompt.
  - "MAX": output tensor / embed at float 32. You get better instruction following/output generation than standard/upgraded quants.
  - "MAX-CPU": output tensor / embed at bfloat 16 (required for Gemma models/CPU offload), which forces both of these on to the CPU (Nvidia cards / other will vary), this frees up vram at cost of token/second and you get better instruction following/output generation too.
  - "MAX-CPU": Example 1: q8_0 Max-CPU : 1.7 GB will load on to CPU/RAM, 8.5 GB will load onto the GPU/vram. Extra Vram can be used for context. NOTE: "Math" on the CPU is slightly more accurate than GPU, so you may get a better generation.
- - "MAX-CPU": Example 2: q2_k Max-CPU : 1.7 GB mb will load on to CPU/RAM, 3 GB will load onto the GPU/vram. Extra Vram can be used for context. NOTE: "Math" on the CPU is slightly more accurate than GPU, so you may get a better generation. You could run this model/quant on a 4GB vram card.
+ - "MAX-CPU": Example 2: q2_k Max-CPU : 1.7 GB mb will load on to CPU/RAM, 3 GB will load onto the GPU/vram. Extra Vram can be used for context. NOTE: "Math" on the CPU is slightly more accurate than GPU, so you may get a better generation. You could run this model/quant on a 8GB vram card.
  - Q8_0 (Max,Max-CPU) now clocks in at 10.83 bits per weight (average).

  <B>Settings, Quants and Critical Operations Notes:</b>
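
To illustrate the CPU/GPU split the MAX-CPU examples in this diff describe, here is a minimal sketch using llama-cpp-python. The library, its CUDA build, and the GGUF filename are assumptions for illustration, not part of this commit; the point is that even with full GPU offload requested, the bf16 output/embed tensors of a MAX-CPU quant stay in CPU/RAM, producing the RAM/VRAM figures quoted above.

```python
# Minimal sketch, assuming llama-cpp-python with a CUDA-enabled build.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-q2_k-MAX-CPU.gguf",  # hypothetical filename
    n_gpu_layers=-1,  # request offload of all offloadable layers to the GPU
    n_ctx=4096,       # spare VRAM can be spent on a larger context instead
)

# Simple completion call to confirm the model loaded and generates.
out = llm("Write one sentence about CPU offload.", max_tokens=32)
print(out["choices"][0]["text"])
```

This only demonstrates the load-time behavior the README bullets describe; the exact RAM/VRAM split depends on the quant, the model, and the hardware.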