Update README.md
README.md
CHANGED
@@ -94,7 +94,7 @@ Example prompts and outputs below, including examples using a pre prompt.
 - "MAX": output tensor / embed at float 32. You get better instruction following/output generation than standard/upgraded quants.
 - "MAX-CPU": output tensor / embed at bfloat 16 (required for Gemma models/CPU offload), which forces both of these onto the CPU (Nvidia cards / others will vary); this frees up vram at the cost of tokens/second, and you get better instruction following/output generation too.
 - "MAX-CPU": Example 1: q8_0 Max-CPU : 1.7 GB will load onto CPU/RAM, 8.5 GB will load onto the GPU/vram. Extra vram can be used for context. NOTE: "Math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation.
-- "MAX-CPU": Example 2: q2_k Max-CPU : 1.7 GB will load onto CPU/RAM, 3 GB will load onto the GPU/vram. Extra vram can be used for context. NOTE: "Math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation. You could run this model/quant on a
+- "MAX-CPU": Example 2: q2_k Max-CPU : 1.7 GB will load onto CPU/RAM, 3 GB will load onto the GPU/vram. Extra vram can be used for context. NOTE: "Math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation. You could run this model/quant on an 8GB vram card.
 - Q8_0 (Max,Max-CPU) now clocks in at 10.83 bits per weight (average).
 
 <B>Settings, Quants and Critical Operations Notes:</B>
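The MAX-CPU split described in the diff is the usual llama.cpp-style partial offload: some tensors stay in system RAM while the rest go to the GPU, trading tokens/second for vram headroom. The MAX-CPU quants force the output/embed tensors onto the CPU at the GGUF level; the sketch below only shows the generic layer-offload knob you would tune alongside that, using llama-cpp-python. It is a minimal illustration, not this repo's exact loader settings: the model filename, layer count, context size, and prompt are all placeholders.

```python
# Minimal sketch of partial CPU/GPU offload with llama-cpp-python.
# Assumptions: the GGUF filename and n_gpu_layers value are hypothetical;
# lower n_gpu_layers until the GPU portion fits your vram, then spend any
# spare vram on a larger n_ctx, as the examples above suggest.
from llama_cpp import Llama

llm = Llama(
    model_path="model-q8_0-max-cpu.gguf",  # hypothetical quant file
    n_gpu_layers=20,  # layers offloaded to GPU; the rest stay on CPU/RAM
    n_ctx=4096,       # context window; larger values consume the spare vram
)

out = llm("Write the opening scene of a horror story.", max_tokens=256)
print(out["choices"][0]["text"])
```

In the llama.cpp CLI the equivalent knob is `--n-gpu-layers` (`-ngl`).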
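On the 10.83 bits-per-weight figure: plain Q8_0 stores roughly 8.5 bits per weight (8-bit values plus a per-block scale), so the higher average comes from keeping the output/embed tensors at full precision. As a rough sanity check under an assumed split, if about 10% of the weights sat in those float-32 tensors, the average would be 0.9 × 8.5 + 0.1 × 32 ≈ 10.85 bpw, in the same ballpark as the quoted 10.83.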