Lewdiculous committed
Commit • 1b5d05a
1 Parent(s): bbe10e2
Update README.md
README.md CHANGED
@@ -24,9 +24,9 @@ My GGUF-IQ-Imatrix quants for [**Nitral-AI/Poppy_Porpoise-0.85-L3-8B**](https://
 > [!NOTE]
 > **General usage:** <br>
 > Use the latest version of **KoboldCpp**. <br>
+> Remember that you can also use `--flashattention` on KoboldCpp now even with non-RTX cards for reduced VRAM usage. <br>
 > For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** quant for up to 12288 context sizes. <br>
 > For **12GB VRAM** GPUs, the **Q5_K_M-imat** quant will give you a great size/quality balance. <br>
-> Remember that you can also use `--flashattention` on KoboldCpp now even with non-RTX cards for reduced VRAM usage.
 >
 > **Resources:** <br>
 > You can find out more about how each quant stacks up against each other and their types [**here**](gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9) and [**here**](https://rentry.org/llama-cpp-quants-or-fine-ill-do-it-myself-then-pt-2), respectively.
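As a concrete illustration of the recommendation above, a minimal KoboldCpp launch for an 8GB VRAM card might look like the sketch below. The `.gguf` filename is hypothetical, and the flag names (`--model`, `--contextsize`, `--flashattention`) are assumed to match the current KoboldCpp CLI, so verify them against `--help` on your build.

```sh
# Hypothetical launch: Q4_K_M imatrix quant, 12288 context, flash attention enabled.
python koboldcpp.py --model Poppy_Porpoise-0.85-L3-8B-Q4_K_M-imat.gguf --contextsize 12288 --flashattention
```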