Update README.md
README.md CHANGED
````diff
@@ -61,7 +61,7 @@ input
 ### Response:
 ```
 
-## How to easily download and use this model in text-generation-webui
+## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
 Please make sure you're using the latest version of text-generation-webui
 
@@ -144,7 +144,7 @@ It was created with group_size 128 to increase inference accuracy, but without -
 
 * `orca-mini-13b-GPTQ-4bit-128g.no-act.order.safetensors`
 * Works with AutoGPTQ in CUDA or Triton modes.
-* LLaMa models also work with [ExLlama](https://github.com/turboderp/exllama
+* LLaMa models also work with [ExLlama](https://github.com/turboderp/exllama), which usually provides much higher performance, and uses less VRAM, than AutoGPTQ.
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
 * Parameters: Groupsize = 128. Act Order / desc_act = False.
````
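The first hunk retitles the section on downloading and using the model in text-generation-webui. For readers without the full README at hand, here is a minimal sketch of fetching the repo outside the UI with `huggingface_hub`; the repo id is an assumption inferred from the filename in the second hunk, not something the diff itself states:

```python
# Sketch: download the model repo outside text-generation-webui.
# ASSUMPTION: the repo id is inferred from the weights filename above,
# not confirmed by the diff; adjust to the actual repository.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("TheBloke/orca-mini-13B-GPTQ")  # assumed repo id
print(f"Model files downloaded to: {local_dir}")
```

Inside text-generation-webui itself, the built-in model downloader covers this step.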
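The second hunk's bullet list says the `no-act.order` file works with AutoGPTQ in CUDA or Triton modes. A minimal, hedged sketch of what that load can look like in Python; the repo id and prompt template are guesses (the `### Response:` context line in the first hunk suggests an Orca-style prompt) and may need adjusting:

```python
# Sketch: load orca-mini-13b-GPTQ-4bit-128g.no-act.order.safetensors with AutoGPTQ.
# ASSUMPTIONS: the repo id and prompt template are inferred from the diff
# context, not stated in it.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/orca-mini-13B-GPTQ"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# model_basename is the weights filename without its .safetensors extension.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    model_basename="orca-mini-13b-GPTQ-4bit-128g.no-act.order",
    use_safetensors=True,
    device="cuda:0",
)

prompt = "### User:\nWhy can ExLlama be faster than AutoGPTQ?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Since the file was quantized with Groupsize = 128 and desc_act = False (per the last bullet), no explicit quantize config needs to be passed here; AutoGPTQ picks those settings up from the repo's `quantize_config.json` when one is present.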