Text Generation · Transformers · Safetensors · English · llama · text-generation-inference · 4-bit precision · gptq
TheBloke committed
Commit 06cf46d
1 Parent(s): 1ef5236

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -61,7 +61,7 @@ input
 ### Response:
 ```
 
-## How to easily download and use this model in text-generation-webui
+## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
 Please make sure you're using the latest version of text-generation-webui
 
@@ -144,7 +144,7 @@ It was created with group_size 128 to increase inference accuracy, but without -
 
 * `orca-mini-13b-GPTQ-4bit-128g.no-act.order.safetensors`
 * Works with AutoGPTQ in CUDA or Triton modes.
-* LLaMa models also work with [ExLlama](https://github.com/turboderp/exllama}, which usually provides much higher performance, and uses less VRAM, than AutoGPTQ.
+* LLaMa models also work with [ExLlama](https://github.com/turboderp/exllama), which usually provides much higher performance, and uses less VRAM, than AutoGPTQ.
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
 * Parameters: Groupsize = 128. Act Order / desc_act = False.
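
For context, here is a minimal sketch of the AutoGPTQ loading path the README bullets describe. The repo id `TheBloke/orca_mini_13B-GPTQ` and the `### System:`/`### User:`/`### Response:` prompt template are assumptions inferred from the safetensors filename and the orca_mini model card, not part of this commit.

```python
# Minimal sketch: load the GPTQ safetensors file with AutoGPTQ in CUDA mode.
# Assumptions: the repo id and prompt template below are inferred, not from this commit.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/orca_mini_13B-GPTQ"  # assumed Hub repo id
model_basename = "orca-mini-13b-GPTQ-4bit-128g.no-act.order"  # file named in the README

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    device="cuda:0",
    use_triton=False,  # CUDA mode; set True for Triton mode (both listed as working)
    quantize_config=None,
)

# orca_mini-style prompt ending in "### Response:", matching the diff context above.
prompt = (
    "### System:\nYou are a helpful assistant.\n\n"
    "### User:\nWhat is GPTQ?\n\n"
    "### Response:\n"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(
    input_ids=input_ids, do_sample=True, temperature=0.7, max_new_tokens=128
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Setting `use_triton=True` switches to the Triton kernel path; per the README bullets this file works in either mode under AutoGPTQ, while GPTQ-for-LLaMa Triton mode may have issues.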