---
license: gpl
---

# gpt4-x-vicuna-13B-GPTQ

This repo contains 4bit GPTQ format quantised models of [NousResearch's gpt4-x-vicuna-13b](https://huggingface.co/NousResearch/gpt4-x-vicuna-13b).

It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
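
For a rough sense of what 4bit quantisation means for file size, here is a back-of-envelope sketch (the per-group overhead figure is an assumption, not a measured value):

```python
# Back-of-envelope estimate of the 4bit GPTQ checkpoint size (approximation only).
params = 13e9          # ~13 billion weights in the base model
bits_per_weight = 4    # GPTQ 4bit quantisation
groupsize = 128        # one scale/zero pair per group of 128 weights

weights_gb = params * bits_per_weight / 8 / 1e9   # packed 4bit weights
overhead_gb = params / groupsize * 4 / 1e9        # ~4 bytes/group for scales+zeros (assumed)
print(f"~{weights_gb + overhead_gb:.1f} GB")      # prints ~6.9 GB
# The shipped file is somewhat larger: embeddings and some tensors stay in fp16.
```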

## Repositories available

* [4bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GPTQ).

## How to easily download and use this model in text-generation-webui

Open the text-generation-webui UI as normal.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `TheBloke/gpt4-x-vicuna-13B-GPTQ`.
3. Click **Download**.
4. Wait until it says it's finished downloading.
5. Click the **Refresh** icon next to **Model** in the top left.
6. In the **Model drop-down**, choose the model you just downloaded: `gpt4-x-vicuna-13B-GPTQ`.
7. If you see an error in the bottom right, ignore it - it's temporary.
8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`.
9. Click **Save settings for this model** in the top right.
10. Click **Reload the Model** in the top right.
11. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
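
If you would rather fetch the files outside the UI, here is a minimal sketch using the `huggingface_hub` Python library (an assumption on my part - it is not required by the steps above; the `local_dir` path is illustrative):

```python
# Minimal sketch: download the repo files without using the web UI.
# Assumes `pip install huggingface_hub`; the local_dir path is illustrative.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/gpt4-x-vicuna-13B-GPTQ",
    local_dir="models/gpt4-x-vicuna-13B-GPTQ",  # text-generation-webui's models/ folder
)
```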
30 |
+
|
31 |
+
## Provided files
|
32 |
+
|
33 |
+
**Compatible file - GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors**
|
34 |
+
|
35 |
+
In the `main` branch - the default one - you will find `GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors`
|
36 |
+
|
37 |
+
This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility
|
38 |
+
|
39 |
+
It was created without the `--act-order` parameter. It may have slightly lower inference quality compared to the other file, but is guaranteed to work on all versions of GPTQ-for-LLaMa and text-generation-webui.
|
40 |
+
|
41 |
+
* `GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors`
|
42 |
+
* Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
|
43 |
+
* Works with text-generation-webui one-click-installers
|
44 |
+
* Parameters: Groupsize = 128g. No act-order.
|
45 |
+
* Command used to create the GPTQ:
|
46 |
+
```
|
47 |
+
CUDA_VISIBLE_DEVICES=0 python3 llama.py GPT4All-13B-snoozy c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors
|
48 |
+
```
|
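
To sanity-check a downloaded file, here is a minimal sketch using the `safetensors` library to list a few tensors without loading the whole checkpoint (an assumption - any safetensors reader works; the path is illustrative):

```python
# Minimal sketch: inspect the quantised checkpoint without fully loading it.
# Assumes `pip install safetensors torch`; the path below is illustrative.
from safetensors import safe_open

path = "models/gpt4-x-vicuna-13B-GPTQ/GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors"
with safe_open(path, framework="pt") as f:
    # GPTQ checkpoints store packed weights (qweight) plus per-group scales and zeros.
    for name in list(f.keys())[:5]:
        print(name, f.get_slice(name).get_shape())
```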

# Original model card

The base model used was https://huggingface.co/eachadea/vicuna-13b-1.1

Finetuned on Teknium's GPTeacher dataset, the unreleased Roleplay v2 dataset, the GPT-4-LLM dataset, and the Nous Research Instruct Dataset.