Update README.md
README.md CHANGED
@@ -14,6 +14,15 @@ It is the result of merging the deltas from the above repository with the origin

* [4bit and 5bit GGML models for CPU inference](https://huggingface.co/TheBloke/stable-vicuna-13B-GGML).
* [Unquantised 16bit model in HF format](https://huggingface.co/TheBloke/stable-vicuna-13B-HF).

## PROMPT TEMPLATE

This model works best with the following prompt template:

```
### Human: your prompt here
### Assistant:
```
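
As a concrete illustration (the question below is just an example, not from the original card), a full prompt in this format would be:

```
### Human: Write a short poem about llamas.
### Assistant:
```

The model then generates its answer after the `### Assistant:` line.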

## Provided files

| Name | Quant method | Bits | Size | RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |

@@ -50,15 +59,20 @@ Don't expect any third-party UIs/tools to support them yet.

I use the following command line; adjust for your tastes and needs:

```
./main -t 18 -m stable-vicuna-13B.ggml.q4_2.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -r "### Human:" -i
```
Change `-t 18` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.
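
If you are not sure of your core count, a quick way to check it on Linux (an illustrative check, not part of the original instructions) is:

```
# Physical cores = "Core(s) per socket" x "Socket(s)" in lscpu output
lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket'
```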

If you want to enter a prompt from the command line, use `-p <PROMPT>` like so:

```
./main -t 18 -m stable-vicuna-13B.ggml.q4_2.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -r "### Human:" -p "### Human: write a story about llamas ### Assistant:"
```

## How to run in `text-generation-webui`

GGML models can be loaded into text-generation-webui by installing the llama.cpp module, then placing the ggml model file in a model folder as usual.
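
A minimal sketch of that file placement, assuming a standard text-generation-webui checkout and the q4_2 file from this repo (the paths are illustrative):

```
# Illustrative paths: put the GGML file where text-generation-webui looks for models
cp stable-vicuna-13B.ggml.q4_2.bin text-generation-webui/models/
```
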
Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).

Note: at this time text-generation-webui will not support the new q5 quantisation methods.