Update README.md
README.md CHANGED
@@ -14,6 +14,15 @@ It is the result of merging the deltas from the above repository with the origin

* [4bit and 5bit GGML models for CPU inference](https://huggingface.co/TheBloke/stable-vicuna-13B-GGML).
* [Unquantised 16bit model in HF format](https://huggingface.co/TheBloke/stable-vicuna-13B-HF).

## PROMPT TEMPLATE

This model works best with the following prompt template:

```
### Human: your prompt here
### Assistant:
```
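
As a concrete illustration (the question below is just an example, not from the original card), a full prompt in this format would be:

```
### Human: Write a short poem about llamas.
### Assistant:
```

The model then generates its answer after the `### Assistant:` line.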

## Provided files

| Name | Quant method | Bits | Size | RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |

@@ -50,15 +59,20 @@ Don't expect any third-party UIs/tools to support them yet.

I use the following command line; adjust for your tastes and needs:

```
./main -t 18 -m stable-vicuna-13B.ggml.q4_2.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -r "### Human:" -i
```
Change `-t 18` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.
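
If you are not sure of your core count, a quick way to check it on Linux (an illustrative check, not part of the original instructions) is:

```
# Physical cores = "Core(s) per socket" x "Socket(s)" in lscpu output
lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket'
```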

If you want to enter a prompt from the command line, use `-p <PROMPT>` like so:

```
./main -t 18 -m stable-vicuna-13B.ggml.q4_2.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -r "### Human:" -p "### Human: write a story about llamas ### Assistant:"
```

## How to run in `text-generation-webui`

GGML models can be loaded into text-generation-webui by installing the llama.cpp module, then placing the ggml model file in a model folder as usual.
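
A minimal sketch of that file placement, assuming a standard text-generation-webui checkout and the q4_2 file from this repo (the paths are illustrative):

```
# Illustrative paths: put the GGML file where text-generation-webui looks for models
cp stable-vicuna-13B.ggml.q4_2.bin text-generation-webui/models/
```
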
Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).

Note: at this time text-generation-webui will not support the new q5 quantisation methods.