Update README.md
README.md
@@ -41,7 +41,13 @@ Currently these files will also not work with code that previously supported Falcon
 
 * [2, 3, 4, 5, 6, 8-bit GGCC models for CPU+GPU inference](https://huggingface.co/TheBloke/falcon-40b-sft-mix-1226-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/OpenAssistant/falcon-40b-sft-mix-1226)
-
+
+## Prompt template
+
+```
+<|prompter|>prompt<|endoftext|><|assistant|>
+```
+
 <!-- compatibility_ggml start -->
 ## Compatibility
 
@@ -57,7 +63,7 @@ Compiling on Windows: developer cmp-nct notes: 'I personally compile it using VS
 
 Once compiled you can then use `bin/falcon_main` just like you would use llama.cpp. For example:
 ```
-bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon-40b-sft-mix-1226.ggccv1.q4_K.bin -p "
+bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon-40b-sft-mix-1226.ggccv1.q4_K.bin -p "<|prompter|>write a story about llamas<|endoftext|><|assistant|>"
 ```
 
 You can specify `-ngl 100` regardless of your VRAM, as it will automatically detect how much VRAM is available to be used.
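
If you would rather cap GPU usage, a smaller `-ngl` value can be passed instead. This is a minimal sketch, assuming `-ngl` limits the number of offloaded layers the same way it does in llama.cpp; it is not shown in the README itself:

```
# Sketch: offload only 20 layers to the GPU, keeping the rest on CPU
# (assumes llama.cpp-style partial offload behaviour)
bin/falcon_main -t 8 -ngl 20 -b 1 -m falcon-40b-sft-mix-1226.ggccv1.q4_K.bin \
  -p "<|prompter|>write a story about llamas<|endoftext|><|assistant|>"
```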
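
To reuse the prompt template added above with your own prompts, the special tokens can be kept intact by filling in a shell variable. This is illustrative only; the `PROMPT` variable is ordinary shell, not a `falcon_main` feature:

```
# Illustrative: PROMPT is a plain shell variable wrapped in the OpenAssistant template
PROMPT="write a story about llamas"
bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon-40b-sft-mix-1226.ggccv1.q4_K.bin \
  -p "<|prompter|>${PROMPT}<|endoftext|><|assistant|>"
```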