CRD716 committed on
Commit 5d784be · 1 Parent(s): 31eff08

Update README.md

New quantizations s00n tm

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -2,7 +2,7 @@
  license: gpl-3.0
  metrics:
  - perplexity
- pipeline_tag: conversational
+ pipeline_tag: text-generation
  tags:
  - LLaMa
  - text-generation-inference
@@ -11,4 +11,8 @@ tags:
 
  LLaMa 65B converted to ggml via LLaMa.cpp, then quantized to 4bit.
 
- I recommend the following settings when running as a good starting point: ```main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 42 -c 2048 --temp 0.35 --interactive-first --repeat_penalty 1.2 --instruct --color```
+ Note: If you previously used the q4_0 model before April 26th, 2023, you are using an outdated model. I suggest redownloading for a better experience.
+ Check https://github.com/ggerganov/llama.cpp#quantization for details on the different quantization types.
+
+ I recommend the following settings when running as a good starting point: ```main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 42 -c 2048 --temp 0.4 --interactive-first --repeat_penalty 1.2 --color```
+ Be aware that LLaMa is a text generation model, not a conversational one, and as such you will have to prompt it differently than, for example, Vicuna or ChatGPT.
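
The last added README line says LLaMa has to be prompted as a text generation model rather than a conversational one. One common way to do that is to give the model the start of a document and let it continue. The sketch below is an illustration only, not something from this commit: the helper name `build_completion_prompt`, the `Q:`/`A:` layout, and the sample questions are all assumptions.

```python
def build_completion_prompt(examples, question):
    """Format few-shot Q/A pairs as plain text for a base model to continue.

    Hypothetical helper: the Q:/A: layout is an assumed convention,
    not something prescribed by the README or by llama.cpp.
    """
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
        lines.append("")  # blank line between worked examples
    # End on an open "A:" so the most likely continuation is the answer.
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


# Made-up sample data, for illustration only.
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_completion_prompt(examples, "What is the capital of Italy?")
print(prompt)
```

The resulting text can be pasted in as the first input when running with `--interactive-first`. The point of the pattern is that the model is shown what a finished answer looks like instead of being given a bare instruction, which a model without chat tuning may simply continue rather than follow.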