---
license: gpl-3.0
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- LLaMa
- text-generation-inference
- ggml
language:
- en
- bg
- ca
- cs
- da
- de
- es
- fr
- hr
- hu
- it
- nl
- pl
- pt
- ro
- ru
- sl
- sr
- sv
- uk
library_name: adapter-transformers
---

LLaMa 65B converted to ggml via llama.cpp, then quantized to 4-bit (q4_0).

Note: If you downloaded the q4_0 model before April 26th, 2023, you have an outdated file. I suggest redownloading it for a better experience.
Check https://github.com/ggerganov/llama.cpp#quantization for details on the different quantization types.

As a starting point, I recommend the following settings (adjust `-t` to the number of threads available on your machine):

```
main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 42 -c 2048 --temp 0.4 --interactive-first --repeat_penalty 1.2 --color
```
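
If you would rather launch the model from a script, the recommended settings can be assembled as an argument list. This is only a minimal sketch: the helper name `build_llama_command` and the fallback to `os.cpu_count()` for the thread count are my assumptions, not part of llama.cpp, and the comments give my reading of each flag.

```python
import os
import shlex


def build_llama_command(model_path="ggml-LLaMa-65B-q4_0.bin",
                        threads=None,
                        executable="main.exe"):
    """Return the argument list for the llama.cpp example binary.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if threads is None:
        # Assumption: fall back to the logical CPU count; tune this per machine.
        threads = os.cpu_count() or 1
    return [
        executable,
        "-m", model_path,           # path to the quantized model file
        "-n", "-1",                 # no fixed limit on the number of generated tokens
        "-t", str(threads),         # number of CPU threads
        "-c", "2048",               # context size in tokens
        "--temp", "0.4",            # sampling temperature
        "--interactive-first",      # wait for user input before generating
        "--repeat_penalty", "1.2",  # penalize repeated tokens
        "--color",                  # colorize output to separate input from generation
    ]


if __name__ == "__main__":
    # Print the command line with the thread count used in the example above.
    print(shlex.join(build_llama_command(threads=42)))
```

The resulting list can be passed to `subprocess.run` once `main.exe` and the model file are in the working directory.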

Be aware that LLaMa is a text-generation (completion) model, not a conversational one, so you will have to prompt it differently than you would, for example, Vicuna or ChatGPT.
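
One way to prompt a completion model is to write the beginning of a document and let the model continue it, rather than issuing an instruction. The sketch below illustrates that idea; the function name `completion_prompt`, the header sentence, and the `Question:`/`Answer:` layout are all my own assumptions and not a format LLaMa requires.

```python
def completion_prompt(question, examples=()):
    """Frame a question as the start of a document for the model to continue.

    Hypothetical helper: `examples` is an optional sequence of
    (question, answer) pairs shown before the real question.
    """
    lines = [
        "The following is a list of questions with accurate, concise answers.",
        "",
    ]
    for example_question, example_answer in examples:
        lines += [f"Question: {example_question}", f"Answer: {example_answer}", ""]
    # End on an open "Answer:" so the most natural continuation is the answer itself.
    lines += [f"Question: {question}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(completion_prompt(
        "What is the capital of Bulgaria?",
        examples=[("What is the capital of France?", "Paris")],
    ))
```

The worked examples give the model a pattern to imitate, which tends to matter more for a base model than for a chat-tuned one such as Vicuna.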