---
license: gpl-3.0
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- LLaMa
- text-generation-inference
- ggml
language:
- en
- bg
- ca
- cs
- da
- de
- es
- fr
- hr
- hu
- it
- nl
- pl
- pt
- ro
- ru
- sl
- sr
- sv
- uk
library_name: adapter-transformers
---
LLaMa 65B, converted to the ggml format with llama.cpp and then quantized to 4 bits.
Note: If you downloaded the q4_0 model before April 26th, 2023, you are using an outdated file. I suggest re-downloading it for a better experience.
Check https://github.com/ggerganov/llama.cpp#quantization for details on the different quantization types.
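For a rough idea of what "quantized to 4 bits" means, here is a simplified Python sketch of symmetric block quantization in the spirit of q4_0. Weights are split into blocks of 32, each block keeps one floating-point scale, and each weight is stored as a 4-bit integer code, so two weights fit in one byte. This is an illustration under stated assumptions, not llama.cpp's actual code. The real q4_0 rounding rule, scale precision and on-disk layout differ and have changed between versions, and all function names here are hypothetical.

```python
import random

BLOCK_SIZE = 32  # assumed block size, matching the q4_0 idea of 32 weights per block


def quantize_block(weights):
    """Quantize one block of floats to 4-bit codes (0..15) plus a single scale."""
    amax = max(abs(w) for w in weights)
    scale = amax / 7 if amax > 0 else 1.0  # map [-amax, amax] onto integers [-7, 7]
    codes = []
    for w in weights:
        q = round(w / scale)       # signed integer, normally in [-7, 7]
        q = max(-8, min(7, q))     # clamp to the signed 4-bit range as a safeguard
        codes.append(q + 8)        # shift to unsigned 0..15 for storage
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from a block's scale and 4-bit codes."""
    return [(c - 8) * scale for c in codes]


def pack_codes(codes):
    """Pack pairs of 4-bit codes into bytes (assumes an even number of codes)."""
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def quantize(weights):
    return [quantize_block(weights[i:i + BLOCK_SIZE])
            for i in range(0, len(weights), BLOCK_SIZE)]


def dequantize(blocks):
    out = []
    for scale, codes in blocks:
        out.extend(dequantize_block(scale, codes))
    return out


# Usage: round-trip some random weights and measure the worst-case error.
random.seed(0)
w = [random.uniform(-1, 1) for _ in range(64)]
blocks = quantize(w)
restored = dequantize(blocks)
max_err = max(abs(a - b) for a, b in zip(w, restored))
print(f"max abs error: {max_err:.4f}")
print(f"bytes per block of codes: {len(pack_codes(blocks[0][1]))}")
```

Each block of 32 weights needs 16 bytes of codes plus its scale, instead of 128 bytes as 32-bit floats. The price is a rounding error of at most half a quantization step per weight.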
I recommend the following settings as a good starting point:

```
main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 42 -c 2048 --temp 0.4 --interactive-first --repeat_penalty 1.2 --color
```
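If you start the binary from a script, you can keep the recommended settings in one place and adjust them per machine. The sketch below is a hypothetical Python helper (`build_command` is not part of llama.cpp) that assembles the same arguments, with comments on what each flag controls. In particular, `-t` is the number of CPU threads, so 42 will not suit every machine.

```python
import shlex

# Recommended starting values from this card; -t in particular should match your CPU.
RECOMMENDED = {
    "-m": "ggml-LLaMa-65B-q4_0.bin",  # path to the quantized model file
    "-n": "-1",                      # tokens to generate; -1 means no fixed limit
    "-t": "42",                      # number of CPU threads
    "-c": "2048",                    # context size in tokens
    "--temp": "0.4",                 # sampling temperature (lower = more deterministic)
    "--repeat_penalty": "1.2",       # discourage repeating recent tokens
}
SWITCHES = ["--interactive-first", "--color"]  # flags that take no value


def build_command(binary="main.exe", overrides=None):
    """Return an argv list for llama.cpp's main example (hypothetical helper)."""
    opts = dict(RECOMMENDED)
    if overrides:
        opts.update(overrides)  # keeps each flag's original position
    argv = [binary]
    for flag, value in opts.items():
        argv += [flag, value]
    return argv + SWITCHES


# Usage: same settings, but with 8 threads for a smaller machine.
print(shlex.join(build_command(overrides={"-t": "8"})))
```

The returned list could be passed to `subprocess.run` to start the model. `shlex.join` is only used here to print a copy-pasteable command line.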
Be aware that LLaMa is a text-generation model, not a conversational one, so you will have to prompt it differently than, for example, Vicuna or ChatGPT.
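One common way to prompt a base model is to write the beginning of a document whose natural continuation is the answer you want. For example, a short Q&A list can end with an open "A:". The helper below is a hypothetical illustration of that pattern, not a format required by LLaMa or llama.cpp. The header sentence and example pair are arbitrary choices.

```python
def build_completion_prompt(question, examples=()):
    """Frame a question as a document for a base model to continue (hypothetical helper)."""
    lines = ["Below is a list of questions, each followed by a short, accurate answer.", ""]
    for q, a in examples:  # optional worked examples show the model the expected format
        lines += [f"Q: {q}", f"A: {a}", ""]
    lines += [f"Q: {question}", "A:"]  # leave the answer open for the model to complete
    return "\n".join(lines)


prompt = build_completion_prompt(
    "What is the capital of Croatia?",
    examples=[("What is the capital of France?", "Paris.")],
)
print(prompt)
```

The printed text could be saved to a file and passed to `main` with `-f` instead of being typed interactively.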