---
license: gpl-3.0
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- LLaMa
- text-generation-inference
- ggml
language:
- en
- bg
- ca
- cs
- da
- de
- es
- fr
- hr
- hu
- it
- nl
- pl
- pt
- ro
- ru
- sl
- sr
- sv
- uk
library_name: adapter-transformers
---

LLaMa 65B converted to ggml via llama.cpp, then quantized to 4-bit.

Note: If you downloaded the q4_0 model before April 26th, 2023, you are using an outdated version. I suggest redownloading it for a better experience.

Check https://github.com/ggerganov/llama.cpp#quantization for details on the different quantization types.

I recommend the following settings as a good starting point:

```
main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 42 -c 2048 --temp 0.4 --interactive-first --repeat_penalty 1.2 --color
```

Be aware that LLaMa is a text generation model, not a conversational one, so you will have to prompt it differently than, for example, Vicuna or ChatGPT.
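For instance, rather than asking a question, give the model the beginning of the text you want and let it continue. A minimal sketch of a completion-style invocation (the prompt text is only an illustration; the flags mirror the settings above):

```
main.exe -m ggml-LLaMa-65B-q4_0.bin -n 256 -t 42 -c 2048 --temp 0.4 --repeat_penalty 1.2 -p "The three most important inventions in human history are: 1."
```

The model then continues the list it was given, instead of replying the way an assistant-tuned model like Vicuna or ChatGPT would.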