README.md · CRD716/ggml-LLaMa-65B-quantized at 53cc98d486a19062806c63f0037c2cf512300fd0

metadata

license: gpl-3.0
metrics:
  - perplexity
pipeline_tag: text-generation
tags:
  - LLaMa
  - text-generation-inference
  - ggml
language:
  - en
  - bg
  - ca
  - cs
  - da
  - de
  - es
  - fr
  - hr
  - hu
  - it
  - nl
  - pl
  - pt
  - ro
  - ru
  - sl
  - sr
  - sv
  - uk
library_name: adapter-transformers

LLaMa 65B converted to ggml via llama.cpp, then quantized to 4-bit.

Note: If you used the q4_0 model before April 26th, 2023, you are using an outdated model. I suggest redownloading it for a better experience. See https://github.com/ggerganov/llama.cpp#quantization for details on the different quantization types.

As a good starting point, I recommend running with the following settings:

`main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 42 -c 2048 --temp 0.4 --interactive-first --repeat_penalty 1.2 --color`
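The recommended command can also be assembled programmatically, which makes it easier to adjust individual settings. The following is only a minimal sketch: `build_llama_args` is a hypothetical helper, not part of llama.cpp, and the flag descriptions in the comments reflect my understanding of the llama.cpp `main` options at the time of this model's release. The thread count in particular should probably be tuned to your own CPU rather than copied as-is.

```python
import os


def build_llama_args(model_path, threads=None, context=2048,
                     temp=0.4, repeat_penalty=1.2):
    """Hypothetical helper: build the argument list for llama.cpp's main binary."""
    if threads is None:
        # Assumption: fall back to the number of logical CPUs reported by the OS.
        threads = os.cpu_count() or 1
    return [
        "-m", model_path,                # path to the quantized ggml model file
        "-n", "-1",                      # no fixed limit on generated tokens
        "-t", str(threads),              # number of CPU threads
        "-c", str(context),              # context window size in tokens
        "--temp", str(temp),             # sampling temperature
        "--interactive-first",           # wait for user input before generating
        "--repeat_penalty", str(repeat_penalty),  # penalize repeated tokens
        "--color",                       # color output to separate input from generation
    ]


args = build_llama_args("ggml-LLaMa-65B-q4_0.bin", threads=42)
print(" ".join(["main.exe"] + args))
```

With `threads=42` this reproduces the command line given above. The resulting list could be passed to `subprocess.run(["main.exe"] + args)`, assuming the binary is in the current directory or on your PATH.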

Be aware that LLaMa is a text generation model, not a conversational one, and as such you will have to prompt it differently than, for example, Vicuna or ChatGPT.
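To illustrate the difference, one common way to prompt a plain text generation model is to write the beginning of a document and let the model continue it, rather than issuing an instruction. The sketch below is only one possible approach: `build_completion_prompt`, the "Question:"/"Answer:" layout, and the sample questions are all illustrative assumptions, not a format this model requires.

```python
def build_completion_prompt(examples, question):
    """Hypothetical helper: frame a question as a document for the model to continue."""
    lines = ["The following is a list of questions and their answers.", ""]
    # A few worked examples show the model the pattern it should continue.
    for example_question, example_answer in examples:
        lines += [f"Question: {example_question}",
                  f"Answer: {example_answer}",
                  ""]
    # End on an open "Answer:" so the most natural continuation is the answer.
    lines += [f"Question: {question}", "Answer:"]
    return "\n".join(lines)


examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Italy?", "Rome"),
]
prompt = build_completion_prompt(examples, "What is the capital of Spain?")
print(prompt)
```

The printed text can be pasted in as the initial prompt. A chat-style request such as "Please tell me the capital of Spain" may instead be continued as if it were the start of an arbitrary piece of text.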