CRD716/ggml-LLaMa-65B-quantized

	---
	license: gpl-3.0
	metrics:
	- perplexity
	pipeline_tag: text-generation
	tags:
	- LLaMa
	- text-generation-inference
	- ggml
	language:
	- en
	- bg
	- ca
	- cs
	- da
	- de
	- es
	- fr
	- hr
	- hu
	- it
	- nl
	- pl
	- pt
	- ro
	- ru
	- sl
	- sr
	- sv
	- uk
	library_name: adapter-transformers
	---

	LLaMa 65B converted to the ggml format via llama.cpp, then quantized to 4-bit.

	Note: If you used the q4_0 model before April 26th, 2023, you are using an outdated model. I suggest redownloading it for a better experience.
	Check https://github.com/ggerganov/llama.cpp#quantization for details on the different quantization types.
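For intuition about what "quantized to 4bit" means, here is a simplified sketch of block-wise 4-bit quantization in Python. It is only an illustration and is not the exact ggml q4_0 layout: the block size, the symmetric range of -7 to 7, and the function names `quantize_block` and `dequantize_block` are assumptions made for this example. The llama.cpp link above remains the reference for the real formats.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block length, chosen for illustration only


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block of floats to small integers sharing a single scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale per block; the largest magnitude maps to +/-7.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp so each value fits in 4 bits.
    quants = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale: float, quants: List[int]) -> List[float]:
    """Recover approximate floats from the shared scale and 4-bit integers."""
    return [scale * q for q in quants]


def quantize(values: List[float]) -> List[Tuple[float, List[int]]]:
    """Split a flat weight list into blocks and quantize each independently."""
    return [
        quantize_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]
```

Each weight ends up within half a scale step of its original value, which is where the quality loss relative to the unquantized model comes from.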

	As a good starting point, I recommend running with the following settings: ```main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 42 -c 2048 --temp 0.4 --interactive-first --repeat_penalty 1.2 --color```
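The recommended command can also be assembled programmatically, which makes it easier to adjust the thread count for your own machine. The sketch below is a hypothetical helper (`build_llama_command` is not part of llama.cpp). The flag descriptions in the comments reflect my understanding of the llama.cpp `main` program at the time, so check `main.exe --help` for your build. Defaulting the thread count to `os.cpu_count()` is an assumption; a value closer to your number of physical cores may work better.

```python
import os
from typing import List, Optional


def build_llama_command(
    model_path: str = "ggml-LLaMa-65B-q4_0.bin",
    threads: Optional[int] = None,
    executable: str = "main.exe",
) -> List[str]:
    """Return the recommended llama.cpp invocation as an argument list."""
    if threads is None:
        # Assumption: fall back to the logical CPU count reported by the OS.
        threads = os.cpu_count() or 1
    return [
        executable,
        "-m", model_path,           # path to the quantized model file
        "-n", "-1",                 # tokens to generate; -1 means no fixed limit
        "-t", str(threads),         # number of CPU threads
        "-c", "2048",               # context size in tokens
        "--temp", "0.4",            # sampling temperature
        "--interactive-first",      # wait for user input before generating
        "--repeat_penalty", "1.2",  # penalize repeated tokens
        "--color",                  # colorize output in the terminal
    ]
```

To launch it, pass the list to `subprocess.run(build_llama_command(threads=8))` from the directory containing the executable and the model file.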

	Be aware that LLaMa is a text generation model, not a conversational one, and as such you will have to prompt it differently than, for example, Vicuna or ChatGPT.
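One common way to prompt a plain text generation model is to write the beginning of a document and let the model continue it, rather than giving it an instruction. The sketch below builds such a prompt from a few example question and answer pairs. The helper name `build_completion_prompt`, the header text, and the `Q:`/`A:` layout are assumptions for illustration, not a format this model requires.

```python
from typing import List, Tuple


def build_completion_prompt(
    examples: List[Tuple[str, str]],
    question: str,
    header: str = "The following is a list of questions with accurate answers.",
) -> str:
    """Build a few-shot prompt that the model can continue as plain text."""
    lines = [header, ""]
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
        lines.append("")
    # End on an open "A:" so the most likely continuation is the answer.
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_completion_prompt(
        [("What is the capital of France?", "Paris")],
        "What is the capital of Italy?",
    ))
```

With a chat model you would simply ask the question. Here the surrounding pattern is what signals the kind of text that should come next.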