Nikolai1902 / LL2-13B-FIMFiction-QLORA-GGML

	---
	license: wtfpl
	---
	This model is a QLoRA fine-tune of the LLaMA 2-13B base model on the FIMFiction archive. The adapter was merged back into the base model (as most GGML loading apps don't support LoRAs) and the result was quantized for llama.cpp-based frontends. It was trained with a context length of 1024.
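
As a rough illustration of what the 1024 context length means in practice, here is a minimal sketch that works out how many tokens are left for generation once the prompt is loaded. It assumes you want prompt plus output to stay within the trained length; the card doesn't say how the model behaves beyond that, and `completion_budget` is a made-up helper name, not part of any library.

```python
TRAINED_CONTEXT = 1024  # context length stated in this card


def completion_budget(prompt_tokens: int, n_ctx: int = TRAINED_CONTEXT) -> int:
    """Tokens left for generation once the prompt occupies part of the window.

    Assumption: prompt and generated text should together fit in n_ctx.
    """
    return max(0, n_ctx - prompt_tokens)


print(completion_budget(300))   # 724
print(completion_budget(1100))  # 0, the prompt alone already exceeds the window
```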


	There are two options, depending on the resources you have:
	- Q5_K_M: K-quantized 5-bit model with low quality loss. Max RAM consumption is 11.73 GB; recommended if you have 12 GB of VRAM to load 40 layers onto the GPU.
	- Q4_K_S: Compact K-quantized 4-bit model. Max RAM consumption is 9.87 GB.
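
The choice between the two files can be sketched as a simple lookup. This is only an illustration: the RAM figures come from the list above, while `pick_quant` and the idea of comparing against a single memory number are assumptions. It leaves no headroom for the OS or other programs, so treat the result as a starting point.

```python
# Max RAM consumption in GB, as listed in this card.
QUANT_MAX_RAM_GB = {
    "Q5_K_M": 11.73,
    "Q4_K_S": 9.87,
}


def pick_quant(available_gb: float):
    """Return the largest listed quant whose max RAM fits, or None if neither does.

    Hypothetical helper: it compares the listed figures against one number
    and ignores any other memory use on the machine.
    """
    fitting = [
        (ram, name)
        for name, ram in QUANT_MAX_RAM_GB.items()
        if ram <= available_gb
    ]
    if not fitting:
        return None
    # The higher RAM figure belongs to the higher-quality quant here.
    return max(fitting)[1]


print(pick_quant(12.0))  # Q5_K_M
print(pick_quant(10.0))  # Q4_K_S
print(pick_quant(8.0))   # None
```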

	This is not an instruction-tuned model; it was trained on raw text, so treat it like an autocomplete.
	It seems sensitive to formatting: I found it's usually better at staying on topic when using double spacing in the prompt.
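
A minimal sketch of building a prompt in that style. It assumes "double spacing" means a blank line between paragraphs, which is one reading of the note above and not something the card spells out. `build_prompt` and the sample text are made up for illustration. The prompt is plain story text for the model to continue, with no instruction wrapper.

```python
def build_prompt(title: str, paragraphs: list[str]) -> str:
    """Join a title and opening paragraphs with one blank line between each.

    Assumption: "double spacing" is read as blank-line paragraph breaks.
    Empty paragraphs are dropped and surrounding whitespace is trimmed.
    """
    parts = [title.strip()] + [p.strip() for p in paragraphs if p.strip()]
    return "\n\n".join(parts)


prompt = build_prompt(
    "The Long Road Home",
    [
        "The train pulled out of the station just after dawn.",
        "She watched the town shrink behind her.",
    ],
)
print(prompt)
```

The resulting string is what you would pass to your llama.cpp-based frontend as the text to continue.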

	The only tags excluded from the dataset are eqg and humanized.