R136a1
/

MXLewd-L2-20B-exl2

Text Generation

Not-For-All-Audiences

nsfw

Inference Endpoints

Model card Files Files and versions Community

MXLewd-L2-20B-exl2 / README.md

R136a1's picture

Update README.md

64782fb about 1 year ago

|

history blame contribute delete

888 Bytes

	---
	tags:
	- not-for-all-audiences
	- nsfw
	license: other
	language:
	- en
	---

	[EXL2](https://github.com/turboderp/exllamav2/tree/master#exllamav2) Quantization of [Undi95's's MXLewd-L2-20B](https://huggingface.co/Undi95/MXLewd-L2-20B).


	## Model details

	First attempt to quantize a 20B model so it can run on 16GB VRAM with the highest quality possible.
	Quantized at 3.18bpw with hb 6. 8.13bpw also available for those who want it (exl2 is very fast with flash-attention and the quality is (almost) the same with fp16.)

	Perplexity:

	Base = 6.4744

	8bpw h8 = 6.4471

	3.18 h6 = 6.5705

	Dataset = [wikitext](https://huggingface.co/datasets/wikitext/resolve/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet)

	## Prompt Format

	```
	Below is an instruction that describes a task. Write a response that appropriately completes the request.

	### Instruction:
	{prompt}

	### Response:

	```