---
license: other
language:
- en
---
[EXL2](https://github.com/turboderp/exllamav2/tree/master#exllamav2) quantization of [Undi95's MM-ReMM-L2-20B](https://huggingface.co/Undi95/MM-ReMM-L2-20B).
## Model details
Quantized at 3.18 bpw with a 6-bit output head (hb 6). This quant can run the full 4K context on 16 GB of VRAM; the other 20B models will be redone later.
Perplexity:
- Base = 6.9504
- 3.18 bpw, h6 = 7.0138
Dataset = [wikitext](https://huggingface.co/datasets/wikitext/resolve/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet)
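
For reference, below is a minimal loading sketch using exllamav2, adapted from that library's example scripts rather than from this release. The local model path, context length, and sampler values are placeholders, and class or method names may differ between exllamav2 versions.

```python
# Hypothetical loading sketch for this EXL2 quant; values are illustrative.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "MM-ReMM-L2-20B-exl2"   # local path to the downloaded quant (assumption)
config.prepare()
config.max_seq_len = 4096                  # full 4K context, as noted above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # KV cache allocated during the autosplit load
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8                 # placeholder sampling values
settings.top_p = 0.9

output = generator.generate_simple("Hello, world", settings, 64)
print(output)
```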
## Prompt Format
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
```
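
A small illustrative helper for filling the template in Python; `build_prompt` is not part of the release, just a sketch of the format shown above.

```python
# Minimal sketch: fill the Alpaca-style template with a user instruction.
def build_prompt(instruction: str) -> str:
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n"
        f"### Instruction:\n{instruction}\n"
        "### Response:\n"
    )

print(build_prompt("Summarize the plot of Hamlet in two sentences."))
```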