---
license: other
language:
- en
---
|
|
|
[EXL2](https://github.com/turboderp/exllamav2/tree/master#exllamav2) quantization of [Gryphe's MythoMax L2 13B](https://huggingface.co/Gryphe/MythoMax-L2-13b).
|
|
|
Other quantized models are available from TheBloke: [GGML](https://huggingface.co/TheBloke/MythoMax-L2-13B-GGML) - [GPTQ](https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ) - [GGUF](https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF) - [AWQ](https://huggingface.co/TheBloke/MythoMax-L2-13B-AWQ)
|
|
|
|
|
|
|
|
|
|
|
## Model details
|
|
|
| **Branch** | **Bits (bpw)** | **Perplexity** | **Description** |
|----------------------------------------------------------------------|----------|----------------|-------------------------------------------------------------|
| [3bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/3bit) | 3.73 | 5.8251 | Low-bit quant that is still usable |
| [4bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/4bit) | 4.33 | 5.7784 | Fits 6K context on a T4 GPU |
| [main](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/main) | 5.33 | 5.7427 | 4K context on a T4 GPU (recommended if you use Google Colab) |
| [6bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/6bit) | 6.13 | 5.7347 | Higher quality, for those whose hardware can run it |
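As a quick reference, the T4 recommendations in the table above can be encoded as a small helper (the function name is illustrative, not part of the repo):

```python
def recommended_branch(max_context: int) -> str:
    """Pick a quant branch for a 16 GB T4 GPU based on desired context length.

    Based on the branch table: the 4.33 bpw quant fits ~6K context on a T4,
    while the 5.33 bpw "main" quant fits 4K context (the Colab recommendation).
    """
    if max_context > 4096:
        return "4bit"   # 4.33 bpw, ~6K context on a T4
    return "main"       # 5.33 bpw, 4K context (recommended for Google Colab)
```

For example, `recommended_branch(6000)` returns `"4bit"`, matching the table's guidance for longer contexts.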
|
|
|
I'll upload 7-bit and 8-bit quants if someone requests them. (I don't know why the 5-bit quant's perplexity is lower than that of higher-bit quants; I may have done something wrong.)
|
|
|
## Prompt Format
|
|
|
Alpaca format:
|
```
### Instruction:
{prompt}

### Response:
```
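A minimal sketch of assembling this prompt in Python (the helper name and the `instruction` parameter are illustrative):

```python
def make_alpaca_prompt(instruction: str) -> str:
    # Wrap a user instruction in the Alpaca template shown above;
    # the model generates its answer after the "### Response:" header.
    return f"### Instruction:\n{instruction}\n\n### Response:\n"
```

The resulting string can be passed as-is to whatever EXL2-compatible loader you use (e.g. text-generation-webui or an ExLlamaV2 generator).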