R136a1 committed
Commit 36d8bbd
1 Parent(s): ed66c38

Update README.md

Files changed (1): README.md (+12 -8)
README.md CHANGED
@@ -1,3 +1,9 @@
 [EXL2](https://github.com/turboderp/exllamav2/tree/master#exllamav2) Quantization of [Gryphe's MythoMax L2 13B](https://huggingface.co/Gryphe/MythoMax-L2-13b).

 Other quantized models are available from TheBloke: [GGML](https://huggingface.co/TheBloke/MythoMax-L2-13B-GGML) - [GPTQ](https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ) - [GGUF](https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF) - [AWQ](https://huggingface.co/TheBloke/MythoMax-L2-13B-AWQ)
@@ -8,14 +14,12 @@ Other quantized models are available from TheBloke: [GGML](https://huggingface.c

 ## Model details

- | Branch | Bits | Perplexity | Desc |
- |----------------------------------------------------------------------|------|------------|---------------------------------------------------------|
- | [main](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/main) | 5 | 6.1018 | Up to 6144 context size on T4 GPU |
- | [6bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/6bit) | 6 | 6.1182 | 4096 context size (tokens) on T4 GPU |
- | [3bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/3bit) | 3 | 6.3666 | Low bits quant while still good |
- | [4bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/4bit) | 4 | 6.1601 | Slightly better than 4bit GPTQ, ez 8K context on T4 GPU |
- | - | 7 | 6.1056 | 2048 max context size for T4 GPU |
- | - | 8 | 6.1027 | Just, why? |

 I'll upload the 7- and 8-bit quants if someone requests them. (I don't know why the 5-bit quant's perplexity is lower than the higher-bit quants; I think I did something wrong.)
21
 
 
+ ---
+ license: other
+ language:
+ - en
+ ---
+
 [EXL2](https://github.com/turboderp/exllamav2/tree/master#exllamav2) Quantization of [Gryphe's MythoMax L2 13B](https://huggingface.co/Gryphe/MythoMax-L2-13b).

 Other quantized models are available from TheBloke: [GGML](https://huggingface.co/TheBloke/MythoMax-L2-13B-GGML) - [GPTQ](https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ) - [GGUF](https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF) - [AWQ](https://huggingface.co/TheBloke/MythoMax-L2-13B-AWQ)
 

 ## Model details

+ | **Branch** | **Bits** | **Perplexity** | **Description** |
+ |----------------------------------------------------------------------|----------|----------------|--------------------------------------------------------------|
+ | [3bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/3bit) | 3.73 | 5.8251 | Low-bit quant that still holds up well |
+ | [4bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/4bit) | 4.33 | 5.7784 | Up to 6K context on a T4 GPU |
+ | [main](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/main) | 5.33 | 5.7427 | 4K context on a T4 GPU (recommended if you use Google Colab) |
+ | [6bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/6bit) | 6.13 | 5.7347 | For those who want better quality and can run it |
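Each row of the table is a separate branch of this repo, so you pin one quant by its branch name. A minimal sketch using `huggingface_hub`'s `snapshot_download` (the `local_dir` path here is illustrative, and the download needs network access):

```python
# Sketch: fetch a single quant branch of this repo.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="R136a1/MythoMax-L2-13B-exl2",
    revision="4bit",  # branch name from the table above, e.g. "main", "3bit", "6bit"
    local_dir="MythoMax-L2-13B-exl2-4bit",  # illustrative destination path
)
```

Cloning with `git clone --single-branch --branch 4bit` (with git-lfs installed) works the same way.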
 
 

 I'll upload the 7- and 8-bit quants if someone requests them. (I don't know why the 5-bit quant's perplexity is lower than the higher-bit quants; I think I did something wrong.)
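For reading the Perplexity column: perplexity is just the exponential of the mean per-token negative log-likelihood, so lower is better and differences of a few hundredths (e.g. 5.7427 vs. 5.7347) are small. A minimal sketch of the relationship (the `perplexity` helper is illustrative, not part of exllamav2):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model with zero loss on every token has perplexity 1 (perfect prediction).
print(perplexity([0.0, 0.0, 0.0]))  # → 1.0

# If the mean NLL is ln(5.7427), the perplexity is exactly 5.7427,
# matching the "main" branch entry in the table above.
mean_nll = math.log(5.7427)
print(round(perplexity([mean_nll] * 4), 4))  # → 5.7427
```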