Update README.md

---
license: other
language:
- en
---

This is an exllama V2 quantization of https://huggingface.co/Gryphe/MythoMax-L2-13b
This particular version is designed for maximum quality at the cost of size.

I noticed that the previous 8bpw version used a small bitrate for some layers and reported a lower quantized perplexity (ppl) than its base ppl, implying that the layer optimizer was overfitting to the calibration dataset.
In response, I edited measurement.json to add +1 error to all bitrates except for 8.13 (the max).
(Don't reuse that file for other quants!!)
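The +1-error edit can be sketched as a small script. This is a minimal sketch assuming a simplified measurement.json layout, where each layer maps to a list of candidate modes with `bpw` and `err` keys; the real exllamav2 schema may differ:

```python
import json

MAX_BPW = 8.13  # the highest candidate bitrate; left unpenalized

def penalize_errors(measurement: dict) -> dict:
    """Add +1.0 to the stored error of every candidate quantization
    mode below the maximum bitrate, so the layer optimizer always
    selects the 8.13 bpw mode. The key names used here are
    assumptions, not the exact exllamav2 schema."""
    for options in measurement["measurement"].values():
        for opt in options:
            if opt["bpw"] < MAX_BPW:
                opt["err"] += 1.0
    return measurement

# Usage (hypothetical file layout):
# with open("measurement.json") as f:
#     data = json.load(f)
# with open("measurement.json", "w") as f:
#     json.dump(penalize_errors(data), f)
```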

That means this version uses the best 8bit-32g quantization mode for all layers. In out-of-sample tests, this squeezes out slightly better perplexity than the 8bit version.
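For reference, the perplexity being compared here is the exponential of the mean per-token negative log-likelihood over held-out text. A generic sketch of that definition (not the exact evaluation script used for these quants):

```python
import math

def perplexity(nlls: list[float]) -> float:
    # Perplexity = exp(mean per-token negative log-likelihood).
    # Lower is better; computing it on held-out (out-of-sample) text
    # avoids the overfitting issue described above.
    return math.exp(sum(nlls) / len(nlls))
```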

Calibration data: https://huggingface.co/datasets/wikitext/resolve/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet

An improved, potentially even perfected variant of MythoMix, my [MythoLogic-L2](https://huggingface.co/Gryphe/MythoLogic-L2-13b) and [Huginn](https://huggingface.co/The-Face-Of-Goonery/Huginn-13b-FP16) merge using a highly experimental tensor-type merge technique. The main difference from MythoMix is that I allowed more of Huginn to intermingle with the single tensors located at the front and end of the model, resulting in increased coherency across the entire structure.

The script and the accompanying templates I used to produce both can [be found here](https://github.com/Gryphe/BlockMerge_Gradient/tree/main/YAML).