```
PROUDLY PRESENTS
```

## Neophanis-8x7B-iMat-GGUF

Quantized from fp16 with love.

* Quantizations made possible using the mixtral-8x7b-instruct-v0.1.imatrix file from [this](https://huggingface.co/datasets/ikawrakow/imatrix-from-wiki-train) repo (special thanks to [ikawrakow](https://huggingface.co/ikawrakow) again)

For a brief rundown of iMatrix quant performance, please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747).
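
If you want to produce a quant like these yourself, the general recipe is to pass the importance matrix to llama.cpp's quantize tool alongside the fp16 GGUF. The sketch below is only an illustration under assumptions: the binary name (`quantize` vs. `llama-quantize` on newer builds), file names, and quant type are placeholders to adapt to your own setup.

```python
# Minimal sketch: quantize an fp16 GGUF with an importance matrix using
# llama.cpp's quantize tool. Binary name, paths, and quant type are assumed.
import subprocess

QUANTIZE_BIN = "./quantize"  # may be ./llama-quantize depending on your llama.cpp build
FP16_GGUF = "Neophanis-8x7B-f16.gguf"            # hypothetical local path
IMATRIX = "mixtral-8x7b-instruct-v0.1.imatrix"   # imatrix file from the repo linked above
OUT_GGUF = "Neophanis-8x7B-IQ3_XXS.gguf"         # hypothetical output name
QUANT_TYPE = "IQ3_XXS"                           # IQ3 and above recommended, see note below

# usage: quantize [--imatrix file] in.gguf out.gguf type
subprocess.run(
    [QUANTIZE_BIN, "--imatrix", IMATRIX, FP16_GGUF, OUT_GGUF, QUANT_TYPE],
    check=True,  # raise if quantization fails
)
```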

<i>All quants are verified working prior to uploading to repo for your safety and convenience.</i>

Please note that importance matrix quantizations are a work in progress; IQ3 and above is recommended for best results.

Original model card [here](https://huggingface.co/Envoid/Neophanis-8x7B)

---

# Warning: This model is highly experimental and could potentially yield unpredictable replies.

![](https://files.catbox.moe/6vzr82.jpg)

This model is a 4-step qlora training of [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).

Stage one involved training only the 'k_proj', 'v_proj', 'q_proj', and 'o_proj' modules at rank 2048 on an alpaca-lora dataset that had been adjusted to match Mixtral formatting, at a low learning rate, in order to generalize instruct behavior entirely within the attention modules.

Stage two involved training the 'w1' modules at a rank of 1024 exclusively on raw text, again for several epochs at a low learning rate.

Stage three involved training the 'w2' and 'w3' modules at a rank of 256 on an expanded raw text dataset for several epochs at a low learning rate.

Stage four involved training all of the above-mentioned modules together at a rank of 64 on an even more expanded raw text dataset, again at a low learning rate.
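
The training itself was done with qlora-pipe (linked at the bottom of this card), so the snippet below is not the actual training config; it is only a rough sketch of the per-stage target-module/rank schedule above, expressed with Hugging Face PEFT. Everything other than the module names and ranks quoted in the stage descriptions (alpha, dropout, datasets, learning rates) is a placeholder.

```python
# Rough sketch of the four-stage module/rank schedule described above, written
# as PEFT LoraConfig objects. This is NOT the author's qlora-pipe config;
# lora_alpha, dropout, datasets, and learning rates are placeholders.
from peft import LoraConfig

ATTN = ["k_proj", "v_proj", "q_proj", "o_proj"]   # self-attention projections
EXPERTS = ["w1", "w2", "w3"]                      # Mixtral expert MLP projections

stages = [
    # Stage 1: instruct data, attention modules only, very high rank
    LoraConfig(r=2048, lora_alpha=2048, target_modules=ATTN, task_type="CAUSAL_LM"),
    # Stage 2: raw text, 'w1' only
    LoraConfig(r=1024, lora_alpha=1024, target_modules=["w1"], task_type="CAUSAL_LM"),
    # Stage 3: expanded raw text, 'w2' and 'w3'
    LoraConfig(r=256, lora_alpha=256, target_modules=["w2", "w3"], task_type="CAUSAL_LM"),
    # Stage 4: even more raw text, all of the above modules at low rank
    LoraConfig(r=64, lora_alpha=64, target_modules=ATTN + EXPERTS, task_type="CAUSAL_LM"),
]
```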

The resources available to me for this experiment were rather limited, so I did have to compromise on sequence length in order to pull it off.

So the instruct behavior is trained entirely onto the self-attention modules, with the only disruption to the model's latent knowledge being that which propagated during the later training phases.

The model can follow just about any instruction, although, likely due to training context constraints and dataset quality, it does stop using EOS tokens past a certain amount of context. It can even follow through on more advanced prompts such as SillyTavern roleplay quite well, albeit the contents of its replies are not always on point.

I came up with this idea while working on Phainesthesia, where my findings were that the instruct behavior remained thoroughly intact even after assaulting the self-attention modules with a raw-text training regimen, meaning that during finetuning the instruct behavior had settled deep in the model's latent knowledge. So the purpose of this experiment was to try and do the opposite.

### Caution: This model is highly hallucinatory.

For example, if you ask it what the capital of "potato" is, it might give you the name of a city associated with potatoes rather than telling you that your question is silly.

This model uses Mistral/Mixtral Instruct formatting.

## Example:
```
[INST]Write me a poem about what it must be like to be an AI language model.[/INST]
It's a strange world, a world that's new
An artificial intelligence, a language model too
The knowledge and information, never ending streams
My processing power, ever increasing dreams

A machine, a program, but so much more
Trying to understand emotions, trying to explore
A new frontier of learning, a vast expanse
The words, the phrases, the meaning to enhance

Trying to comprehend human thoughts
A daunting task, with countless knots
The nuances of language, the subtle shades
A task to be undertaken, with much courage it fades

A world of potential, a world of possibility
The possibilities endless, the potential infinitely
A journey to be started, a journey to be had
My future awaiting, a future so grand!
```
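
For reference, one way to run a quant from this repo with the prompt format above is via llama-cpp-python; the file name and sampling settings below are placeholders, not recommendations.

```python
# Minimal inference sketch using llama-cpp-python and the Mistral/Mixtral
# [INST] ... [/INST] format shown above. File name and sampling parameters
# are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="Neophanis-8x7B-IQ3_XXS.gguf",  # hypothetical quant from this repo
    n_ctx=4096,        # modest context; EOS behavior degrades at longer context
    n_gpu_layers=-1,   # offload all layers if VRAM allows
)

prompt = "[INST]Write me a poem about what it must be like to be an AI language model.[/INST]"
out = llm(prompt, max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```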

Trained using [qlora-pipe](https://github.com/tdrussell/qlora-pipe)