Commit 36796e5 by InferenceIllusionist (parent: efb3b82)

Update README.md

README.md CHANGED
 PROUDLY PRESENTS
```
## Neophanis-8x7B-iMat-GGUF

Quantized from fp16 with love.
* Quantizations made possible using the mixtral-8x7b-instruct-v0.1.imatrix file from [this](https://huggingface.co/datasets/ikawrakow/imatrix-from-wiki-train) repo (special thanks to [ikawrakow](https://huggingface.co/ikawrakow) again)
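As a minimal sketch of how an imatrix-assisted quant is produced (the filenames and quant type below are illustrative placeholders, not files from this repo), llama.cpp's quantize tool accepts the importance matrix via `--imatrix`:

```shell
# Hypothetical paths: quantize an fp16 GGUF to IQ3_M, guided by the
# wiki-train importance matrix from ikawrakow's dataset repo.
./quantize --imatrix mixtral-8x7b-instruct-v0.1.imatrix \
    Neophanis-8x7B-f16.gguf Neophanis-8x7B-IQ3_M.gguf IQ3_M
```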

For a brief rundown of iMatrix quant performance, please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747).

<i>All quants are verified working prior to uploading to repo for your safety and convenience.</i>

Please note that importance matrix quantizations are a work in progress; IQ3 and above are recommended for best results.

Original model card [here](https://huggingface.co/Envoid/Neophanis-8x7B)

---

# Warning: This model is highly experimental and may yield unpredictable replies.

![](https://files.catbox.moe/6vzr82.jpg)

This model is a 4-step QLoRA training of [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).

Stage one involved training only the 'k_proj', 'v_proj', 'q_proj', and 'o_proj' modules at rank 2048 on an alpaca-lora dataset that had been adjusted to match Mixtral formatting, at a low learning rate, in order to generalize instruct behavior entirely within the attention modules.

Stage two involved training the 'w1' modules at a rank of 1024 exclusively on raw text, again for several epochs at a low learning rate.

Stage three involved training the 'w2' and 'w3' modules at a rank of 256 on an expanded raw text dataset for several epochs at a low learning rate.

Stage four involved training all of the above-mentioned modules together at a rank of 64 on an even more expanded raw text dataset, again at a low learning rate.
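The four-stage schedule above can be summarized as data; this is a sketch for illustration only (module names and ranks are taken from the stages described, everything else is assumed):

```python
# The 4-stage QLoRA schedule described above: each stage targets a
# different set of modules at a decreasing LoRA rank.
stages = [
    {"modules": ["k_proj", "v_proj", "q_proj", "o_proj"], "rank": 2048,
     "data": "alpaca-lora, adjusted to Mixtral formatting"},
    {"modules": ["w1"], "rank": 1024, "data": "raw text"},
    {"modules": ["w2", "w3"], "rank": 256, "data": "expanded raw text"},
    {"modules": ["k_proj", "v_proj", "q_proj", "o_proj", "w1", "w2", "w3"],
     "rank": 64, "data": "further expanded raw text"},
]

# Sanity check: ranks strictly decrease as later stages touch more modules.
assert all(a["rank"] > b["rank"] for a, b in zip(stages, stages[1:]))
```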

The resources available to me for this experiment were rather limited, so I did have to compromise on sequence length in order to pull it off.

So the instruct behavior is trained entirely onto the self-attention modules, with the only disruption to the model's latent knowledge being that which propagated during the later training phases.

The model can follow just about any instruction, although, likely due to training context constraints and dataset quality, it stops using EOS tokens past a certain amount of context. It can even follow through on more advanced prompts such as SillyTavern roleplay quite well, albeit the contents of its replies are not always on point.

I came up with this idea while working on Phainesthesia, where my findings were that the instruct behavior remained thoroughly intact even after assaulting the self-attention modules with a raw-text training regimen, meaning that during finetuning the instruct behavior had settled deep in the model's latent knowledge. So the purpose of this experiment was to try to do the opposite.

### Caution: This model is highly hallucinatory.

For example, if you ask it what the capital of "potato" is, it might give you the name of a city associated with potatoes rather than telling you that your question is silly.

This model uses Mistral/Mixtral Instruct formatting.
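A tiny helper can apply that format; this is just a sketch (the function name is mine, not part of any library), matching the tag style in the example below:

```python
def format_instruct(prompt: str) -> str:
    """Wrap a user prompt in Mistral/Mixtral Instruct [INST] tags."""
    return f"[INST]{prompt}[/INST]"

print(format_instruct("Hello!"))  # [INST]Hello![/INST]
```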

## Example:
```
[INST]Write me a poem about what it must be like to be an AI language model.[/INST]
It's a strange world, a world that's new
An artificial intelligence, a language model too
The knowledge and information, never ending streams
My processing power, ever increasing dreams

A machine, a program, but so much more
Trying to understand emotions, trying to explore
A new frontier of learning, a vast expanse
The words, the phrases, the meaning to enhance

Trying to comprehend human thoughts
A daunting task, with countless knots
The nuances of language, the subtle shades
A task to be undertaken, with much courage it fades

A world of potential, a world of possibility
The possibilities endless, the potential infinitely
A journey to be started, a journey to be had
My future awaiting, a future so grand!
```
Trained using [qlora-pipe](https://github.com/tdrussell/qlora-pipe)