Update README.md
README.md
CHANGED
@@ -40,7 +40,7 @@ If all our tokens are sent to just a few popular experts, that will make trainin
 The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one in this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. For now, frankenMoE remains psychotic. Raiden does improve upon the base heegyu/WizardVicuna-Uncensored-3B-0719, though.

 ## "Are there at least any datasets or plans for this model, in any way?"
-There are many datasets included as a result of merging four models... For one, Silicon Maid is a merge of xDan, which is trained on the [OpenOrca Dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) and the [OpenOrca DPO pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs). Loyal-Macaroni-Maid uses OpenChat-3.5, Starling, and NeuralChat, which have so many datasets that I'm not going to list them all here. Dolphin 2.6 Mistral also has a large variety of datasets. Panda-7B-v0.1 was fine tuned
+There are many datasets included as a result of merging four models... For one, Silicon Maid is a merge of xDan, which is trained on the [OpenOrca Dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) and the [OpenOrca DPO pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs). Loyal-Macaroni-Maid uses OpenChat-3.5, Starling, and NeuralChat, which have so many datasets that I'm not going to list them all here. Dolphin 2.6 Mistral also has a large variety of datasets. Panda-7B-v0.1 was fine-tuned by the person collaborating with me on this project, using a base Mistral and a private dataset. Panda gives the model its creativity, while the rest act as support.

 # Results
 ## Some results from the model's performance.
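The routing problem the hunk header alludes to (all tokens being sent to a few popular experts, and a frankenMoE's router not being trained jointly with the experts) can be illustrated with a minimal top-k gating sketch. This is a toy, hypothetical example: the expert count, dimensions, and every name here are illustrative assumptions, not this repo's actual code.

```python
# Toy sketch of top-k MoE routing. In a trained MoE the router weights are
# learned jointly with the experts; in a frankenMoE merge they are not, so
# (as with the random weights below) load across experts is often unbalanced.
import math
import random

random.seed(0)

NUM_EXPERTS = 4   # e.g. one expert per merged model (illustrative)
TOP_K = 2         # experts activated per token
DIM = 8           # toy hidden size

# Hypothetical router: one weight vector per expert (untrained = random here).
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def route(token):
    """Score each expert, softmax the scores, and pick the top-k experts."""
    logits = [sum(w * x for w, x in zip(expert_w, token))
              for expert_w in router]
    m = max(logits)                            # numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    top_k = sorted(range(NUM_EXPERTS), key=lambda i: probs[i],
                   reverse=True)[:TOP_K]
    return top_k, probs

# Route a small batch of random tokens and count how often each expert fires.
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(16)]
counts = [0] * NUM_EXPERTS
for t in tokens:
    chosen, _ = route(t)
    for i in chosen:
        counts[i] += 1

print(counts)  # an untrained router tends to produce a skewed distribution
```

Joint training would add a load-balancing pressure on those counts; the untrained router of a frankenMoE has no such pressure, which is the inefficiency the hunk header describes.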