Update README.md
README.md (CHANGED)
@@ -7,3 +7,9 @@ language:
This is a direct extraction of the 8 experts from [Mixtral-8x7b-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), placed into the Deepseek-MoE architecture.
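
A minimal sketch of what such an extraction can look like, assuming the standard checkpoint layouts: Mixtral stores each expert as `w1`/`w2`/`w3` projections under `block_sparse_moe`, while a Deepseek-MoE-style model keeps `gate_proj`/`up_proj`/`down_proj` under `mlp`. The key names below are assumptions about those layouts, not the authors' actual conversion script.

```python
# Illustrative key remapping from Mixtral MoE parameters to Deepseek-MoE-style
# names. Parameter names are assumptions about the two checkpoint layouts.
import re

# Mixtral experts compute w2(silu(w1(x)) * w3(x)), so w1 maps to the gate
# projection, w3 to the up projection, and w2 to the down projection.
PROJ_MAP = {"w1": "gate_proj", "w3": "up_proj", "w2": "down_proj"}

def remap_expert_key(key: str) -> str:
    """Rename one Mixtral MoE parameter key to a Deepseek-MoE-style key."""
    m = re.match(
        r"model\.layers\.(\d+)\.block_sparse_moe\.experts\.(\d+)\.(w[123])\.weight",
        key,
    )
    if m:
        layer, expert, proj = m.groups()
        return f"model.layers.{layer}.mlp.experts.{expert}.{PROJ_MAP[proj]}.weight"
    # The 8-way router weight keeps its shape; only its module path changes.
    return key.replace(".block_sparse_moe.gate.", ".mlp.gate.")
```
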
It uses 2 experts per token. Performance is good, and the model is likely more malleable to further training. This is our first experiment with expert extraction and modification; more to come. Enjoy.
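
A hedged loading sketch through Hugging Face `transformers`; the repository id below is a hypothetical placeholder for this repo, and `trust_remote_code=True` is assumed to be required for the Deepseek-MoE modeling code.

```python
# Usage sketch (assumptions: placeholder repo id, bf16 weights, and custom
# Deepseek-MoE modeling code that requires trust_remote_code=True).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Crystalcareai/<this-repo>"  # hypothetical placeholder, replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Mixtral-Instruct-style prompting is assumed ([INST] ... [/INST]); the router
# still picks 2 of the 8 experts for every token, as described above.
prompt = "[INST] Explain mixture-of-experts routing in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
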
Special Thanks: Eric Hartford and Fernando Neto.

-Lucas Atkins (Crystalcareai)