Update README.md
Browse files
README.md
CHANGED
@@ -6,9 +6,14 @@ language:
|
|
6 |
<p align="center"> <img src="https://cdn-lfs-us-1.huggingface.co/repos/58/11/5811c78d8fc8a7e29f637f442dc17b5fdc3ee97e6ce5e3ead6c9eaeed704e08f/12a4f1bdfdaabdc5114d8e72465b60c97c5e2037a7d5c22ff5fd53cfa80e58ab?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27DeepSeek-Mixtral.png%3B+filename%3D%22DeepSeek-Mixtral.png%22%3B&response-content-type=image%2Fpng&Expires=1715149068&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNTE0OTA2OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzU4LzExLzU4MTFjNzhkOGZjOGE3ZTI5ZjYzN2Y0NDJkYzE3YjVmZGMzZWU5N2U2Y2U1ZTNlYWQ2YzllYWVlZDcwNGUwOGYvMTJhNGYxYmRmZGFhYmRjNTExNGQ4ZTcyNDY1YjYwYzk3YzVlMjAzN2E3ZDVjMjJmZjVmZDUzY2ZhODBlNThhYj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=RZNhKGJhLpnd2j%7EeImEHG8wBlntw9yJ6xJcNcbQXdngDetFhqFK46fJ3ndgzAoxbgwSHrgpYTdAR9ZSzinuY8TuvUgXEX64dZhvmgLIzcfdqfMIKOOg4XME45rZpWdQApAn%7EsSGNNwJPGvXh3MHXPjo0fOxiCf5zSPNl342EInA8FY%7E2jXEykwrfAK5OBWpbEi65WSbBSs6r3ob-66dURDEKfvfPN22VMvAYfiBiajvo6tQcL8cQOK5BWeQcsAZCDOTSxljD8--g2nXU2pl5WXh6Kv74szFWA4zEL7GOaZLRNdcTUHQmxen6144xngrv%7ERnd2jTRNCpH27M7rbGpvA__&Key-Pair-Id=KCD77M1F0VK2B" width="auto" title="LlaMoE-Medium model image"> </p>
|
7 |
|
8 |
|
9 |
-
|
10 |
-
It is 2 experts per token. Performance is identical to instruct, if not a little better. Evals will come, It is more malleable to training. This is our first experiment with expert extraction and modification, more to come. Enjoy.
|
11 |
|
|
|
|
|
|
|
|
|
|
|
|
|
12 |
|
13 |
## Instruction Format
|
14 |
To leverage instruction fine-tuning, your prompts should be enclosed with `[INST]` and `[/INST]` tokens. The very first instruction should begin with a begin-of-sentence id, while subsequent instructions should not. Assistant generation will conclude with an end-of-sentence token id.
|
@@ -52,5 +57,5 @@ print(decoded[0])
|
|
52 |
|
53 |
Special Thanks: Eric Hartford, and Fernando Neto.
|
54 |
|
55 |
-
-Lucas Atkins (Crystalcareai)
|
56 |
|
|
|
6 |
<p align="center"> <img src="https://cdn-lfs-us-1.huggingface.co/repos/58/11/5811c78d8fc8a7e29f637f442dc17b5fdc3ee97e6ce5e3ead6c9eaeed704e08f/12a4f1bdfdaabdc5114d8e72465b60c97c5e2037a7d5c22ff5fd53cfa80e58ab?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27DeepSeek-Mixtral.png%3B+filename%3D%22DeepSeek-Mixtral.png%22%3B&response-content-type=image%2Fpng&Expires=1715149068&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNTE0OTA2OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzU4LzExLzU4MTFjNzhkOGZjOGE3ZTI5ZjYzN2Y0NDJkYzE3YjVmZGMzZWU5N2U2Y2U1ZTNlYWQ2YzllYWVlZDcwNGUwOGYvMTJhNGYxYmRmZGFhYmRjNTExNGQ4ZTcyNDY1YjYwYzk3YzVlMjAzN2E3ZDVjMjJmZjVmZDUzY2ZhODBlNThhYj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=RZNhKGJhLpnd2j%7EeImEHG8wBlntw9yJ6xJcNcbQXdngDetFhqFK46fJ3ndgzAoxbgwSHrgpYTdAR9ZSzinuY8TuvUgXEX64dZhvmgLIzcfdqfMIKOOg4XME45rZpWdQApAn%7EsSGNNwJPGvXh3MHXPjo0fOxiCf5zSPNl342EInA8FY%7E2jXEykwrfAK5OBWpbEi65WSbBSs6r3ob-66dURDEKfvfPN22VMvAYfiBiajvo6tQcL8cQOK5BWeQcsAZCDOTSxljD8--g2nXU2pl5WXh6Kv74szFWA4zEL7GOaZLRNdcTUHQmxen6144xngrv%7ERnd2jTRNCpH27M7rbGpvA__&Key-Pair-Id=KCD77M1F0VK2B" width="auto" title="LlaMoE-Medium model image"> </p>
|
7 |
|
8 |
|
9 |
+
## DeepSeek-MoE Architecture Integration
|
|
|
10 |
|
11 |
+
This is a direct extraction of the 8 experts from [Mixtral-8x7b-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), and placing them into the DeepSeek-MoE Architecture.
|
12 |
+
|
13 |
+
- **Expert Configuration:** It is 2 experts per token.
|
14 |
+
- **Performance:** Performance is identical to instruct, if not a little better.
|
15 |
+
- **Evaluations:** Evals will come, it is more malleable to training.
|
16 |
+
- **Experimentation:** This is our first experiment with expert extraction and modification, more to come. Enjoy.
|
17 |
|
18 |
## Instruction Format
|
19 |
To leverage instruction fine-tuning, your prompts should be enclosed with `[INST]` and `[/INST]` tokens. The very first instruction should begin with a begin-of-sentence id, while subsequent instructions should not. Assistant generation will conclude with an end-of-sentence token id.
|
|
|
57 |
|
58 |
Special Thanks: Eric Hartford, and Fernando Neto.
|
59 |
|
60 |
+
- Lucas Atkins (Crystalcareai)
|
61 |
|