Crystalcareai committed on
Commit 221a259 · verified · Parent: 360d511

Update README.md

Files changed (1): README.md (+8 −3)
README.md CHANGED
@@ -6,9 +6,14 @@ language:
 <p align="center"> <img src="https://cdn-lfs-us-1.huggingface.co/repos/58/11/5811c78d8fc8a7e29f637f442dc17b5fdc3ee97e6ce5e3ead6c9eaeed704e08f/12a4f1bdfdaabdc5114d8e72465b60c97c5e2037a7d5c22ff5fd53cfa80e58ab?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27DeepSeek-Mixtral.png%3B+filename%3D%22DeepSeek-Mixtral.png%22%3B&response-content-type=image%2Fpng&Expires=1715149068&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNTE0OTA2OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzU4LzExLzU4MTFjNzhkOGZjOGE3ZTI5ZjYzN2Y0NDJkYzE3YjVmZGMzZWU5N2U2Y2U1ZTNlYWQ2YzllYWVlZDcwNGUwOGYvMTJhNGYxYmRmZGFhYmRjNTExNGQ4ZTcyNDY1YjYwYzk3YzVlMjAzN2E3ZDVjMjJmZjVmZDUzY2ZhODBlNThhYj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=RZNhKGJhLpnd2j%7EeImEHG8wBlntw9yJ6xJcNcbQXdngDetFhqFK46fJ3ndgzAoxbgwSHrgpYTdAR9ZSzinuY8TuvUgXEX64dZhvmgLIzcfdqfMIKOOg4XME45rZpWdQApAn%7EsSGNNwJPGvXh3MHXPjo0fOxiCf5zSPNl342EInA8FY%7E2jXEykwrfAK5OBWpbEi65WSbBSs6r3ob-66dURDEKfvfPN22VMvAYfiBiajvo6tQcL8cQOK5BWeQcsAZCDOTSxljD8--g2nXU2pl5WXh6Kv74szFWA4zEL7GOaZLRNdcTUHQmxen6144xngrv%7ERnd2jTRNCpH27M7rbGpvA__&Key-Pair-Id=KCD77M1F0VK2B" width="auto" title="LlaMoE-Medium model image"> </p>
 
 
-This is a direct extraction of the 8 experts from [Mixtral-8x7b-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), and placing them into the Deepseek-MoE Architecture.
-It is 2 experts per token. Performance is identical to instruct, if not a little better. Evals will come, It is more malleable to training. This is our first experiment with expert extraction and modification, more to come. Enjoy.
+## DeepSeek-MoE Architecture Integration
+
+This is a direct extraction of the 8 experts from [Mixtral-8x7b-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), placed into the DeepSeek-MoE architecture.
+
+- **Expert Configuration:** 2 experts are routed per token.
+- **Performance:** Performance is identical to the original Instruct model, if not a little better.
+- **Evaluations:** Evals will come; the model is more malleable to training.
+- **Experimentation:** This is our first experiment with expert extraction and modification, with more to come. Enjoy.
 
 ## Instruction Format
 To leverage instruction fine-tuning, your prompts should be enclosed with `[INST]` and `[/INST]` tokens. The very first instruction should begin with a begin-of-sentence id, while subsequent instructions should not. Assistant generation will conclude with an end-of-sentence token id.
@@ -52,5 +57,5 @@ print(decoded[0])
 
 Special Thanks: Eric Hartford, and Fernando Neto.
 
--Lucas Atkins (Crystalcareai)
+- Lucas Atkins (Crystalcareai)
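The `[INST]`/`[/INST]` wrapping described in the Instruction Format section can be sketched as plain string assembly. This is a minimal illustration, not the README's own usage block: the literal `<s>`/`</s>` token strings and the `build_prompt` helper are assumptions based on the Mixtral-style format; in practice, prefer the model tokenizer's own chat template.

```python
# Sketch of the instruction format: a begin-of-sentence id before the first
# instruction only, [INST]...[/INST] around each instruction, and an
# end-of-sentence id closing each assistant response.
# The "<s>"/"</s>" strings and this helper are illustrative assumptions.

def build_prompt(turns):
    """turns: list of (instruction, response) pairs; the final response
    may be None when the assistant's reply is still to be generated."""
    prompt = "<s>"  # begin-of-sentence id, first instruction only
    for instruction, response in turns:
        prompt += f"[INST] {instruction} [/INST]"
        if response is not None:
            prompt += f" {response}</s>"  # assistant turn ends with EOS id
    return prompt

print(build_prompt([
    ("What is your favourite condiment?", "Mayonnaise."),
    ("Do you have a recipe?", None),
]))
```

Note that subsequent instructions are not preceded by another begin-of-sentence id, matching the rule stated above.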