Crystalcareai committed
Commit 50d4f86 · verified · 1 Parent(s): d197773

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -8,11 +8,11 @@ language:
  
  ## Mixtral Experts with DeepSeek-MoE Architecture
  
- This is a direct extraction of the 8 experts from [Mixtral-8x7b-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), and placing them into the DeepSeek-MoE Architecture.
+ This is a direct extraction of the 8 experts from [Mixtral-8x7b-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), and a transfer of them into the DeepSeek-MoE Architecture.
  
  - **Expert Configuration:** It is 2 experts per token.
  - **Performance:** Performance is identical to instruct, if not a little better.
- - **Evaluations:** Evals will come, it is more malleable to training.
+ - **Evaluations:** Evals will come when compute clears up, it also appears more malleable to training.
  - **Experimentation:** This is the first of a few MoE expert extraction and modification projects we're working on, more to come. Enjoy.
  
  ## Instruction Format
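
The commit above only touches README prose. For context, below is a minimal sketch (not part of the original commit) of how a checkpoint packaged this way might be loaded with Hugging Face transformers. The repo id is a placeholder, and both the `trust_remote_code=True` requirement and the `num_experts_per_tok` config field are assumptions based on how DeepSeek-MoE-style models are typically published.

```python
# Minimal sketch, not from the commit above. Assumptions: the repo id is a
# placeholder, the model ships DeepSeek-MoE custom modeling code (hence
# trust_remote_code=True), and the routing width is exposed as
# num_experts_per_tok, as in the public DeepSeek-MoE configs.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/mixtral-experts-deepseek-moe"  # hypothetical repo id

config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
# The README states 2 experts are routed per token.
print("experts per token:", getattr(config, "num_experts_per_tok", "unknown"))

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Mixtral-Instruct-style prompting ([INST] ... [/INST]) is assumed here; the
# repo's own "Instruction Format" section is the authoritative reference.
prompt = "[INST] Explain mixture-of-experts routing in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```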