shadowml committed (verified)
Commit 3088a58 · Parent(s): 1c0d09d

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +6 -11
README.md CHANGED
@@ -15,9 +15,9 @@ tags:
 
 ![](https://i.imgur.com/UOb2fvh.jpg)
 
- # phixtral-2x2_8
+ # phixtral-3x2_8
 
- phixtral-2x2_8 is the first Mixture of Experts (MoE) made with two [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) architecture. It performs better than each individual expert.
+ phixtral-3x2_8 is the first Mixture of Experts (MoE) made with two [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) architecture. It performs better than each individual expert.
 
 You can try it out using this [Space](https://huggingface.co/spaces/mlabonne/phixtral-chat).
 
@@ -25,12 +25,7 @@ You can try it out using this [Space](https://huggingface.co/spaces/mlabonne/phixtral-chat).
 
 The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) on Nous suite.
 
- | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
- |----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
- |[**phixtral-2x2_8**](https://huggingface.co/mlabonne/phixtral-2x2_8)| **34.1**| **70.44**| **48.78**| **37.82**| **47.78**|
- |[dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2)| 33.12| 69.85| 47.39| 37.2| 46.89|
- |[phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo)| 30.39| 71.68| 50.75| 34.9| 46.93|
- |[phi-2](https://huggingface.co/microsoft/phi-2)| 27.98| 70.8| 44.43| 35.21| 44.61|
+ TBD
 
 Check [YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) to compare it with other models.
 
@@ -58,7 +53,7 @@ Here's a [Colab notebook](https://colab.research.google.com/drive/1k6C_oJfEKUq0m
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
- model_name = "phixtral-2x2_8"
+ model_name = "phixtral-3x2_8"
 instruction = '''
 def print_prime(n):
    """
@@ -95,9 +90,9 @@ text = tokenizer.batch_decode(outputs)[0]
 print(text)
 ```
 
- Inspired by [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1), you can specify the `num_experts_per_tok` and `num_local_experts` in the [`config.json`](https://huggingface.co/mlabonne/phixtral-2x2_8/blob/main/config.json#L26-L27) file (2 for both by default). This configuration is automatically loaded in `configuration.py`.
+ Inspired by [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1), you can specify the `num_experts_per_tok` and `num_local_experts` in the [`config.json`](https://huggingface.co/mlabonne/phixtral-3x2_8/blob/main/config.json#L26-L27) file (2 for both by default). This configuration is automatically loaded in `configuration.py`.
 
- [vince62s](https://huggingface.co/vince62s) implemented the MoE inference code in the `modeling_phi.py` file. In particular, see the [MoE class](https://huggingface.co/mlabonne/phixtral-2x2_8/blob/main/modeling_phi.py#L293-L317).
+ [vince62s](https://huggingface.co/vince62s) implemented the MoE inference code in the `modeling_phi.py` file. In particular, see the [MoE class](https://huggingface.co/mlabonne/phixtral-3x2_8/blob/main/modeling_phi.py#L293-L317).
 
 ## 🤝 Acknowledgments
 
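For reference, the last hunks describe the model's custom MoE code (`configuration.py`, `modeling_phi.py`) and two routing parameters, `num_experts_per_tok` and `num_local_experts`, read from `config.json`. Below is a minimal sketch of how those values could be inspected and overridden at load time; it is not part of the commit. It assumes the repository id `mlabonne/phixtral-3x2_8` (as linked in the diff), that the repo's custom code is pulled in with `trust_remote_code=True`, and an illustrative prompt and generation settings.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "mlabonne/phixtral-3x2_8"  # repo id assumed from the links in the diff

# Load the custom config (configuration.py) shipped with the repo and inspect
# the MoE routing parameters mentioned in the README.
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
print(config.num_experts_per_tok, config.num_local_experts)

# Optionally override how many experts are routed per token before loading
# the weights (the README states 2 is the default).
config.num_experts_per_tok = 2

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# Illustrative prompt echoing the README's print_prime example.
prompt = "def print_prime(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs)[0])
```

Raising `num_experts_per_tok` activates more experts per token and therefore increases compute per forward pass; any effect on output quality for this model is not reported in the commit.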