shadowml committed (verified)
Commit 3088a58 · Parent(s): 1c0d09d

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +6 -11
README.md CHANGED
@@ -15,9 +15,9 @@ tags:
 
 ![](https://i.imgur.com/UOb2fvh.jpg)
 
- # phixtral-2x2_8
+ # phixtral-3x2_8
 
- phixtral-2x2_8 is the first Mixture of Experts (MoE) made with two [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) architecture. It performs better than each individual expert.
+ phixtral-3x2_8 is the first Mixture of Experts (MoE) made with two [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) architecture. It performs better than each individual expert.
 
 You can try it out using this [Space](https://huggingface.co/spaces/mlabonne/phixtral-chat).
 
@@ -25,12 +25,7 @@ You can try it out using this [Space](https://huggingface.co/spaces/mlabonne/phixtral-chat).
 
 The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) on Nous suite.
 
- | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
- |----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
- |[**phixtral-2x2_8**](https://huggingface.co/mlabonne/phixtral-2x2_8)| **34.1**| **70.44**| **48.78**| **37.82**| **47.78**|
- |[dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2)| 33.12| 69.85| 47.39| 37.2| 46.89|
- |[phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo)| 30.39| 71.68| 50.75| 34.9| 46.93|
- |[phi-2](https://huggingface.co/microsoft/phi-2)| 27.98| 70.8| 44.43| 35.21| 44.61|
+ TBD
 
 Check [YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) to compare it with other models.
 
@@ -58,7 +53,7 @@ Here's a [Colab notebook](https://colab.research.google.com/drive/1k6C_oJfEKUq0m
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
- model_name = "phixtral-2x2_8"
+ model_name = "phixtral-3x2_8"
 instruction = '''
 def print_prime(n):
    """
@@ -95,9 +90,9 @@ text = tokenizer.batch_decode(outputs)[0]
 print(text)
 ```
 
- Inspired by [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1), you can specify the `num_experts_per_tok` and `num_local_experts` in the [`config.json`](https://huggingface.co/mlabonne/phixtral-2x2_8/blob/main/config.json#L26-L27) file (2 for both by default). This configuration is automatically loaded in `configuration.py`.
+ Inspired by [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1), you can specify the `num_experts_per_tok` and `num_local_experts` in the [`config.json`](https://huggingface.co/mlabonne/phixtral-3x2_8/blob/main/config.json#L26-L27) file (2 for both by default). This configuration is automatically loaded in `configuration.py`.
 
- [vince62s](https://huggingface.co/vince62s) implemented the MoE inference code in the `modeling_phi.py` file. In particular, see the [MoE class](https://huggingface.co/mlabonne/phixtral-2x2_8/blob/main/modeling_phi.py#L293-L317).
+ [vince62s](https://huggingface.co/vince62s) implemented the MoE inference code in the `modeling_phi.py` file. In particular, see the [MoE class](https://huggingface.co/mlabonne/phixtral-3x2_8/blob/main/modeling_phi.py#L293-L317).
 
 ## 🤝 Acknowledgments
 
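For reference, the last hunks describe the model's custom MoE code (`configuration.py`, `modeling_phi.py`) and two routing parameters, `num_experts_per_tok` and `num_local_experts`, read from `config.json`. Below is a minimal sketch of how those values could be inspected and overridden at load time; it is not part of the commit. It assumes the repository id `mlabonne/phixtral-3x2_8` (as linked in the diff), that the repo's custom code is pulled in with `trust_remote_code=True`, and an illustrative prompt and generation settings.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "mlabonne/phixtral-3x2_8"  # repo id assumed from the links in the diff

# Load the custom config (configuration.py) shipped with the repo and inspect
# the MoE routing parameters mentioned in the README.
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
print(config.num_experts_per_tok, config.num_local_experts)

# Optionally override how many experts are routed per token before loading
# the weights (the README states 2 is the default).
config.num_experts_per_tok = 2

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# Illustrative prompt echoing the README's print_prime example.
prompt = "def print_prime(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs)[0])
```

Raising `num_experts_per_tok` activates more experts per token and therefore increases compute per forward pass; any effect on output quality for this model is not reported in the commit.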