MediaTek-Research/Breeze-7B-Base-v0_1

@@ -83,14 +83,14 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 **Category ACC of TMMLU+ (5 shot)**
-| Models                           | STEM         | Social Science | Humanities | Other      |
-|-----------------------------------------------------|--------------|----------------|------------|------------|
-| Yi-34B                                        | 56.03        | 73.06          | 61.12      | 62.19      |
-| Qwen-14B                                       | 46.51        | 58.20          | 51.12      | 49.38      |
-| Yi-6B                                         | 41.14        | 57.77          | 50.22      | 49.39      |
-| Qwen-7B                                        | 28.25        | 47.80          | 43.14      | 42.17      |
-| **Breeze-7B-Base-v0.1**               | 35.74        | 46.08          | 40.29      | 39.27      |
-| Mistral-7B-v0.1                           | 33.01        | 42.23          | 35.86      | 37.63      |
@@ -123,8 +123,8 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 **Category Score of MT-Bench-tw (0 shot)**
-| Models                                              | STEM    |Extraction|Reasoning| Math   | Coding  | Roleplay| Writing |Humanities|Average|
-|-----------------------------------------------------|---------|---------|---------|---------|---------|---------|---------|---------|--------|
 | gpt-3.5-turbo                                       |         |         |         |         |         |         |         |         |         |
 | Yi-34B-Chat                                         |         |         |         |         |         |         |         |         |         |
 | Qwen-14B-Chat                                       |         |         |         |         |         |         |         |         |         |
@@ -137,17 +137,17 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 **Category ACC of TMMLU+ (0 shot)**
-| Model                                               | STEM         | Social Science | Humanities | Other      | Average |
 |-----------------------------------------------------|--------------|----------------|------------|------------|---------|
-| gpt-3.5-turbo                                       | 41.56        | 46.72          | 36.73      | 42.03      |         |
-| Yi-34B-Chat                                         | 47.65        | 64.25          | 52.73      | 54.91      |         |
-| Qwen-14B-Chat                                       | 43.83        | 55.00          | 48.55      | 46.22      |         |
-| **Breeze-7B-Instruct-v0.1**                         | 37.41        | 46.81          | 42.06      | 40.16      |         |
-| **Breeze-7B-Instruct-64k-v0.1**                     | 37.88        | 46.35          | 40.31      | 39.40      |         |
-| Qwen-7B-Chat                                        | 35.44        | 46.22          | 38.35      | 40.06      |         |
-| Yi-6B-Chat                                          | 37.80        | 51.74          | 45.36      | 44.25      |         |
-| Taiwan-LLM-13B-v2.0-chat                            | 27.74        | 33.69          | 27.03      | 29.43      |         |
-| Taiwan-LLM-7B-v2.1-chat                             | 25.58        | 31.76          | 27.36      | 27.61      |         |
@@ -157,15 +157,15 @@ All inferences run on 2 RTX A6000 GPUs (using `vllm`, with a tensor-parallel siz
 | Models                                                             | Inference Time (sec)|Estimated Max Input Length (Char)|
 |--------------------------------------------------------------------|-------------------|--------------------------|
-| Yi-6B                                                        |   10.62  |   5.2k                |
-| **Breeze-7B-Instruct-v0.1**                              |  10.74  |    11.1k                 |
-| **Breeze-7B-Instruct-64k-v0.1**                              | 10.74       |  88.8k            |
-| Qwen-7B                                                       |   10.86         |    9.8k                  |
-| Qwen-14B                                                      |   18.89  |    9.8k                  |
-| Mistral-7B-v0.1                                          |  20.48   |    5.1k                 |
-| Taiwan-LLM-7B-v2.1-base                                 |   26.26          |    2.2k                  |
-| Taiwan-LLM-13B-v2.0-base                                |   36.80          |    2.2k                  |
-| Yi-34B                                                       |  43.71   |    4.5k                  |
 ## Long-context Performance
@@ -209,3 +209,14 @@ The suggested default `SYS_PROMPT` is
 ```txt
 You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
 ```

 **Category ACC of TMMLU+ (5 shot)**
+| Models                           | STEM         | Social Science | Humanities | Other      | AVG   |
+|----------------------------------|--------------|----------------|------------|------------|-------|
+| Yi-34B                           | 56.03        | 73.06          | 61.12      | 62.19      | 63.10 |
+| Qwen-14B                         | 46.51        | 58.20          | 51.12      | 49.38      | 51.30 |
+| Yi-6B                            | 41.14        | 57.77          | 50.22      | 49.39      | 49.63 |
+| Qwen-7B                          | 28.25        | 47.80          | 43.14      | 42.17      | 42.84 |
+| **Breeze-7B-Base-v0.1**          | 35.74        | 46.08          | 40.29      | 39.27      | 40.35 |
+| Mistral-7B-v0.1                  | 33.01        | 42.23          | 35.86      | 37.63      | 36.93 |
 **Category Score of MT-Bench-tw (0 shot)**
+| Models                                              | STEM    |Extraction|Reasoning| Math   | Coding  | Roleplay| Writing |Humanities|AVG    |
+|-----------------------------------------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
 | gpt-3.5-turbo                                       |         |         |         |         |         |         |         |         |         |
 | Yi-34B-Chat                                         |         |         |         |         |         |         |         |         |         |
 | Qwen-14B-Chat                                       |         |         |         |         |         |         |         |         |         |
 **Category ACC of TMMLU+ (0 shot)**
+| Models                                              | STEM         | Social Science | Humanities | Other      | AVG     |
 |-----------------------------------------------------|--------------|----------------|------------|------------|---------|
+| Yi-34B-Chat                                         | 47.65        | 64.25          | 52.73      | 54.91      | 54.87   |
+| Qwen-14B-Chat                                       | 43.83        | 55.00          | 48.55      | 46.22      | 48.41   |
+| Yi-6B-Chat                                          | 37.80        | 51.74          | 45.36      | 44.25      | 44.79   |
+| gpt-3.5-turbo                                       | 41.56        | 46.72          | 36.73      | 42.03      | 41.76   |
+| **Breeze-7B-Instruct-v0.1**                         | 37.41        | 46.81          | 42.06      | 40.16      | 41.61   |
+| **Breeze-7B-Instruct-64k-v0.1**                     | 37.88        | 46.35          | 40.31      | 39.40      | 40.99   |
+| Qwen-7B-Chat                                        | 35.44        | 46.22          | 38.35      | 40.06      | 40.02   |
+| Taiwan-LLM-13B-v2.0-chat                            | 27.74        | 33.69          | 27.03      | 29.43      | 29.47   |
+| Taiwan-LLM-7B-v2.1-chat                             | 25.58        | 31.76          | 27.36      | 27.61      | 28.08   |
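
The new AVG column in the TMMLU+ (0 shot) table appears to be the unweighted mean of the four category scores. That is an inference from the numbers, not something the README states. The sketch below recomputes it from the rows as printed and checks that the table is sorted by that value; the names `scores`, `reported_avg`, and `category_mean` are illustrative only.

```python
# Category scores copied from the TMMLU+ (0 shot) table above,
# in column order: STEM, Social Science, Humanities, Other.
scores = {
    "Yi-34B-Chat": (47.65, 64.25, 52.73, 54.91),
    "Qwen-14B-Chat": (43.83, 55.00, 48.55, 46.22),
    "Yi-6B-Chat": (37.80, 51.74, 45.36, 44.25),
    "gpt-3.5-turbo": (41.56, 46.72, 36.73, 42.03),
    "Breeze-7B-Instruct-v0.1": (37.41, 46.81, 42.06, 40.16),
    "Breeze-7B-Instruct-64k-v0.1": (37.88, 46.35, 40.31, 39.40),
    "Qwen-7B-Chat": (35.44, 46.22, 38.35, 40.06),
    "Taiwan-LLM-13B-v2.0-chat": (27.74, 33.69, 27.03, 29.43),
    "Taiwan-LLM-7B-v2.1-chat": (25.58, 31.76, 27.36, 27.61),
}

# AVG values as reported in the table, in table order.
reported_avg = {
    "Yi-34B-Chat": 54.87,
    "Qwen-14B-Chat": 48.41,
    "Yi-6B-Chat": 44.79,
    "gpt-3.5-turbo": 41.76,
    "Breeze-7B-Instruct-v0.1": 41.61,
    "Breeze-7B-Instruct-64k-v0.1": 40.99,
    "Qwen-7B-Chat": 40.02,
    "Taiwan-LLM-13B-v2.0-chat": 29.47,
    "Taiwan-LLM-7B-v2.1-chat": 28.08,
}


def category_mean(values):
    """Unweighted mean of the per-category scores (assumed definition of AVG)."""
    return sum(values) / len(values)


recomputed = {model: category_mean(vals) for model, vals in scores.items()}

# Rank models by the recomputed average, highest first.
ranking = sorted(recomputed, key=recomputed.get, reverse=True)

for model in ranking:
    print(f"{model:30s} {recomputed[model]:6.2f} (reported {reported_avg[model]:.2f})")
```

Under this assumption the recomputed values stay within 0.02 of the reported ones, which would be consistent with the reported averages having been computed from unrounded category scores.
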
 | Models                                                             | Inference Time (sec)|Estimated Max Input Length (Char)|
 |--------------------------------------------------------------------|-------------------|--------------------------|
+| Yi-6B                                                              |   10.62  |   5.2k                |
+| **Breeze-7B-Instruct-v0.1**                                        |  10.74  |    11.1k                 |
+| **Breeze-7B-Instruct-64k-v0.1**                                    | 10.74       |  88.8k            |
+| Qwen-7B                                                            |   10.86         |    9.8k                  |
+| Qwen-14B                                                           |   18.89  |    9.8k                  |
+| Mistral-7B-v0.1                                                    |  20.48   |    5.1k                 |
+| Taiwan-LLM-7B-v2.1-base                                            |   26.26          |    2.2k                  |
+| Taiwan-LLM-13B-v2.0-base                                           |   36.80          |    2.2k                  |
+| Yi-34B                                                             |  43.71   |    4.5k                  |
 ## Long-context Performance

 The suggested default `SYS_PROMPT` is
 ```txt
 You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
 ```
+## Citation
+```
+@article{breeze7b2024,
+  title={},
+  author={},
+  journal={arXiv},
+  year={2024}
+}
+```