Update README.md
README.md
CHANGED
---
pipeline_tag: text-generation
license: apache-2.0
language:
- zh
---

# Model Card for Breeze-7B-Base-v0.1

Breeze-7B is a language model that builds upon the foundation of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), specifically enhanced for Traditional Chinese.

[Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1) introduces an expanded vocabulary with an additional 30,000 Traditional Chinese tokens and is pre-trained on a substantial dataset of 250GB of Traditional Chinese content. With the expanded vocabulary, the base model runs at twice the inference speed of Mistral-7B on Traditional Chinese text. [See [Inference Performance](#inference-performance).] This marks a significant milestone: it is the first instance of vocabulary expansion in a model tailored for Traditional Chinese.
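
The practical effect of the expanded vocabulary can be sanity-checked by tokenizing the same Traditional Chinese text with both tokenizers. This is only an illustrative sketch (the sample sentence is arbitrary), not part of the official benchmark:

```python
from transformers import AutoTokenizer

# Any Traditional Chinese sample text will do; this sentence is just an example.
text = "人工智慧正在改變我們的生活方式。"

breeze_tok = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Base-v0.1")
mistral_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Fewer tokens per character is what makes Traditional Chinese generation faster.
print("Breeze-7B tokens :", len(breeze_tok.tokenize(text)))
print("Mistral-7B tokens:", len(mistral_tok.tokenize(text)))
```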

[Breeze-7B-Instruct-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0.1) derives from the base model Breeze-7B-Base-v0.1 and has undergone supervised fine-tuning on over 1 million instances to sharpen its capabilities. The fine-tuned model performs strongly in benchmarks for both English and Traditional Chinese, surpassing Taiwan-LLM-7B-v2.1-chat, Taiwan-LLM-13B-v2.0-chat, and Qwen-7B-chat in Traditional Chinese assessments, and outperforming Yi-6B-Chat on some benchmarks. In English evaluations, Breeze-7B-Instruct-v0.1 shows results comparable to Mistral-7B-Instruct-v0.1 on MMLU and MT-Bench. [See [Chat Model Performance](#chat-model-performance).]

[Breeze-7B-Instruct-64k-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-64k-v0.1) extends Breeze-7B-Instruct-v0.1 to a 64k-token context length, equivalent to 88k Traditional Chinese characters. With minimal sacrifice in performance on the regular benchmarks, Breeze-7B-Instruct-64k-v0.1 can handle tasks such as question answering and summarization over document-level inputs. [See [Long-context Performance](#long-context-performance).]

*A project by the members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Feng-Ting Liao 廖峰挺, Po-Chun Hsu 許博竣, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.*

## Features

- Breeze-7B-Base-v0.1
  - Vocabulary expanded from 32k to 62k tokens to better support Traditional Chinese
  - 8k-token context length
- Breeze-7B-Instruct-v0.1
  - Vocabulary expanded from 32k to 62k tokens to better support Traditional Chinese
  - 8k-token context length
  - Multi-turn dialogue (without special handling for harmfulness)
- Breeze-7B-Instruct-64k-v0.1
  - Vocabulary expanded from 32k to 62k tokens to better support Traditional Chinese
  - 64k-token context length
  - Multi-turn dialogue (without special handling for harmfulness)
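
The vocabulary expansion listed above is easy to verify from the released checkpoints; a minimal check that only downloads the `config.json` files:

```python
from transformers import AutoConfig

# The vocabulary expansion (32k -> roughly 62k entries) shows up directly in the configs.
for repo in ["mistralai/Mistral-7B-v0.1", "MediaTek-Research/Breeze-7B-Base-v0.1"]:
    print(repo, AutoConfig.from_pretrained(repo).vocab_size)
```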

## Model Details

- Breeze-7B-Base-v0.1
  - Finetuned from: [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
  - Model type: Causal decoder-only transformer language model
  - Language: English and Traditional Chinese (zh-tw)
- Breeze-7B-Instruct-v0.1
  - Finetuned from: [MediaTek-Research/Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1)
  - Model type: Causal decoder-only transformer language model
  - Language: English and Traditional Chinese (zh-tw)
- Breeze-7B-Instruct-64k-v0.1
  - Finetuned from: [MediaTek-Research/Breeze-7B-Instruct-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0.1)
  - Model type: Causal decoder-only transformer language model
  - Language: English and Traditional Chinese (zh-tw)

## Base Model Performance

| Mistral-7B-v0.1 | 33.01 | 42.23 | 35.86 | 37.63 |

**TMMLU+**, **DRCD**, and **Table** are sourced from [MediaTek-Research/TCEval-v2](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2), which derives from [TCEval-v1](https://github.com/mtkresearch/MR-Models/tree/main/TC-Eval) and [ikala/tmmluplus](https://huggingface.co/datasets/ikala/tmmluplus). **MMLU** is sourced from [hails/mmlu_no_train](https://huggingface.co/datasets/hails/mmlu_no_train). We evaluate **TMMLU+**, **DRCD**, **Table**, and **MMLU** with code revised from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
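
The TCEval-v2 subsets referenced above can be browsed directly with the `datasets` library. A minimal sketch; the subset names are discovered at run time rather than assumed:

```python
from datasets import get_dataset_config_names, load_dataset

# List the evaluation subsets bundled in TCEval-v2 (TMMLU+ subjects, DRCD, Table, ...).
configs = get_dataset_config_names("MediaTek-Research/TCEval-v2")
print(configs)

# Load one subset to inspect its fields; pick any name from the listing above.
ds = load_dataset("MediaTek-Research/TCEval-v2", configs[0])
print(ds)
```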

## Chat Model Performance

| Models | | TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MT-Bench-tw (Score) | MMLU (ACC) | MMLU (ACC) | MT-Bench (Score) |

| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 |
| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 |

**TMMLU+**, **DRCD**, **Table**, and **MT-Bench-tw** are sourced from [MediaTek-Research/TCEval-v2](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2), which derives from [TCEval-v1](https://github.com/mtkresearch/MR-Models/tree/main/TC-Eval) and [ikala/tmmluplus](https://huggingface.co/datasets/ikala/tmmluplus). **MMLU** is sourced from [hails/mmlu_no_train](https://huggingface.co/datasets/hails/mmlu_no_train), and **MT-Bench** from [lmsys/mt_bench_human_judgments](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments). We evaluate **TMMLU+**, **DRCD**, **Table**, and **MMLU** with code revised from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), and **MT-Bench-tw** and **MT-Bench** with code revised from [fastchat llm_judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge).

## Inference Performance

In this test, we use the first 700 characters of the [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as the input and ask the model to write the same article again.
All inferences run on 2 RTX A6000 GPUs (using `vllm`, with a tensor-parallel size of 2).

| Models | Inference Time (sec) | Estimated Max Input Length (Char) |
|--------------------------------------------------------------------|-------------------|--------------------------|
| Yi-6B | 10.62 | 5.2k |
| **Breeze-7B-Instruct-v0.1** | 10.74 | 11.1k |
| Taiwan-LLM-13B-v2.0-base | 36.80 | 2.2k |
| Yi-34B | 43.71 | 4.5k |
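
For reference, a timing run of this kind can be sketched with `vllm` as below. The prompt file name and sampling settings are placeholders, not the exact benchmark script:

```python
import time

from vllm import LLM, SamplingParams

# Placeholder input: the benchmark uses the first 700 characters of the linked article.
prompt = open("article_700_chars.txt", encoding="utf-8").read()

llm = LLM(model="MediaTek-Research/Breeze-7B-Instruct-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.0, max_tokens=1024)

start = time.time()
outputs = llm.generate([prompt], params)
print(f"inference time: {time.time() - start:.2f} sec")
print(outputs[0].outputs[0].text[:200])
```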

## Long-context Performance

TBD

## Examples

TBD

## Use in Transformers

First install direct dependencies:
```bash
pip install transformers torch accelerate
```
If you want faster inference using flash-attention2, you need to install these dependencies:
```bash
pip install flash-attn
```

Then load the model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "MediaTek-Research/Breeze-7B-Instruct-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # optional; requires flash-attn
)
```

The structure of the query template follows that of Mistral-7B-Instruct, as shown below.
```txt
<s> SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]
```
where `SYS_PROMPT`, `QUERY1`, `RESPONSE1`, and `QUERY2` can be provided by the user.

The suggested default `SYS_PROMPT` is:
```txt
You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
```
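
Putting the pieces together, a minimal generation example that fills in the template with the default `SYS_PROMPT` (the query and sampling settings are illustrative only):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

sys_prompt = (
    "You are a helpful AI assistant built by MediaTek Research. "
    "The user you are helping speaks Traditional Chinese and comes from Taiwan."
)
query = "請用三句話介紹台灣。"  # "Introduce Taiwan in three sentences."

# Fill in the Mistral-style template shown above for a single-turn query.
prompt = f"<s> {sys_prompt} [INST] {query} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```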