YC-Chen committed
Commit 0ab63dd · verified · 1 Parent(s): 28c1f04

Update README.md

Files changed (1): README.md (+84 -17)
README.md CHANGED
@@ -1,27 +1,63 @@
---
pipeline_tag: text-generation
---

# Model Card for Breeze-7B-Base-v0.1

- Breeze-7B-Base-v0.1 is a 7-billion-parameter language model built from Mistral-7B and tailored for Traditional Chinese (zh-tw).
- This model expands the Traditional Chinese vocabulary by adding an extra 30k Traditional Chinese tokens to the original Mistral-7B. With this, the model adapts better to Traditional Chinese and is 2x as efficient in the encoding and decoding of Traditional Chinese compared to Mistral-7B.
- To the best of our knowledge, this is the first work on vocabulary expansion in Traditional Chinese.
- This model is trained on 250GB of high-quality Traditional Chinese data using continual pre-training.
- Breeze-7B-Base-v0.1 performs well on both EN and TC benchmarks, outperforming Taiwan-LLM-7B-v2.1-base, Taiwan-LLM-13B-v2.0-base, and Yi-6B-Base on all TC benchmarks
- and is comparable with Mistral-7B-v0.1 on MMLU and MT-Bench in English.

  *A project by the members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Feng-Ting Liao 廖峰挺, Po-Chun Hsu 許博竣, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.*

## Features

- - Expanding the vocabulary dictionary from 32k to 62k vocabulary size to better support Traditional Chinese
- - 8k context length

## Model Details
- **Finetuned from:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Model type:** Causal decoder-only transformer language model
- **Language:** English and Traditional Chinese (zh-tw)

## Base Model Performance

@@ -49,6 +85,12 @@ and is comparable with Mistral-7B-v0.1 on MMLU and MT-Bench in English.
| Mistral-7B-v0.1 | 33.01 | 42.23 | 35.86 | 37.63 |


## Chat Model Performance

| Models | | TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MT-Bench-tw (Score) | MMLU (ACC) | MMLU (ACC) | MT-Bench (Score) |
@@ -80,12 +122,19 @@ and is comparable with Mistral-7B-v0.1 on MMLU and MT-Bench in English.
| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 |
| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 |


## Inference Performance
In this test, we use the first 700 characters of the [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as the input and ask the model to write the same article again.
- All models were inferenced with `vllm` on 2 A6000 (TP=2).

- | Models | Inference Time (sec)|Estimated Max Input Length (TC Char)|
|--------------------------------------------------------------------|-------------------|--------------------------|
| Yi-6B | 10.62 | 5.2k |
| **Breeze-7B-Instruct-v0.1** | 10.74 | 11.1k |
@@ -97,12 +146,19 @@ All models were inferenced with `vllm` on 2 A6000 (TP=2).
| Taiwan-LLM-13B-v2.0-base | 36.80 | 2.2k |
| Yi-34B | 43.71 | 4.5k |


## Use in Transformers

- First, install direct dependencies:
```
- pip install transformers==4.36.1 torch accelerate
```
If you want faster inference using flash-attention2, you need to install these dependencies:
```bash
@@ -115,9 +171,20 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
- model="MediaTek-Research/Breeze-7B-Base-v0.1",
device_map="auto",
torch_dtype=torch.bfloat16,
- attn_implementation="flash_attention_2" # optional
)
```

---
pipeline_tag: text-generation
+ license: apache-2.0
+ language:
+ - zh
---

# Model Card for Breeze-7B-Base-v0.1

+
+ Breeze-7B is a language model built on top of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and specifically enhanced for Traditional Chinese.
+
+ [Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1) introduces an expanded vocabulary with an additional 30,000 Traditional Chinese tokens and
+ is pre-trained on 250GB of Traditional Chinese content.
+ With the expanded vocabulary, the base model operates at twice the inference speed for Traditional Chinese characters compared to Mistral-7B. [See [Inference Performance](#inference-performance).]
+ To the best of our knowledge, this is the first vocabulary expansion carried out for a model tailored to Traditional Chinese.
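
To make the efficiency claim concrete, the two tokenizers can be compared directly. The snippet below is a minimal sketch, not part of the original card; it only assumes both tokenizers are downloadable from the Hugging Face Hub.

```python
# Minimal sketch (not from the original card): count how many tokens the
# Mistral-7B and Breeze-7B tokenizers need for the same Traditional Chinese text.
from transformers import AutoTokenizer

text = "台灣的夜市文化豐富多元，從傳統小吃到創意料理應有盡有。"

for name in ["mistralai/Mistral-7B-v0.1", "MediaTek-Research/Breeze-7B-Base-v0.1"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    n_tokens = len(tokenizer(text)["input_ids"])
    print(f"{name}: {n_tokens} tokens for {len(text)} characters")
```

A lower token count for the same text translates directly into faster encoding and decoding, which is what the inference table below measures end to end.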
+
+ [Breeze-7B-Instruct-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0.1) derives from the base model Breeze-7B-Base-v0.1
+ and has undergone supervised fine-tuning on more than 1 million instances to
+ sharpen its capabilities. The fine-tuned model performs strongly on benchmarks in both English and Traditional Chinese, surpassing
+ Taiwan-LLM-7B-v2.1-chat, Taiwan-LLM-13B-v2.0-chat, and Qwen-7B-chat in Traditional Chinese assessments, and outperforming Yi-6B-Chat on some benchmarks.
+ In English evaluations, Breeze-7B-Instruct-v0.1 shows results comparable to Mistral-7B-Instruct-v0.1 on the MMLU and MT-Bench benchmarks. [See [Chat Model Performance](#chat-model-performance).]
+
+
+ [Breeze-7B-Instruct-64k-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-64k-v0.1) extends Breeze-7B-Instruct-v0.1 to a 64k-token
+ context length, equivalent to roughly 88k Traditional Chinese characters. With only a minimal drop in performance on the regular benchmarks,
+ Breeze-7B-Instruct-64k-v0.1 can handle tasks such as question answering and summarization over document-level inputs. [See [Long-context Performance](#long-context-performance).]
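
Since the 64k limit is stated in tokens while documents are usually measured in characters, it can help to check a document's token count before sending it to the 64k model. The snippet below is an illustrative sketch only (the file name is a placeholder); the 64k figure comes from the paragraph above.

```python
# Illustrative sketch: check whether a long document fits in the 64k-token
# window of Breeze-7B-Instruct-64k-v0.1. "long_report.txt" is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-64k-v0.1")

with open("long_report.txt", encoding="utf-8") as f:
    document = f.read()

n_tokens = len(tokenizer(document)["input_ids"])
status = "fits within" if n_tokens <= 64 * 1024 else "exceeds"
print(f"{len(document)} characters -> {n_tokens} tokens ({status} the 64k window)")
```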
+

*A project by the members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Feng-Ting Liao 廖峰挺, Po-Chun Hsu 許博竣, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.*

## Features

+ - Breeze-7B-Base-v0.1
+   - Expanding the vocabulary size from 32k to 62k to better support Traditional Chinese
+   - 8k-token context length
+ - Breeze-7B-Instruct-v0.1
+   - Expanding the vocabulary size from 32k to 62k to better support Traditional Chinese
+   - 8k-token context length
+   - Multi-turn dialogue (without special handling for harmfulness)
+ - Breeze-7B-Instruct-64k-v0.1
+   - Expanding the vocabulary size from 32k to 62k to better support Traditional Chinese
+   - 64k-token context length
+   - Multi-turn dialogue (without special handling for harmfulness)

## Model Details
+
+ - Breeze-7B-Base-v0.1
+   - Finetuned from: [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+   - Model type: Causal decoder-only transformer language model
+   - Language: English and Traditional Chinese (zh-tw)
+ - Breeze-7B-Instruct-v0.1
+   - Finetuned from: [MediaTek-Research/Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1)
+   - Model type: Causal decoder-only transformer language model
+   - Language: English and Traditional Chinese (zh-tw)
+ - Breeze-7B-Instruct-64k-v0.1
+   - Finetuned from: [MediaTek-Research/Breeze-7B-Instruct-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0.1)
+   - Model type: Causal decoder-only transformer language model
+   - Language: English and Traditional Chinese (zh-tw)

  ## Base Model Performance


| Mistral-7B-v0.1 | 33.01 | 42.23 | 35.86 | 37.63 |


+ **TMMLU+**, **DRCD**, and **Table** are sourced from [MediaTek-Research/TCEval-v2](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2).
+ [MediaTek-Research/TCEval-v2](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2) derives from [TCEval-v1](https://github.com/mtkresearch/MR-Models/tree/main/TC-Eval)
+ and [ikala/tmmluplus](https://huggingface.co/datasets/ikala/tmmluplus). **MMLU** is sourced from [hails/mmlu_no_train](https://huggingface.co/datasets/hails/mmlu_no_train).
+ We evaluate **TMMLU+**, **DRCD**, **Table**, and **MMLU** with code adapted from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
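
For reference, recent versions of the stock harness expose a `simple_evaluate` helper that can be driven as sketched below. This is only an illustration: the authors used a revised version of the harness, and the Traditional Chinese tasks (TMMLU+, DRCD, Table) are not stock task names, so the snippet shows only the standard `mmlu` task; the few-shot setting is an assumption, not something stated on this card.

```python
# Hedged sketch using the stock EleutherAI lm-evaluation-harness (v0.4+ Python API).
# The authors' revised harness and their TC task configs are not reproduced here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MediaTek-Research/Breeze-7B-Base-v0.1,dtype=bfloat16",
    tasks=["mmlu"],   # stock task; TMMLU+/DRCD/Table need the revised code
    num_fewshot=5,    # assumed setting, not confirmed by the card
    batch_size=8,
)
print(results["results"])
```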
+
+

## Chat Model Performance

| Models | | TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MT-Bench-tw (Score) | MMLU (ACC) | MMLU (ACC) | MT-Bench (Score) |

| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 |
| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 |

+ **TMMLU+**, **DRCD**, **Table**, and **MT-Bench-tw** are sourced from [MediaTek-Research/TCEval-v2](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2).
+ [MediaTek-Research/TCEval-v2](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2) derives from [TCEval-v1](https://github.com/mtkresearch/MR-Models/tree/main/TC-Eval)
+ and [ikala/tmmluplus](https://huggingface.co/datasets/ikala/tmmluplus). **MMLU** is sourced from [hails/mmlu_no_train](https://huggingface.co/datasets/hails/mmlu_no_train).
+ **MT-Bench** is sourced from [lmsys/mt_bench_human_judgments](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments).
+ We evaluate **TMMLU+**, **DRCD**, **Table**, and **MMLU** with code adapted from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness),
+ and **MT-Bench-tw** and **MT-Bench** with code adapted from [fastchat llm_judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge).
+

## Inference Performance
In this test, we use the first 700 characters of the [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as the input and ask the model to write the same article again.
+ All inference runs on 2 RTX A6000 GPUs with `vllm` (tensor-parallel size 2). A sketch of such a timing run follows the table below.

+ | Models | Inference Time (sec) | Estimated Max Input Length (Char) |
|--------------------------------------------------------------------|-------------------|--------------------------|
| Yi-6B | 10.62 | 5.2k |
| **Breeze-7B-Instruct-v0.1** | 10.74 | 11.1k |

| Taiwan-LLM-13B-v2.0-base | 36.80 | 2.2k |
| Yi-34B | 43.71 | 4.5k |
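
The timing run described above could be reproduced roughly as follows. This is a sketch under assumptions (greedy decoding, a truncated placeholder prompt, default `vllm` settings); it is not the authors' exact benchmark script.

```python
# Rough sketch of the timing setup described above; assumptions: greedy decoding,
# placeholder prompt, default vllm settings. Not the authors' exact script.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="MediaTek-Research/Breeze-7B-Instruct-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.0, max_tokens=1024)

# The first 700 characters of the article would go here.
prompt = "請將以下文章原封不動地再寫一次：..."

start = time.time()
outputs = llm.generate([prompt], params)
print(f"inference time: {time.time() - start:.2f} s")
print(outputs[0].outputs[0].text[:200])
```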

+ ## Long-context Performance
+
+ TBD
+
+ ## Examples
+
+ TBD

## Use in Transformers

+ First, install the direct dependencies:
```
+ pip install transformers torch accelerate
```
If you want faster inference using flash-attention2, you need to install these dependencies:
```bash

import torch

model = AutoModelForCausalLM.from_pretrained(
+ "MediaTek-Research/Breeze-7B-Instruct-v0.1",
device_map="auto",
torch_dtype=torch.bfloat16,
+ attn_implementation="flash_attention_2" # optional
)
```

+ The structure of the query template follows that of Mistral-7B-Instruct, as shown below.
+ ```txt
+ <s> SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]
+ ```
+ where `SYS_PROMPT`, `QUERY1`, `RESPONSE1`, and `QUERY2` can be provided by the user.
+
+ The suggested default `SYS_PROMPT` is
+ ```txt
+ You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
+ ```
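
Putting the loading snippet and the template together, the sketch below builds a single-turn prompt and generates a reply. It is illustrative only: the sample question and generation settings are placeholders, not recommendations from the original card; the tokenizer adds the leading `<s>` automatically.

```python
# Illustrative end-to-end sketch combining the loading snippet and the query
# template above. The question and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

sys_prompt = ("You are a helpful AI assistant built by MediaTek Research. "
              "The user you are helping speaks Traditional Chinese and comes from Taiwan.")
query = "請簡單介紹台灣的夜市文化。"

# Single-turn instance of the template; the tokenizer prepends <s> on its own.
prompt = f"{sys_prompt} [INST] {query} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```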