amezasor committed 2f5a91c (verified) · 1 Parent(s): d83016d

instruct model - initial commit

Files changed (1): README.md (+316 -3)

README.md CHANGED
@@ -1,3 +1,316 @@
- ---
- license: apache-2.0
- ---
+ ---
+ pipeline_tag: text-generation
+ inference: false
+ license: apache-2.0
+ # datasets:
+ # metrics:
+ # - code_eval
+ library_name: transformers
+ tags:
+ - language
+ - granite-3.0
+ model-index:
+ - name: granite-3.0-1b-a400m-instruct
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: human-exams
+       name: MMLU
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: human-exams
+       name: MMLU-Pro
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: human-exams
+       name: AGI-Eval
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: WinoGrande
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: OBQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: SIQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: PIQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: Hellaswag
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: commonsense
+       name: TruthfulQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reading-comprehension
+       name: BoolQ
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reading-comprehension
+       name: SQuAD v2
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reasoning
+       name: ARC-C
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reasoning
+       name: GPQA
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: reasoning
+       name: BBH
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: code
+       name: HumanEval
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: code
+       name: MBPP
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: math
+       name: GSM8K
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: math
+       name: MATH
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: multilingual
+       name: MGSM
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value:
+       verified: false
+ ---
+ <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
+
+ # Granite-3.0-1B-A400M-Instruct
+
+ ## Model Summary
+ **Granite-3.0-1B-A400M-Instruct** is a lightweight, open-source 1B parameter model fine-tuned from *Granite-3.0-1B-A400M-Base* on a combination of **permissively licensed** open-source and proprietary instruction data. This language model is designed to excel at instruction-following tasks such as summarization, problem-solving, text translation, reasoning, code tasks, function-calling, and more.
+
+ - **Developers:** IBM Research
+ - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
+ - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
+ - **Paper:** [Granite Language Models](https://) <!-- TO DO: Update github repo link when it is ready -->
+ - **Release Date**: October 21st, 2024
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+
+ ## Supported Languages
+ English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified)
+
+ ## Usage
+ ### Intended use
+ The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.
+
+ ### Capabilities
+ * Summarization
+ * Text classification
+ * Text extraction
+ * Question-answering
+ * Retrieval Augmented Generation (RAG; see the prompt sketch below)
+ * Code-related tasks
+ * Function-calling
+ * Multilingual dialog use cases
+
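+ As an illustration of the RAG use case above, retrieved passages can be placed ahead of the question in the user turn. The snippet below is a minimal sketch of one common prompt layout, not a prescribed Granite format; the document text and question are made up, and `tokenizer` is loaded as shown in the Generation section below.
+
+ ```python
+ # hypothetical retrieved passage; in a real system this would come from a retriever
+ context = "IBM Research - Zurich was founded in 1956 and is located in Rueschlikon, Switzerland."
+ question = "When was IBM Research - Zurich founded?"
+
+ chat = [
+     {"role": "user", "content": f"Use the following document to answer the question.\n\nDocument:\n{context}\n\nQuestion: {question}"},
+ ]
+ prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+ ```
+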
+ ### Generation
+ This is a simple example of how to use the **Granite-3.0-1B-A400M-Instruct** model.
+
+ Install the following libraries:
+
+ ```shell
+ pip install torch torchvision torchaudio
+ pip install accelerate
+ pip install transformers
+ ```
+ Then, copy the snippet from the section that is relevant for your use case.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ device = "auto"
+ model_path = "ibm-granite/granite-3.0-1b-a400m-instruct"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ # drop device_map if running on CPU
+ model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
+ model.eval()
+ # change input text as desired
+ chat = [
+     { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
+ ]
+ chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+ # tokenize the text and move the inputs to the device the model was loaded onto
+ input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
+ # generate output tokens
+ output = model.generate(**input_tokens, max_new_tokens=100)
+ # decode output tokens into text
+ output = tokenizer.batch_decode(output)
+ # print output
+ print(output)
+ ```
+
+ <!-- TO DO: function-calling-example
+ -->
+
+ <!-- ['<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>1. IBM Research - Almaden, San Jose, California<|end_of_text|>'] -->
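+
+ By default `generate` decodes greedily. For more varied outputs, sampling can be enabled on the same call, reusing `model` and `input_tokens` from the snippet above; the parameter values below are illustrative rather than tuned recommendations:
+
+ ```python
+ # sampling variant of the generate call above; values are illustrative
+ output = model.generate(
+     **input_tokens,
+     max_new_tokens=100,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.9,
+ )
+ print(tokenizer.batch_decode(output, skip_special_tokens=True))
+ ```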
+
+ ## Model Architecture
+ **Granite-3.0-1B-A400M-Instruct** is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss. (A sketch of the routing idea follows the table below.)
+
+ | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
+ | :-------- | :--------| :--------| :-------- |:-------- |
+ | Embedding size | 2048 | 4096 | **1024** | 1536 |
+ | Number of layers | 40 | 40 | **24** | 32 |
+ | Attention head size | 64 | 128 | **64** | 64 |
+ | Number of attention heads | 32 | 32 | **16** | 24 |
+ | Number of KV heads | 8 | 8 | **8** | 8 |
+ | MLP hidden size | 8192 | 12800 | **512** | 512 |
+ | MLP activation | SwiGLU | SwiGLU | **SwiGLU** | SwiGLU |
+ | Number of Experts | — | — | **32** | 40 |
+ | MoE TopK | — | — | **8** | 8 |
+ | Initialization std | 0.1 | 0.1 | **0.1** | 0.1 |
+ | Sequence Length | 4096 | 4096 | **4096** | 4096 |
+ | Position Embedding | RoPE | RoPE | **RoPE** | RoPE |
+ | # Parameters | 2.5B | 8.1B | **1.3B** | 3.3B |
+ | # Active Parameters | 2.5B | 8.1B | **400M** | 800M |
+ | # Training tokens | 12T | 12T | **10T** | 10T |
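+
+ The snippet below is an illustrative PyTorch sketch of top-k expert routing with a Switch-Transformer-style load-balancing auxiliary loss, using the 1B MoE dimensions from the table (32 experts, top-8, embedding size 1024). It shows the general technique only; it is not the Granite implementation, and all names are hypothetical.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def route_tokens(hidden, router_weight, num_experts=32, top_k=8):
+     # router scores for every (token, expert) pair
+     logits = hidden @ router_weight.t()                  # (tokens, experts)
+     probs = F.softmax(logits, dim=-1)
+     topk_probs, topk_idx = probs.topk(top_k, dim=-1)     # experts chosen per token
+     # renormalize so each token's selected expert weights sum to 1
+     topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
+     # load-balancing loss: combine how often each expert is selected with the
+     # mean router probability it receives, summed over experts; this penalizes
+     # routing that concentrates tokens on a few experts
+     selected = F.one_hot(topk_idx, num_experts).float().sum(dim=1)  # (tokens, experts)
+     selection_rate = selected.mean(dim=0) / top_k
+     mean_prob = probs.mean(dim=0)
+     aux_loss = num_experts * (selection_rate * mean_prob).sum()
+     return topk_idx, topk_probs, aux_loss
+
+ # toy usage with the 1B MoE dimensions above
+ hidden = torch.randn(16, 1024)          # 16 tokens, embedding size 1024
+ router_weight = torch.randn(32, 1024)   # one row of router weights per expert
+ idx, weights, aux_loss = route_tokens(hidden, router_weight)
+ ```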
+
+ <!-- TO DO: To be completed once the paper is ready; we may change the title to Supervised Finetuning -->
+ ## Training Data
+ This model is trained on a mix of open-source and proprietary datasets.
+ <!-- ### Instruction Datasets
+ * Language Instruction Datasets: We include high-quality datasets such as [TO DO: List of datasets]
+ * Synthetic Instruction Datasets: [TO DO: paragraph about synthetic data]
+ ### Processing
+ * [TO DO: Data annotation with MagPie pipeline: quality, duplicates] -->
+
+ ## Infrastructure
+ We train the Granite Language models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
+
+ <!-- TO DO: Check multilingual statement once the paper is ready -->
+ ## Ethical Considerations and Limitations
+ Granite instruct models are primarily finetuned using instruction-response pairs mostly in English, but also in German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese (Simplified). Because this model has been exposed to multilingual data, it can handle multilingual dialog use cases, albeit with limited performance on non-English tasks. In such cases, introducing a small number of in-context examples (few-shot prompting, sketched below) can help the model generate more accurate outputs. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to the *[Granite-3.0-1B-A400M-Base](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base)* model card.
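+
+ A minimal sketch of the few-shot approach mentioned above: prior question-answer turns are prepended to the chat before the real query. The translation pairs are made-up illustrations, and `tokenizer` is loaded as in the Generation section.
+
+ ```python
+ # hypothetical few-shot turns supplied as earlier conversation history
+ chat = [
+     {"role": "user", "content": "Translate to Spanish: Good morning."},
+     {"role": "assistant", "content": "Buenos días."},
+     {"role": "user", "content": "Translate to Spanish: Thank you very much."},
+     {"role": "assistant", "content": "Muchas gracias."},
+     {"role": "user", "content": "Translate to Spanish: See you tomorrow."},
+ ]
+ prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+ ```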
+