amezasor committed on
Commit
5d0717b
1 Parent(s): ba94200

update after revision

  1. README.md +20 -24
README.md CHANGED
@@ -2,9 +2,6 @@
2
  pipeline_tag: text-generation
3
  inference: false
4
  license: apache-2.0
5
- # datasets:
6
- # metrics:
7
- # - code_eval
8
  library_name: transformers
9
  tags:
10
  - language
@@ -205,28 +202,28 @@ model-index:
205
  ---
206
 
207
  <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
 
208
 
209
  # Granite-3.0-2B-Base
210
 
211
- ## Model Summary
212
- **Granite-3.0-2B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-2B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
213
 
214
  - **Developers:** IBM Research
215
  - **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
216
  - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
217
- - **Paper:** [Granite 3.0 Language Models]()
218
  - **Release Date**: October 21st, 2024
219
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
220
 
221
- ## Supported Languages
222
- English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
223
 
224
- ## Usage
225
- ### Intended use
226
  Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as a baseline to create specialized models for specific application scenarios.
227
 
228
- ### Generation
229
- This is a simple example of how to use **Granite-3.0-2B-Base** model.
230
 
231
  Install the following libraries:
232
 
@@ -258,8 +255,8 @@ output = tokenizer.batch_decode(output)
258
  print(output)
259
  ```
260
 
261
- ## Model Architecture
262
- **Granite-3.0-2B-Base** is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
263
 
264
  | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
265
  | :-------- | :-------- | :--------| :--------| :--------|
@@ -279,19 +276,18 @@ print(output)
279
  | # Active Parameters | **2.5B** | 8.1B | 400M | 800M |
280
  | # Training tokens | **12T** | 12T | 10T | 10T |
281
 
282
- <!-- TO DO: To be completed once the paper is ready -->
283
- ## Training Data
284
- This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
285
- * Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
286
- * Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
287
 
288
- ## Infrastructure
289
- We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
290
 
291
- ## Ethical Considerations and Limitations
292
  The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. The **Granite-3.0-2B-Base** model is no exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment; therefore, it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use the **Granite-3.0-2B-Base** model with ethical intentions and in a responsible way.
293
 
294
- ## Citation
295
  ```
296
  @misc{granite-models,
297
  author = {author 1, author2, ...},
@@ -301,4 +297,4 @@ The use of Large Language Models involves risks and ethical considerations peopl
301
  year = {2024},
302
  url = {https://arxiv.org/abs/0000.00000},
303
  }
304
- ```
 
2
  pipeline_tag: text-generation
3
  inference: false
4
  license: apache-2.0
 
 
 
5
  library_name: transformers
6
  tags:
7
  - language
 
202
  ---
203
 
204
  <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
205
+ ![image/png](granite-3_0-language-models_Group_1.png)
206
 
207
  # Granite-3.0-2B-Base
208
 
209
+ **Model Summary:**
210
+ Granite-3.0-2B-Base is a decoder-only language model that supports a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy. In the first stage, it is trained on 10 trillion tokens sourced from diverse domains. During the second stage, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
211
 
212
  - **Developers:** IBM Research
213
  - **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
214
  - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
215
+ - **Paper:** [Granite 3.0 Language Models](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/granite-3-language-models.pdf)
216
  - **Release Date**: October 21st, 2024
217
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
218
 
219
+ **Supported Languages:**
220
+ English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 3.0 models for languages beyond these 12 languages.
221
 
222
+ **Intended use:**
 
223
  Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as a baseline to create specialized models for specific application scenarios.
224
 
225
+ **Generation:**
226
+ This is a simple example of how to use the Granite-3.0-2B-Base model.
227
 
228
  Install the following libraries:
229
 
 
255
  print(output)
256
  ```
257
 
258
+ **Model Architecture:**
259
+ Granite-3.0-2B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
260
 
261
  | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
262
  | :-------- | :-------- | :--------| :--------| :--------|
 
276
  | # Active Parameters | **2.5B** | 8.1B | 400M | 800M |
277
  | # Training tokens | **12T** | 12T | 10T | 10T |
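
Two of the components named in the architecture description, RMSNorm and the SwiGLU MLP, can be illustrated with a small sketch. This is a pure-Python illustration of the general techniques, not IBM's implementation; the function names, the `eps` default, and the toy weight matrices are assumptions chosen for readability.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then apply a per-element gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x)), without biases."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

In a real transformer block these operations run on batched tensors with learned weights; the list-based version here only shows the arithmetic of each step.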
278
 
279
+ **Training Data:**
280
+ This model is trained on a mix of open-source and proprietary data following a two-stage training strategy.
281
+ * Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
282
+ * Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training stage is to enhance the model’s performance on specific tasks.
 
283
 
284
+ **Infrastructure:**
285
+ We train Granite 3.0 Language Models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
286
 
287
+ **Ethical Considerations and Limitations:**
288
  The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. The **Granite-3.0-2B-Base** model is no exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment; therefore, it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use the **Granite-3.0-2B-Base** model with ethical intentions and in a responsible way.
289
 
290
+ <!-- ## Citation
291
  ```
292
  @misc{granite-models,
293
  author = {author 1, author2, ...},
 
297
  year = {2024},
298
  url = {https://arxiv.org/abs/0000.00000},
299
  }
300
+ ``` -->