YikangS committed
Commit bddcb61 · 1 Parent(s): 3202672

update readme

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -5,7 +5,7 @@ license: apache-2.0
 MoLM is a collection of MoE-based language models ranging in scale from 4 billion to 8 billion parameters. This is the repository for the 4B pretrained model, converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom.

 **Model Usage**
- To load the model, you need install the (ModuleFormer package)[github.com/IBM/ModuleFormer]. Then you can load the model with the following code:
+ To load the model, you need to install the [ModuleFormer package](https://github.com/IBM/ModuleFormer). Then you can load the model with the following code:
 ```
 from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig, AutoModelForSequenceClassification
 from moduleformer import ModuleFormerForCausalLM, ModuleFormerConfig, ModuleFormerForSequenceClassification
@@ -19,7 +19,7 @@ model = AutoModelForCausalLM.from_pretrained('ibm/MoLM-350M-4B')

 **Model Details**
 MoLM-350M-4B is a MoE-based language model. It has 4 billion parameters, but each input token uses only 350M parameters during inference. Thus, it is computationally equivalent to a 350M dense model.
- MoLM-700M-8B is a MoE-based language models. It has 8 billion parameters and computationally equivelant to a 700M dense model.
+ MoLM-700M-8B has 8 billion parameters and is computationally equivalent to a 700M dense model.
 Both models are trained on 300 billion tokens from publicly available sources, with a learning rate of 3.0 x 10<sup>-4</sup> and a global batch size of 3M tokens.

 **Model Developers** IBM
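
The hunk context above cuts the README's usage snippet off after the imports; the second hunk header shows it continues with `model = AutoModelForCausalLM.from_pretrained('ibm/MoLM-350M-4B')`. For reference, here is a minimal sketch of how the full snippet plausibly reads, assuming the Auto-class registration pattern from the IBM/ModuleFormer package; the model-type string, prompt, and generation arguments are illustrative assumptions, not quoted from the README:

```
# Sketch only: fills in the part of the usage snippet the diff truncates.
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig, AutoModelForSequenceClassification
from moduleformer import ModuleFormerForCausalLM, ModuleFormerConfig, ModuleFormerForSequenceClassification

# Assumed registration step so the Auto classes can resolve ModuleFormer checkpoints;
# the "moduleformer" model-type string is taken from the package name, not the diff.
AutoConfig.register("moduleformer", ModuleFormerConfig)
AutoModelForCausalLM.register(ModuleFormerConfig, ModuleFormerForCausalLM)
AutoModelForSequenceClassification.register(ModuleFormerConfig, ModuleFormerForSequenceClassification)

tokenizer = AutoTokenizer.from_pretrained('ibm/MoLM-350M-4B')
model = AutoModelForCausalLM.from_pretrained('ibm/MoLM-350M-4B')  # line visible in the hunk header

# Illustrative generation call; prompt and max_new_tokens are placeholders.
inputs = tokenizer("Mixture-of-experts language models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```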
 
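As a rough cross-check of the **Model Details** figures, the arithmetic below derives the active-parameter fraction and the number of optimizer steps; these derived numbers are not stated in the README itself.

```
# Back-of-the-envelope arithmetic from the stated figures (derived, not quoted).
total_params, active_params = 4e9, 350e6       # MoLM-350M-4B: total vs. per-token active parameters
print(f"active fraction: {active_params / total_params:.1%}")    # ~8.8% of parameters used per token

train_tokens, batch_tokens = 300e9, 3e6        # 300B training tokens, 3M-token global batch
print(f"optimizer steps: {train_tokens / batch_tokens:,.0f}")    # 100,000 steps
```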