grandiose-pizza committed on
Commit 53c6e82
1 Parent(s): 9dd3950

Update README.md

Files changed (1)
  1. README.md +1 -15
README.md CHANGED
@@ -72,7 +72,7 @@ Below is sample code to use the model. Note that the model requires a custom mod
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM
 
- model_path = "inceptionai/jais-family-30b-16k"
+ model_path = "inceptionai/Jais-family-256m"
 
  device = "cuda" if torch.cuda.is_available() else "cpu"
 
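For context, this hunk only swaps the checkpoint id in the README's model-loading example. A minimal runnable sketch of that loading code with the new id is shown below; the `trust_remote_code=True` flag (implied by the README's note that the model needs a custom model class), the generation call, and the example prompt are assumptions for illustration, not part of this diff.

```python
# Minimal sketch of the README loading example with the updated checkpoint id.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "inceptionai/Jais-family-256m"

device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is assumed: the README notes the model requires a
# custom model class, but the flag itself is not visible in this hunk.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to(device)

# Illustrative prompt only; any text works here.
inputs = tokenizer("Write a short greeting.", return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```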
@@ -157,20 +157,6 @@ During the adapted pre-training of the (`jais-adapted-*`) models, we first initi
 
  During instruction tuning, each training example consists of a single-turn or multi-turn prompt and its response. Instead of one example per sequence, examples are packed together while the loss is masked on the prompt tokens. This approach speeds up training by allowing more examples to be processed per batch.
 
- 
- ### Training Hyperparameters:
- 
- #### Jais-family-30b-16k
- | Hyperparameter | Value |
- |----------------|-------------------------------------------|
- | Precision | fp32 |
- | Optimizer | AdamW |
- | Learning rate | 0 to 0.012 (<=69 warmup steps)<br>0.012 to 0.00231 (>69 and <=137273 steps)<br>0.00231 to 0.00048 (>137273 and <=260648 steps)<br>0.00048 to 0.000048 (>260648 and <=287032 steps) |
- | Weight decay | 0.1 |
- | Batch size | 2664 (<=137273 steps)<br>748 (>137273 and <=260648 steps)<br>384 (>260648 and <=287032 steps) |
- | Context Length | 2048 (<=137273 steps)<br>8192 (>137273 and <=260648 steps)<br>16384 (>260648 and <=287032 steps) |
- | Steps | 287032 |
- 
  ### Compute Infrastructure
 
  The training process was performed on the Condor Galaxy (CG) supercomputer platform. A CG contains 64 Cerebras CS-2 Wafer-Scale Engines (WSE-2) with 40 GB of SRAM, and achieves a total of 960 PetaFLOP/s.
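The instruction-tuning paragraph kept in the hunk above describes packing several (prompt, response) examples into one training sequence while masking the loss on prompt tokens. Below is a minimal sketch of that idea; the function name, the tokenized-pair input format, and the `-100` ignore index (the standard label-masking convention for PyTorch/Hugging Face cross-entropy) are assumptions for illustration, not taken from the Jais training code.

```python
# Illustrative sketch: pack (prompt, response) pairs into one sequence and
# mask the loss on prompt (and padding) tokens. Not the actual Jais pipeline.
from typing import List, Tuple

IGNORE_INDEX = -100  # labels with this value are skipped by cross-entropy loss


def pack_examples(
    examples: List[Tuple[List[int], List[int]]],  # (prompt_ids, response_ids) pairs
    max_len: int,
    pad_id: int,
) -> Tuple[List[int], List[int]]:
    input_ids: List[int] = []
    labels: List[int] = []
    for prompt_ids, response_ids in examples:
        if len(input_ids) + len(prompt_ids) + len(response_ids) > max_len:
            break  # packed sequence is full; remaining examples go to the next one
        input_ids += prompt_ids + response_ids
        # Loss is masked on the prompt: only response tokens carry real labels.
        labels += [IGNORE_INDEX] * len(prompt_ids) + response_ids
    # Pad to max_len; padding is also excluded from the loss.
    pad_len = max_len - len(input_ids)
    input_ids += [pad_id] * pad_len
    labels += [IGNORE_INDEX] * pad_len
    return input_ids, labels
```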