grandiose-pizza committed
Commit 53c6e82 • 1 Parent(s): 9dd3950
Update README.md

README.md CHANGED
@@ -72,7 +72,7 @@ Below is sample code to use the model. Note that the model requires a custom mod
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-model_path = "inceptionai/
+model_path = "inceptionai/Jais-family-256m"
 
 device = "cuda" if torch.cuda.is_available() else "cpu"
 
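For context, a minimal sketch of how the corrected `model_path` would be used together with the rest of the README's sample snippet. The prompt, generation settings, and `trust_remote_code=True` (suggested by the hunk header's note about a custom model class) are assumptions for illustration, not part of this diff.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "inceptionai/Jais-family-256m"

device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is assumed because the README notes the model
# requires a custom model class.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to(device)

# Hypothetical prompt purely for illustration.
inputs = tokenizer("What is the capital of UAE?", return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```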
@@ -157,20 +157,6 @@ During the adapted pre-training of the (`jais-adapted-*`) models, we first initi
|
|
157 |
|
158 |
During instruction tuning, each training example consists of a single-turn or multi-turn prompt and it's response. Instead of one example per sequence, examples are packed together while the loss is masked on the prompt tokens. This approach speeds up training by allowing more examples to be processed per batch.
|
159 |
|
160 |
-
|
161 |
-
### Training Hyperparameters:
|
162 |
-
|
163 |
-
#### Jais-family-30b-16k
|
164 |
-
| Hyperparameter | Value |
|
165 |
-
|----------------|-------------------------------------------|
|
166 |
-
| Precision | fp32 |
|
167 |
-
| Optimizer | AdamW |
|
168 |
-
| Learning rate | 0 to 0.012(<=69 warmup steps)<br>0.012 to 0.00231(>69 and <=137273 steps)<br>0.00231 to 0.00048(>137273 and <= 260648 steps)<br>0.00048 to 0.000048(>260648 and <=287032 steps)|
|
169 |
-
| Weight decay | 0.1 |
|
170 |
-
| Batch size | 2664(<=137273 steps)<br>748(>137273 and <= 260648 steps)<br>384(>260648 and <=287032 steps)|
|
171 |
-
| Context Length | 2048(<=137273 steps)<br>8192(>137273 and <= 260648 steps)<br>16384(>260648 and <=287032 steps)|
|
172 |
-
| Steps | 287032 |
|
173 |
-
|
174 |
### Compute Infrastructure
|
175 |
|
176 |
The training process was performed on the Condor Galaxy (CG) supercomputer platform. A CG contains 64 Cerebras CS-2 Wafer-Scale Engines (WSE-2) with 40 GB of SRAM, and achieves a total of 960 PetaFLOP/s.
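The instruction-tuning paragraph kept unchanged by this hunk describes packing several examples into one training sequence while masking the loss on prompt tokens. A minimal sketch of that idea, assuming the common Hugging Face convention that label positions set to -100 are ignored by the cross-entropy loss; the example pairs and sequence length below are made up for illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("inceptionai/Jais-family-256m")

# Hypothetical (prompt, response) pairs purely for illustration.
examples = [
    ("What is the capital of UAE?", "Abu Dhabi."),
    ("Say hello in Arabic.", "مرحبا"),
]

max_len = 2048  # assumed packing length for this sketch
input_ids, labels = [], []

for prompt, response in examples:
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    response_ids = tokenizer(response + tokenizer.eos_token, add_special_tokens=False).input_ids

    # Pack examples back to back instead of one example per sequence.
    input_ids += prompt_ids + response_ids
    # Mask the loss on prompt tokens: -100 labels are ignored by the loss.
    labels += [-100] * len(prompt_ids) + response_ids

input_ids = input_ids[:max_len]
labels = labels[:max_len]
```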