Commit eda8816
Parent(s): 5cb8c21
Update README.md
README.md CHANGED
@@ -30,17 +30,15 @@ pip install xformers
 ### Model Description
 
 
-Silo-PD is a 1.3B parameter, decoder-only language model trained on public domain
+Silo-PD is a 1.3B parameter, decoder-only language model trained on data in the public domain from [the Open License Corpus (OLC)](https://huggingface.co/datasets/kernelmachine/open-license-corpus).
 
-
+The model is based on the LLaMA architecture as implemented in [OpenLM]().
 
 The model is trained with 128 A100 GPUs across 16 nodes.
 
 
 ### Model and Training Hyperparameters
 
-The following reports the hyperparameters for the parametric component of Silo-PD.
-
 We follow the model architecture of LLaMa, and we use the GPT-NeoX-20B tokenizer, with 50432 BPE types.
 
 During training, we use 2,048 token sequences that are packed across document boundaries, and we pre-pend a beginning-of-text token to every document.
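The tokenizer note in the updated README (GPT-NeoX-20B BPE, 50432 types) can be inspected with a short sketch like the one below. This is not part of the commit: it assumes the publicly released `EleutherAI/gpt-neox-20b` tokenizer is the one being referred to, and the 50432 figure is typically a padded embedding-table size, so the count reported by the tokenizer object itself may be somewhat smaller.

```python
# Sketch (assumption, not from the commit): inspect the GPT-NeoX-20B BPE
# tokenizer that the README says Silo-PD uses.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# The README reports 50432 BPE types; embedding tables are often padded to a
# round multiple, so the raw tokenizer vocabulary may be slightly smaller.
print(len(tok))
print(tok.special_tokens_map)
```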
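The last paragraph of the diff describes packing 2,048-token sequences across document boundaries, with a beginning-of-text token prepended to every document. A rough illustration of that scheme, under stated assumptions (the token id and helper name are hypothetical, and leftover tokens shorter than one sequence are simply dropped), could look like this:

```python
# Illustrative sketch of the packing described in the README: prepend a
# beginning-of-text token to every document, concatenate all documents into
# one token stream, then slice it into fixed 2,048-token chunks that may
# cross document boundaries. Not the actual training code.
from typing import Iterable, List

SEQ_LEN = 2048
BOS_ID = 0  # hypothetical beginning-of-text token id


def pack_documents(docs_token_ids: Iterable[List[int]]) -> List[List[int]]:
    stream: List[int] = []
    for doc in docs_token_ids:
        stream.append(BOS_ID)   # beginning-of-text before every document
        stream.extend(doc)

    # Slice the concatenated stream into full-length training sequences;
    # a leftover tail shorter than SEQ_LEN is dropped in this sketch.
    return [
        stream[i : i + SEQ_LEN]
        for i in range(0, len(stream) - SEQ_LEN + 1, SEQ_LEN)
    ]


# Toy usage: three short "documents" already converted to token ids.
chunks = pack_documents([[5, 6, 7], [8, 9], [10] * 5000])
print(len(chunks), len(chunks[0]))  # -> 2 2048
```

Packing across boundaries keeps every training sequence at full length; the prepended beginning-of-text token is what marks where one document ends and the next begins.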