Commit a5f6bad (parent: e428a5b)
Committed by MaximumEntropy
Update README.md

README.md CHANGED
@@ -85,7 +85,7 @@ This model was trained on 1.1T tokens with [NeMo](https://docs.nvidia.com/deeple
 - Maximum sequence length of 4,096 compared to 2,048 in https://huggingface.co/nvidia/nemo-megatron-gpt-20B.
 - No dropout.
 - No bias terms in all linear layers.
--
+- Untied embedding and output layers.
 
 ## Getting started
 
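
Note on the added bullet: "untied" means the output projection has its own weight matrix instead of reusing the input embedding matrix. The sketch below is only an illustration of what that implies for parameter count. The function name `embedding_param_count` and the vocabulary and hidden sizes are hypothetical and are not taken from the model card.

```python
# Illustrative only: the sizes used here are hypothetical, not the model's real dimensions.
def embedding_param_count(vocab_size: int, hidden_size: int, tied: bool) -> int:
    """Count parameters in the input embedding plus the output projection (no bias)."""
    input_embedding = vocab_size * hidden_size
    # With tied weights the output projection reuses the input embedding matrix,
    # so it contributes no additional parameters.
    output_projection = 0 if tied else vocab_size * hidden_size
    return input_embedding + output_projection


tied_count = embedding_param_count(vocab_size=1000, hidden_size=64, tied=True)
untied_count = embedding_param_count(vocab_size=1000, hidden_size=64, tied=False)

# Untying adds one extra vocab_size x hidden_size matrix.
print(untied_count - tied_count)
```

Under these assumptions, untying doubles the parameters spent on the embedding and output layers, in exchange for letting the two matrices be learned independently.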