Update README.md
README.md CHANGED
@@ -76,7 +76,9 @@ The Retrieva BERT model was pre-trained on the union of five datasets:
 - Chinese Wikipedia dumped on 20240120.
 - Korean Wikipedia dumped on 20240120.
 - [The Stack](https://huggingface.co/datasets/bigcode/the-stack)
+
 The model was trained on 180 billion tokens using the above datasets.
+
 ### Training Procedure
 The model was trained on 4 to 32 H100 GPUs with a batch size of 1,024.
 We adopted curriculum learning, similar to Sequence Length Warmup, and trained with the following sequence lengths and numbers of steps.
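For readers unfamiliar with this kind of curriculum, the sketch below shows one simple way a sequence-length warmup schedule can be expressed in Python. The function name and the stage boundaries are placeholders for illustration only, not the actual Retrieva BERT training code; the real sequence lengths and step counts are the ones listed in the README's schedule.

```python
# Hypothetical sketch of a sequence-length warmup curriculum (illustrative only).
# Each stage is (number_of_steps, sequence_length); stages are applied in order,
# so training starts with short sequences and moves to longer ones.

def sequence_length_for_step(step: int, stages: list[tuple[int, int]]) -> int:
    """Return the training sequence length to use at a given optimizer step."""
    remaining = step
    for num_steps, seq_len in stages:
        if remaining < num_steps:
            return seq_len
        remaining -= num_steps
    # Past the last stage: keep the final (longest) sequence length.
    return stages[-1][1]


# Placeholder curriculum: 128-token sequences first, then 512, then 2,048.
stages = [(10_000, 128), (20_000, 512), (30_000, 2048)]
print(sequence_length_for_step(0, stages))       # 128
print(sequence_length_for_step(25_000, stages))  # 512
print(sequence_length_for_step(70_000, stages))  # 2048
```

In practice a schedule like this is typically wired into the data loader or a per-step trainer callback, so that the cheaper short-sequence batches dominate early training before the model sees full-length inputs.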