Update README.md
--- a/README.md
+++ b/README.md
@@ -113,11 +113,11 @@ The models have been pre-trained using a blend of the following datasets.
 
 | Language | Dataset | Tokens|
 |:---:|:---:|:---:|
-|Japanese|[Wikipedia](https://huggingface.co/datasets/wikipedia)|1.
-||[
-|English|[Wikipedia](https://huggingface.co/datasets/wikipedia)|
-||[The Pile](https://huggingface.co/datasets/EleutherAI/pile)|
-|Codes|[The Stack](https://huggingface.co/datasets/bigcode/the-stack)|
+|Japanese|[Wikipedia](https://huggingface.co/datasets/wikipedia)|1.4B
+||[Common Crawl](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus)|130.7B
+|English|[Wikipedia](https://huggingface.co/datasets/wikipedia)|4.7B
+||[The Pile](https://huggingface.co/datasets/EleutherAI/pile)|110.3B
+|Codes|[The Stack](https://huggingface.co/datasets/bigcode/the-stack)|8.7B
 
 ### Instruction tuning (To be updated)
 