Update README.md
README.md
CHANGED
@@ -116,7 +116,7 @@ The following datasets were used for continual pre-training.
 - [English Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
 - [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
 - [Laboro ParaCorpus](https://github.com/laboroai/Laboro-ParaCorpus)
-- [Swallow Corpus Version 2](https://arxiv.org/abs/2404.17733)
+- [Swallow Corpus Version 2](https://arxiv.org/abs/2404.17733) (filtered using [Swallow Education Classifier](https://huggingface.co/tokyotech-llm/edu-classifier))
 - [The-stack-v2(filtered)]()
 
 ### Swallow Corpus Version 2