Update README.md
README.md CHANGED
@@ -7,4 +7,11 @@ sdk: streamlit
 pinned: false
 ---
 
-# Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies. Available from: https://arxiv.org/abs/2410.22886
+# Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies. Available from: https://arxiv.org/abs/2410.22886.
+
+Salhan et al. (2024) create age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually.
+
+The MAO-CHILDES dataset contains extracted orthographic datasets for French, German, Japanese, and Chinese, as well as several other lower-resource languages. It is part of a wider effort towards cognitively-inspired pretraining using resources from language acquisition research.
+
+You can also find pretrained BabyLMs for French, German, Japanese, and Chinese, with three different cognitively-inspired curriculum learning strategies available in the branches of each language-specific BabyLM, as sketched below.
+
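As a minimal sketch of how the branch-per-curriculum layout can be used, a language-specific BabyLM can be loaded from a particular branch via the `revision` argument in Hugging Face Transformers. The repository ID and branch name below are illustrative placeholders, not the actual names used by this Space:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder names: substitute the actual language-specific BabyLM repository
# and the branch corresponding to the curriculum variant you want.
repo_id = "your-org/BabyLM-French"        # hypothetical repo ID
curriculum_branch = "growing-curriculum"  # hypothetical branch name

# `revision` selects a branch (or tag/commit) of the model repository, so each
# cognitively-inspired curriculum variant can be loaded independently.
tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=curriculum_branch)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=curriculum_branch)
```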