suchirsalhan committed · Commit 4cddef6 · verified · 1 Parent(s): 62550cb

Update README.md

Files changed (1): README.md (+8 -1)
README.md CHANGED
@@ -7,4 +7,11 @@ sdk: streamlit
 pinned: false
 ---
 
-# Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies. Available from: https://arxiv.org/abs/2410.22886
+# Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies. Available from: https://arxiv.org/abs/2410.22886.
+
+Salhan et al. (2024) create age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually.
+
+The MAO-CHILDES dataset contains orthographic datasets extracted for French, German, Japanese and Chinese, as well as several other lower-resource languages. It is part of a wider effort towards cognitively-inspired pretraining using resources from Language Acquisition.
+
+You can also find pretrained BabyLMs for French, German, Japanese and Chinese, with three different cognitively-inspired curriculum learning strategies available in the branches of each language-specific BabyLM repository.
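
A minimal sketch of how the MAO-CHILDES data might be loaded with the `datasets` library; the dataset id below is a hypothetical placeholder, not confirmed by this README, so check the Hub for the actual repository and its language configurations:

```python
from datasets import load_dataset

# Hypothetical dataset id: look up the actual MAO-CHILDES repository
# on the Hugging Face Hub; language-specific configs or splits may exist.
mao_childes = load_dataset("suchirsalhan/mao-childes")

print(mao_childes)  # inspect the splits and columns of the loaded corpus
```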
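Since the curriculum variants live in branches of each language-specific BabyLM repository, selecting one would use the `revision` argument of `from_pretrained` (a standard `transformers` parameter for choosing a branch, tag, or commit); the repo id and branch name here are assumptions for illustration only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "suchirsalhan/babylm-japanese"  # hypothetical repo id
branch = "curriculum-1"                   # hypothetical curriculum branch name

# `revision` selects the given branch of the Hub repository.
tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=branch)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=branch)
```

The actual repo ids and branch names should be taken from the model cards of the language-specific BabyLMs.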