Cross-reference datasets

Listing the training datasets in the model card metadata improves discoverability and transparency.

To get started, I added the two largest ones (in terms of tokens contributed to training), as mentioned in the model card. Feel free to be more exhaustive :)

Files changed (1)

README.md +3 -0

README.md CHANGED

@@ -39,6 +39,9 @@ language:
 - sr
 - sv
 - uk
+datasets:
+- oscar-corpus/colossal-oscar-1.0
+- HuggingFaceFW/fineweb-edu
 ---
 ![](./images/logo_alia_2.png)
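To check that the new `datasets:` entries sit where the Hub expects them (a top-level key in the README's YAML front matter), a quick local read of the metadata can help. The sketch below is a hypothetical helper, `extract_datasets`, that assumes block-style lists only and is deliberately not a general YAML parser; the sample front matter is an abbreviated excerpt, not the full README header.

```python
from typing import List


def extract_datasets(readme_text: str) -> List[str]:
    """Return the entries of the top-level `datasets:` list in README front matter.

    Hypothetical, simplified helper: it only understands block-style lists
    (`- item`) and unindented `key:` lines, not general YAML.
    """
    lines = readme_text.splitlines()
    # Front matter is assumed to open with '---' on the very first line.
    if not lines or lines[0].strip() != "---":
        return []

    datasets: List[str] = []
    current_key = None
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the metadata block
            break
        stripped = line.lstrip()
        if stripped.startswith("- "):
            # List item: keep it only if we are under the `datasets` key.
            if current_key == "datasets":
                datasets.append(stripped[2:].strip())
        elif line and not line[0].isspace() and ":" in line:
            # An unindented `key:` line starts a new top-level field.
            current_key = line.split(":", 1)[0].strip()
    return datasets


# Abbreviated excerpt of the front matter after this change.
example = """---
language:
- sr
- sv
- uk
datasets:
- oscar-corpus/colossal-oscar-1.0
- HuggingFaceFW/fineweb-edu
---
![](./images/logo_alia_2.png)
"""

print(extract_datasets(example))
# ['oscar-corpus/colossal-oscar-1.0', 'HuggingFaceFW/fineweb-edu']
```

For anything beyond a quick sanity check, the `ModelCard` class in `huggingface_hub` parses the full metadata block, and its `data.datasets` attribute should expose the same list.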