Cross-reference datasets
This improves discoverability and increases transparency.
To get started, I added the two largest datasets (by tokens contributed to training), as listed in the model card. Feel free to be more exhaustive :)
README.md
CHANGED
@@ -39,6 +39,9 @@ language:
 - sr
 - sv
 - uk
+datasets:
+- oscar-corpus/colossal-oscar-1.0
+- HuggingFaceFW/fineweb-edu
 ---
 
 ![](./images/logo_alia_2.png)
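A quick way to sanity-check a change like this is to read the `datasets` list back out of the README front matter. The sketch below is only an illustration: `front_matter_list` is a hypothetical helper, `SAMPLE` is a shortened stand-in for the real README, and the parser handles only flat top-level lists like the ones in this diff. A real check would more likely use a proper YAML library.

```python
def front_matter_list(readme_text: str, key: str) -> list[str]:
    """Return the items of a top-level list `key` in the README front matter.

    Hypothetical helper: handles only flat `key:` / `- item` lists,
    not general YAML.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter block at the top of the file
    items = []
    in_key = False
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the front matter
            break
        if not line.strip():  # ignore blank lines
            continue
        if not line.startswith((" ", "-")):
            # A new top-level key starts here; track whether it is ours.
            in_key = line.split(":", 1)[0].strip() == key
            continue
        if in_key and line.lstrip().startswith("- "):
            items.append(line.lstrip()[2:].strip())
    return items


# Shortened stand-in for the README after this change (assumption).
SAMPLE = """---
language:
- sv
- uk
datasets:
- oscar-corpus/colossal-oscar-1.0
- HuggingFaceFW/fineweb-edu
---

![](./images/logo_alia_2.png)
"""

datasets = front_matter_list(SAMPLE, "datasets")
print(datasets)
```

Listing further training datasets only means appending more `- owner/name` entries under `datasets:`; the helper would pick them up unchanged.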