Update README.md
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ Note that the model was not safety aligned and might generate problematic output
 This is the first release of an ongoing open research project for multilingual language models.
 If you want to train a model for your own language or are working on evaluations, please contact us or join our [Discord server](https://discord.gg/wUpvYs4XvM). **We are open for collaborations!**
 
-*Special thanks go to **Disco Research** and **Björn Plüster** for sharing the German dataset with us*
+*Special thanks go to **[Disco Research](https://huggingface.co/DiscoResearch)**, **[Jan Philipp Harries](https://huggingface.co/jphme)**, and **[Björn Plüster](https://huggingface.co/bjoernp)** for sharing the German dataset with us*
 
 ### Model details
 
@@ -57,13 +57,13 @@ set a seed for reproducibility:
 
 ## Dataset
 
-The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank Disco Research and Björn Plüster for making their dataset available to us.
+The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank [Disco Research](https://huggingface.co/DiscoResearch), [Jan Philipp Harries](https://huggingface.co/jphme), and [Björn Plüster](https://huggingface.co/bjoernp) for making their dataset available to us.
 
 **English and Code**
 - [Open-Hermes-2B](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
 
 **German**
-- [DiscoLM German Dataset](https://huggingface.co/DiscoResearch)
+- [DiscoLM German Dataset](https://huggingface.co/DiscoResearch) includes the publicly available [germanrag](https://huggingface.co/datasets/DiscoResearch/germanrag) dataset
 - [OASST-2](https://huggingface.co/datasets/OpenAssistant/oasst2) (German subset)
 - [Aya-Dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset) (German subset)
 