Update README.md
README.md
CHANGED
@@ -34,9 +34,9 @@ The released weights are still a work in progress and they might change in the f
 The corpus was compiled by this process:
 
 1. We gathered all openly available datasets: [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset), [OASST 1](https://huggingface.co/datasets/OpenAssistant/oasst1), [OASST 2](https://huggingface.co/datasets/OpenAssistant/oasst2), [OIG-small-chip2](https://huggingface.co/datasets/laion/OIG), [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots), [Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and [Glaive code assistant](Glaive-code-assistant-v2).
-2. These were first manually inspected and filtered, and then automatically filtered with [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1) to remove incorrect, offensive, non-English and American-centric responses.
-3. The responses were augmented to be more descriptive by [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1).
-4. Since most of that dataset contains only a single dialogue turn, we generated more turns using [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1).
+2. These were first manually inspected and filtered, and then automatically filtered with [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) to remove incorrect, offensive, non-English and American-centric responses.
+3. The responses were augmented to be more descriptive by [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
+4. Since most of that dataset contains only a single dialogue turn, we generated more turns using [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
 5. Finally, we translated the resulting dataset into Bokmål and Nynorsk using [NorMistral-7b-warm](https://huggingface.co/norallm/normistral-7b-warm).
 
 ## About the base model
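The change in this hunk replaces bare link targets such as `mistralai/Mixtral-8x7B-Instruct-v0.1` with full `https://huggingface.co/...` URLs. A check of this kind could be scripted. The following is only a minimal sketch, not part of the commit: the helper names (`is_relative`, `find_relative_links`, `absolutize`) and the `HUB_PREFIX` constant are hypothetical, and it assumes every bare target is meant to be a path on the Hugging Face Hub. The unchanged context line for step 1 still contains a bare target (`Glaive-code-assistant-v2`), which this sketch would also flag; whether a prefixed URL actually resolves still has to be checked by hand.

```python
import re
from typing import List, Tuple

# Matches inline markdown links of the form [text](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# Assumption: bare targets are meant to be paths on the Hugging Face Hub.
HUB_PREFIX = "https://huggingface.co/"


def is_relative(target: str) -> bool:
    """Return True if the target has no URL scheme and is not an in-page anchor."""
    has_scheme = re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*:", target) is not None
    return not has_scheme and not target.startswith("#")


def find_relative_links(markdown: str) -> List[Tuple[str, str]]:
    """List (text, target) pairs for every link with a relative target."""
    return [
        (text, target)
        for text, target in LINK_RE.findall(markdown)
        if is_relative(target)
    ]


def absolutize(markdown: str, prefix: str = HUB_PREFIX) -> str:
    """Rewrite relative link targets by prepending the given prefix."""

    def repl(match: "re.Match[str]") -> str:
        text, target = match.group(1), match.group(2)
        if is_relative(target):
            target = prefix + target.lstrip("/")
        return f"[{text}]({target})"

    return LINK_RE.sub(repl, markdown)
```

Running `find_relative_links` over the README before merging would list the targets to review, and `absolutize` shows the same rewrite this hunk applies to steps 2 to 4.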