norallm/normistral-7b-warm-instruct

Text Generation

Norwegian Bokmål

Norwegian Nynorsk

text-generation-inference

Inference Endpoints


davda54 committed on Apr 12, 2024

Commit d5c0069 · verified · 1 Parent(s): f230207

Update README.md

Files changed (1)

README.md +3 -3


@@ -34,9 +34,9 @@ The released weights are still a work in progress and they might change in the f
 The corpus was compiled by this process:
 1. We gathered all openly available datasets: [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset), [OASST 1](https://huggingface.co/datasets/OpenAssistant/oasst1), [OASST 2](https://huggingface.co/datasets/OpenAssistant/oasst2), [OIG-small-chip2](https://huggingface.co/datasets/laion/OIG), [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots), [Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and [Glaive code assistant](Glaive-code-assistant-v2).
-2. These were first manually inspected and filtered, and then automatically filtered with [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1) to remove incorrect, offensive, non-English and American-centric responses.
-3. The responses were augmented to be more descriptive by [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1).
-4. Since most of that dataset contains only a single dialogue turn, we generated more turns using [Mixtral-8x7B](mistralai/Mixtral-8x7B-Instruct-v0.1).
+2. These were first manually inspected and filtered, and then automatically filtered with [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) to remove incorrect, offensive, non-English and American-centric responses.
+3. The responses were augmented to be more descriptive by [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
+4. Since most of that dataset contains only a single dialogue turn, we generated more turns using [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
 5. Finally, we translated the resulting dataset into Bokmål and Nynorsk using [NorMistral-7b-warm](https://huggingface.co/norallm/normistral-7b-warm).
 ## About the base model