Update README.md
This model is a fine-tuned version of TODO on [ReBatch/ultrafeedback_nl](https:/

## Model description

This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further fine-tuned with QLoRA. It was first fine-tuned with SFT on a chat dataset and then with DPO on a feedback chat dataset.
## Intended uses & limitations

This model could still generate wrong, misleading, and potentially even offensive content. Use at your own risk.

Use with Mistral's chat template (can be found in the tokenizer).
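The authoritative chat template ships inside the tokenizer (its `chat_template` field), and with `transformers` installed you would normally call `tokenizer.apply_chat_template(...)` rather than build prompts by hand. Purely as an illustration of the Mistral-instruct layout that template produces — the exact spacing and the model id below are assumptions, not taken from this card — the prompt construction looks roughly like this:

```python
# Illustrative approximation of the Mistral-instruct chat layout.
# In practice, prefer the tokenizer's own template (model id hypothetical):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("vandeju/some-model")
#   prompt = tok.apply_chat_template(messages, tokenize=False,
#                                    add_generation_prompt=True)

def mistral_chat_prompt(messages):
    """Wrap each user turn in [INST] ... [/INST] and append assistant turns,
    leaving the prompt open so the model generates the next reply."""
    prompt = "<s>"
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            prompt += f" {msg['content']}</s>"
    return prompt

messages = [{"role": "user", "content": "Wat is de hoofdstad van Nederland?"}]
print(mistral_chat_prompt(messages))
```

Always check the tokenizer's template for the exact token layout; hand-built prompts can silently diverge from what the model saw during fine-tuning.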
## Training procedure

This model was trained with QLoRA in bfloat16 with Flash Attention 2 on one A100 PCIe, using the DPO script from the [alignment handbook](https://github.com/huggingface/alignment-handbook/) on [RunPod](https://www.runpod.io/).
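The handbook's DPO script optimizes the Direct Preference Optimization objective over (chosen, rejected) answer pairs. As a minimal pure-Python sketch of the per-example loss — β and the log-probabilities below are illustrative numbers, not values from this training run:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss:
    -log sigmoid(beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)))."""
    margin = (policy_chosen_logp - ref_chosen_logp) - \
             (policy_rejected_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy has not moved from the reference, the margin is 0
# and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Raising the chosen answer's log-probability relative to the reference
# lowers the loss.
improved = dpo_loss(-9.0, -12.0, -10.0, -12.0)
```

Training thus pushes the policy to prefer chosen over rejected answers while the frozen reference model (here, the SFT checkpoint) anchors it, which is why only lightweight QLoRA adapters need to be updated.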
## Evaluation results

The model was evaluated using [scandeval](https://scandeval.com/dutch-nlg/). There are improvements in 4 out of 7 benchmarks compared to the Mistral-7B-v0.3-Instruct model on which it is based.
| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:-----:|:--------:|:------------:|:--------:|:--------:|:--------------:|:-------:|:------------:|
## Model Developer

The Mistral-7B-v0.3-Instruct model, on which this model is based, was created by [Mistral AI](https://huggingface.co/mistralai).

The finetuning was done by [Julien Van den Avenne](https://huggingface.co/vandeju).