---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- dpo
---
<p align="center" style="margin:0;padding:0">
  <img src="8.PNG" alt="Reynaerde" width="800" style="margin-left:auto; margin-right:auto"/>
</p>
<div style="margin:auto; text-align:center">
<h1 style="margin-bottom: 0">Reynaerde 7B</h1>
<em>A conversational model for Dutch, based on Mistral v0.3</em>
</div>
This model is a fine-tuned version of [ReBatch/Reynaerde-7B-Instruct](https://huggingface.co/ReBatch/Reynaerde-7B-Instruct), trained on a translation of the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset combined with high-quality samples from [BramVanroy's translation](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned).
These are combined in [ReBatch/ultrafeedback_nl](https://huggingface.co/datasets/ReBatch/ultrafeedback_nl).
## Model description
This model is a Dutch chat model, originally developed from Mistral 7B v0.3 and further finetuned, first with SFT on a chat dataset and then with DPO on a feedback chat dataset.
## Intended uses & limitations
This model can still generate wrong, misleading, and potentially even offensive content. Use at your own risk.
Use it with Mistral's chat template, which ships with the tokenizer, as in the sketch below.
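The snippet below is a minimal inference sketch, assuming this repository is a PEFT adapter on top of [ReBatch/Reynaerde-7B-Instruct](https://huggingface.co/ReBatch/Reynaerde-7B-Instruct); the `adapter_id` is a placeholder for this repository's id, and the generation settings are illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "ReBatch/Reynaerde-7B-Instruct"
adapter_id = "ReBatch/Reynaerde-7B-Chat"  # placeholder: use this repo's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Mistral's chat template comes with the tokenizer.
messages = [{"role": "user", "content": "Wat is de hoofdstad van België?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```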
## Training procedure
This model was trained with LoRA in bfloat16 with Flash Attention 2 on 8x A100 SXM GPUs with DeepSpeed ZeRO-3, using the DPO script from the [alignment handbook](https://github.com/huggingface/alignment-handbook/) on RunPod. A sketch of an equivalent setup is shown below.
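For reference, here is a minimal sketch of an equivalent LoRA DPO run using TRL's `DPOTrainer`, which the alignment handbook's DPO script wraps. The LoRA hyperparameters are illustrative rather than the ones actually used, and some argument names vary across TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "ReBatch/Reynaerde-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # trained in bfloat16
    attn_implementation="flash_attention_2",  # requires flash-attn
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumes prompt/chosen/rejected columns; the handbook script handles the
# chat-template preprocessing for you.
dataset = load_dataset("ReBatch/ultrafeedback_nl", split="train")

peft_config = LoraConfig(  # illustrative hyperparameters
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

args = DPOConfig(output_dir="reynaerde-7b-dpo", bf16=True)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # with PEFT, the frozen base acts as the reference
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,   # `processing_class` in newer TRL versions
    peft_config=peft_config,
)
trainer.train()
```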
## Evaluation results
The model was evaluated using [ScandEval](https://scandeval.com/dutch-nlg/). The only improvement is on the squad_nl benchmark, a question-answering benchmark. This is not surprising, since we trained only on chat data and not on other tasks.
| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:------|:--------:|:------------:|:--------:|:--------:|:--------------:|:-------:|:------------:|
| Reynaerde-7B-Chat-v2 | TODO / TODO | 10.83 / 27.67 | TODO / TODO | 53.56 / 65.29 | TODO / TODO | TODO / TODO | 31.36 / 47.79 |
| Mistral-7B-v0.3 | 57.08 / 42.65 | 14.05 / 39.13 | 8.08 / 43.07 | 45.57 / 55.20 | 62.28 / 16.46 | 20.39 / 40.03 | 13.28 / 34.13 |
| Mistral-7B-v0.3-Instruct | 60.76 / 45.39 | 13.20 / 34.26 | 23.23 / 59.26 | 48.94 / 60.13 | 66.09 / 18.02 | 24.95 / 43.67 | 24.86 / 43.57 |
## Model Developer
Finetuned by [Julien Van den Avenne](https://huggingface.co/vandeju)