---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- dpo
---
<p align="center" style="margin:0;padding:0">
<img src="8.PNG" alt="Reynaerde" width="800" style="margin-left:'auto' margin-right:'auto'/>
</p>
<div style="margin:auto; text-align:center">
<h1 style="margin-bottom: 0">Reynaerde 7B Chat</h1>
<em>A conversational model for Dutch, based on Mistral v0.3 Instruct</em>
</div>
This model is a fine-tuned version of [ReBatch/Reynaerde-7B-Instruct](https://huggingface.co/ReBatch/Reynaerde-7B-Instruct), trained on [ReBatch/ultrafeedback_nl](https://huggingface.co/datasets/ReBatch/ultrafeedback_nl). That dataset combines a translation of [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) with the high-quality samples from [BramVanroy's translation](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned).
## Model description
This is a Dutch chat model built on Mistral 7B v0.3 Instruct and fine-tuned with QLoRA: first with SFT on a chat dataset, and then with DPO on a preference (feedback) dataset.
## Intended uses & limitations
This model could still generate wrong, misleading, and potentially even offensive content. Use at your own risk.
Use the model with Mistral's chat template, which is included in the tokenizer.
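As a minimal inference sketch (assuming the adapter is published under a repo id like `ReBatch/Reynaerde-7B-Chat`; adjust to the actual repository), the adapter can be loaded with PEFT and prompted through the chat template stored in the tokenizer:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "ReBatch/Reynaerde-7B-Chat"  # assumed repo id for this adapter

# The tokenizer carries Mistral's chat template; PEFT loads the adapter on top of the base model.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build the prompt with the chat template and generate a Dutch reply.
messages = [{"role": "user", "content": "Wat is de hoofdstad van België?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```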
## Training procedure
This model was trained with QLoRA in bfloat16 with Flash Attention 2 on a single A100 (PCIe), using the DPO script from the [alignment handbook](https://github.com/huggingface/alignment-handbook/) on [RunPod](https://www.runpod.io/).
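As an illustration only (the actual run used the alignment handbook's DPO recipe), here is a hedged sketch of how the base SFT checkpoint might be loaded for QLoRA training; the NF4 quantization settings are assumptions, since the card only states QLoRA in bfloat16 with Flash Attention 2:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the SFT checkpoint that DPO starts from, 4-bit quantized with bfloat16
# compute and Flash Attention 2. The NF4 / double-quantization choices are assumptions.
model = AutoModelForCausalLM.from_pretrained(
    "ReBatch/Reynaerde-7B-Instruct",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

The hyperparameters listed further down map onto the DPO trainer configuration sketched after that list.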
## Evaluation results
The model was evaluated with [ScandEval](https://scandeval.com/dutch-nlg/); each cell in the table below shows the two scores ScandEval reports for that benchmark. Compared to the Mistral-7B-v0.3-Instruct model on which it is based, it improves on 4 out of 7 benchmarks.
| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:------|:--------:|:------------:|:--------:|:--------:|:--------------:|:-------:|:------------:|
| Reynaerde-7B-Chat | 56.40 / 38.13 | 10.83 / 27.67 | 20.02 / 55.40 | 53.56 / 65.29 | 68.13 / 20.85 | 32.50 / 49.10 | 31.36 / 47.79 |
| Mistral-7B-v0.3 | 57.08 / 42.65 | 14.05 / 39.13 | 8.08 / 43.07 | 45.57 / 55.20 | 62.28 / 16.46 | 20.39 / 40.03 | 13.28 / 34.13 |
| Mistral-7B-v0.3-Instruct | 60.76 / 45.39 | 13.20 / 34.26 | 23.23 / 59.26 | 48.94 / 60.13 | 66.09 / 18.02 | 24.95 / 43.67 | 24.86 / 43.57 |
## Naming
This model is named after the Middle Dutch epic poem 'Van den vos Reynaerde'. Dating from around 1260, this epic by the Flemish author Willem die Madocke maecte is often called 'the pinnacle of Gothic literature in the Netherlands'. The poem tells a version of the Reynard the Fox story, which was popular in Western Europe during the late Middle Ages.
### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto a DPO trainer configuration follows the list):
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
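Continuing the loading sketch above, here is a hedged example of how these hyperparameters could be wired into TRL's `DPOConfig`/`DPOTrainer`. The actual run used the alignment handbook's DPO script; the LoRA settings, dataset split name, and the exact keyword names (which vary across TRL versions) are assumptions:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer

tokenizer = AutoTokenizer.from_pretrained("ReBatch/Reynaerde-7B-Instruct")
train_dataset = load_dataset("ReBatch/ultrafeedback_nl", split="train_prefs")  # split name assumed

# LoRA adapter settings (rank, alpha, target modules) are not stated in the card; placeholders only.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Values below mirror the hyperparameter list above.
args = DPOConfig(
    output_dir="reynaerde-7b-chat",
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,          # the 4-bit base model loaded in the sketch under "Training procedure"
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # called `processing_class` in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```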
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
### Model Developer
The Mistral-7B-v0.3-Instruct model, on which this model is based, was created by [Mistral AI](https://huggingface.co/mistralai).
The fine-tuning was done by [Julien Van den Avenne](https://huggingface.co/vandeju).