File size: 2,845 Bytes
d674c5d bf34c4d d674c5d bf34c4d d674c5d bf34c4d 76f4a9d edae52e bf34c4d d674c5d 1124e19 5bced0d d674c5d bf34c4d d674c5d 1aa2fb2 d674c5d bf34c4d d674c5d bf34c4d d674c5d bf34c4d d674c5d f325fbf d674c5d bf34c4d d674c5d 50980f2 d674c5d bf34c4d 76f4a9d bf34c4d d674c5d f325fbf 1aa2fb2 f325fbf 1aa2fb2 f325fbf 0a18209 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 |
---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- dpo
---
<p align="center" style="margin:0;padding:0">
<img src="8.PNG" alt="Reynaerde" width="800" style="margin-left:'auto' margin-right:'auto'/>
</p>
<div style="margin:auto; text-align:center">
<h1 style="margin-bottom: 0">Reynaerde 7B Chat</h1>
<em>A conversational model for Dutch, based on Mistral v0.3 Instruct</em>
</div>
This model is a fine-tuned version of TODO on [ReBatch/ultrafeedback_nl](https://huggingface.co/datasets/ReBatch/ultrafeedback_nl). This is a combination of a translation of the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset and the HQ samples from [BramVanroy's translation](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned).
## Model description
This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further finetuned with QLoRA. First with SFT on a chat dataset and then with a DPO on a feedback Chat dataset.
## Intended uses & limitations
This model could still generate wrong, misleading, and potentially even offensive content. Use at your own risk.
Use with Mistral's chat template (can be found in the tokenizer)
## Training procedure
This model was trained with QLoRa in bfloat16 with flash attention 2 on oen A100 PCIe; with the DPO script from the [alignment handbook](https://github.com/huggingface/alignment-handbook/) on [RunPod](https://www.runpod.io/).
## Evaluation results
The model was evaluated using [scandeval](https://scandeval.com/dutch-nlg/).
| Model| conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:
Reynaerde-7B-Chat | 56.40 / 38.13 | 10.83 / 27.67 | 20.02 / 55.40 | 53.56 / 65.29 | TODO / TODO | TODO / TODO | 31.36 / 47.79
Mistral-7B-v0.3 | 57.08 / 42.65 | 14.05 / 39.13 | 8.08 / 43.07 | 45.57 / 55.20 | 62.28 / 16.46 | 20.39 / 40.03 | 13.28 / 34.13
Mistral-7B-v0.3-Instruct | 60.76 / 45.39 | 13.20 / 34.26 | 23.23 / 59.26 | 48.94 / 60.13 | 66.09 / 18.02 | 24.95 / 43.67 | 24.86 / 43.57
## Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
## Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
## Model Developer
Finetuned by [Julien Van den Avenne](https://huggingface.co/vandeju)
|