File size: 2,845 Bytes

d674c5d
bf34c4d
 
 
 
 
d674c5d
 
bf34c4d
 
 
d674c5d
 
 
bf34c4d
76f4a9d
edae52e
bf34c4d
d674c5d
 
 
 
 
 
1124e19
5bced0d
d674c5d
 
bf34c4d
d674c5d
1aa2fb2
d674c5d
 
bf34c4d
d674c5d
bf34c4d
 
d674c5d
bf34c4d
d674c5d
 
f325fbf
d674c5d
bf34c4d
d674c5d
50980f2
d674c5d
bf34c4d
 
76f4a9d
bf34c4d
 
d674c5d
f325fbf
1aa2fb2
f325fbf
 
 
 
 
 
 
 
 
 
 
 
 
1aa2fb2
 
f325fbf
 
 
 
 
0a18209

---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- dpo
---

<p align="center" style="margin:0;padding:0">
<img src="8.PNG" alt="Reynaerde" width="800" style="margin-left:'auto' margin-right:'auto'/>
</p>



<div style="margin:auto; text-align:center">
<h1 style="margin-bottom: 0">Reynaerde 7B Chat</h1>
<em>A conversational model for Dutch, based on Mistral v0.3 Instruct</em>
</div>






This model is a fine-tuned version of TODO on [ReBatch/ultrafeedback_nl](https://huggingface.co/datasets/ReBatch/ultrafeedback_nl). This is a combination of a translation of the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset and the HQ samples from [BramVanroy's translation](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned). 



## Model description

This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further finetuned with QLoRA. First with SFT on a chat dataset and then with a DPO on a feedback Chat dataset.


## Intended uses & limitations

This model could still generate wrong, misleading, and potentially even offensive content. Use at your own risk. 
Use with Mistral's chat template (can be found in the tokenizer)

## Training procedure


This model was trained with QLoRa in bfloat16 with flash attention 2 on oen A100 PCIe; with the DPO script from the [alignment handbook](https://github.com/huggingface/alignment-handbook/) on [RunPod](https://www.runpod.io/).

## Evaluation results

The model was evaluated using [scandeval](https://scandeval.com/dutch-nlg/). 

| Model| conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:
Reynaerde-7B-Chat | 56.40 / 38.13 | 10.83 / 27.67 | 20.02 / 55.40 | 53.56 / 65.29 | TODO / TODO | TODO / TODO |  31.36 / 47.79
Mistral-7B-v0.3 | 57.08 / 42.65 | 14.05 / 39.13 | 8.08 / 43.07 | 45.57 / 55.20 | 62.28 /  16.46 | 20.39 / 40.03 | 13.28 / 34.13
Mistral-7B-v0.3-Instruct | 60.76 / 45.39 | 13.20 / 34.26 | 23.23 / 59.26 | 48.94 / 60.13 | 66.09 / 18.02 | 24.95 / 43.67 | 24.86 / 43.57


## Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

## Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1

## Model Developer

Finetuned by [Julien Van den Avenne](https://huggingface.co/vandeju)