---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - dpo
---

# Reynaerde 7B

A conversational model for Dutch, based on Mistral 7B v0.3.

This model is a fine-tuned version of ReBatch/Reynaerde-7B-Instruct, trained on a Dutch translation of the HuggingFaceH4/ultrafeedback_binarized dataset combined with high-quality samples from BramVanroy's translation. These are combined in ReBatch/ultrafeedback_nl.

## Model description

This model is a Dutch chat model, built on Mistral 7B v0.3 and further fine-tuned, first with SFT on a chat dataset and then with DPO on a chat feedback dataset.

## Intended uses & limitations

This model may still generate wrong, misleading, and potentially even offensive content; use at your own risk. Use it with Mistral's chat template, which can be found in the tokenizer.
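For illustration, a minimal hand-rolled sketch of that prompt format (the helper name and the exact template details are assumptions; in practice, let `tokenizer.apply_chat_template` from `transformers` build the prompt for you):

```python
def mistral_chat_prompt(messages):
    """Rough sketch of Mistral's [INST] ... [/INST] chat format.

    Assumption: messages alternate user/assistant turns. For real use, call
    tokenizer.apply_chat_template instead of this helper.
    """
    parts = ["<s>"]
    for message in messages:
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        else:  # assistant turn, closed with the end-of-sequence token
            parts.append(f" {message['content']}</s>")
    return "".join(parts)

prompt = mistral_chat_prompt(
    [{"role": "user", "content": "Wat is de hoofdstad van België?"}]
)
```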

## Training procedure

This model was trained with LoRA in bfloat16 with FlashAttention-2 on 8x A100 SXM GPUs with DeepSpeed ZeRO-3, using the DPO script from the alignment handbook, on RunPod.
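As background, DPO optimizes a logistic loss over preference pairs. A minimal pure-Python sketch of the per-pair loss, for illustration only (the actual training used the alignment-handbook DPO script, not this code):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Arguments are log-probabilities of the chosen/rejected responses under the
    policy being trained (pi_*) and the frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss is log 2; the loss shrinks as the policy favors the chosen response over the rejected one relative to the reference.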

## Evaluation results

The model was evaluated with ScandEval. The only improvement is on the squad_nl benchmark, a question-answering benchmark. This is not surprising, since we trained only on chat data and not on other tasks.

| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|---|---|---|---|---|---|---|---|
| Reynaerde-7B-Chat-v2 | TODO / TODO | 10.83 / 27.67 | TODO / TODO | 53.56 / 65.29 | TODO / TODO | TODO / TODO | 31.36 / 47.79 |
| Mistral-7B-v0.3 | 57.08 / 42.65 | 14.05 / 39.13 | 8.08 / 43.07 | 45.57 / 55.20 | 62.28 / 16.46 | 20.39 / 40.03 | 13.28 / 34.13 |
| Mistral-7B-v0.3-Instruct | 60.76 / 45.39 | 13.20 / 34.26 | 23.23 / 59.26 | 48.94 / 60.13 | 66.09 / 18.02 | 24.95 / 43.67 | 24.86 / 43.57 |

## Model Developer

Fine-tuned by Julien Van den Avenne.