---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - dpo
---

# Reynaerde 7B Chat

*A conversational model for Dutch, based on Mistral v0.3 Instruct*

This model is a fine-tuned version of TODO on the ReBatch/ultrafeedback_nl dataset, which combines a translation of the HuggingFaceH4/ultrafeedback_binarized dataset with the high-quality (HQ) samples from BramVanroy's translation.

## Model description

This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further fine-tuned with QLoRA. It was first fine-tuned with SFT on a chat dataset and then with DPO on a feedback chat dataset.

## Intended uses & limitations

This model can still generate wrong, misleading, and potentially even offensive content. Use at your own risk. Use it with Mistral's chat template, which is stored in the tokenizer.
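A minimal inference sketch, assuming the adapter is published as a PEFT repository; the repository id `ReBatch/Reynaerde-7B-Chat`, the example prompt, and the generation settings are assumptions, not taken from this card:

```python
# Minimal inference sketch; repository id and generation settings are assumptions.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "ReBatch/Reynaerde-7B-Chat"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loads the Mistral base model and applies the QLoRA adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Mistral's chat template is stored in the tokenizer.
messages = [{"role": "user", "content": "Schrijf een kort gedicht over Reynaert de Vos."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```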

## Training procedure

This model was trained with QLoRA in bfloat16 with Flash Attention 2 on a single A100 (PCIe), using the DPO script from the alignment handbook, on RunPod.
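The exact recipe lives in the alignment handbook's DPO script; the snippet below is only a hedged sketch of the same ingredients (4-bit QLoRA with bfloat16 compute, Flash Attention 2, and DPO via trl). The LoRA rank, target modules, dataset split name, and output directory are assumptions, and argument names follow trl ≈0.9 (newer releases rename some of them):

```python
# Illustrative QLoRA + DPO setup; not the exact alignment-handbook script.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.3"

# 4-bit quantisation (the "Q" in QLoRA) with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)

# LoRA adapter; rank and target modules are assumed values.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

dataset = load_dataset("ReBatch/ultrafeedback_nl")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="reynaerde-7b-dpo", bf16=True),  # placeholder output_dir
    train_dataset=dataset["train_prefs"],  # assumed split name
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```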

## Evaluation results

The model was evaluated with ScandEval. It improves on 4 out of 7 benchmarks compared to the Mistral-7B-v0.3-Instruct model on which it is based.

| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|---|---|---|---|---|---|---|---|
| Reynaerde-7B-Chat | 56.40 / 38.13 | 10.83 / 27.67 | 20.02 / 55.40 | 53.56 / 65.29 | 68.13 / 20.85 | 32.50 / 49.10 | 31.36 / 47.79 |
| Mistral-7B-v0.3 | 57.08 / 42.65 | 14.05 / 39.13 | 8.08 / 43.07 | 45.57 / 55.20 | 62.28 / 16.46 | 20.39 / 40.03 | 13.28 / 34.13 |
| Mistral-7B-v0.3-Instruct | 60.76 / 45.39 | 13.20 / 34.26 | 23.23 / 59.26 | 48.94 / 60.13 | 66.09 / 18.02 | 24.95 / 43.67 | 24.86 / 43.57 |

## Training hyperparameters

The following hyperparameters were used during training (the sketch after the list shows how they map onto a trainer configuration):

- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
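For reference, here is a hedged mapping of these values onto a trl `DPOConfig` (which extends transformers' `TrainingArguments`); the `output_dir` is a placeholder, and the optimizer line above corresponds to the default AdamW settings:

```python
from trl import DPOConfig

# Hyperparameters above expressed as a trainer config; effective batch size
# = 3 (per device) x 2 (gradient accumulation) x 1 GPU = 6.
training_args = DPOConfig(
    output_dir="reynaerde-7b-dpo",      # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer.
)
```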

## Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1

## Model Developer

The Mistral-7B-v0.3-Instruct model, on which this model is based, was created by Mistral AI. The fine-tuning was done by Julien Van den Avenne.