Qwen1.5-0.5B-Chat with EPFL DPO fine-tuning

Qwen1.5-0.5B-Chat DPO fine-tuned on open-ended and multiple choice questions from different EPFL courses and the Orca Math dataset that consists of ~200K grade school math word problems.

Model Details

Model Description

The model was developed during the course Modern Natural Language Processing (CS-552). Its aim is to fine-tune the base model (Qwen/Qwen1.5-0.5B-Chat) to accurately answer open-ended and multiple-choice questions from various EPFL courses and Orca Math dataset.

  • Developed by: Emma Lise Boehly, Ahmed Aziz Ben Haj Hmida and Jan Kokla
  • Finetuned from model: Qwen/Qwen1.5-0.5B-Chat

Training Details

Training Data

HuggingFace dataset : microsoft/orca-math-word-problems-200k The EPFL dataset is not publicly available.

Training Procedure

Training Hyperparameters

  • Training regime: cDPO with bf16 mixed precision, $\beta=0.2$, $lr=3 \times 10^{-6}$, and $label_smoothing=0.2$

  • PEFT 0.10.0

Downloads last month
0
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for attention-avengers/Qwen1.5-0.5B-Chat-ORCA-EPFL-cDPO

Adapter
(48)
this model

Collection including attention-avengers/Qwen1.5-0.5B-Chat-ORCA-EPFL-cDPO