hermeschen1116's picture
dpo (#2)
003b08b verified
|
raw
history blame
5.42 kB
metadata
base_model:
  - unsloth/llama-2-7b-bnb-4bit
  - hermeschen1116/response_generator_for_emotion_chat_bot
library_name: peft
license: apache-2.0
datasets:
  - Shotaro30678/rlhf-RG-trl-style-v3
tags:
  - trl
  - unsloth
language:
  - en
pipeline_tag: text-generation

Response Generator for Emotion Chat Bot

Model description

This model is a dpo fine-tuned version of hermeschen1116/response_generator_for_emotion_chat_bot on Shotaro30678/rlhf-RG-trl-style-v3, self modified version of daily_dialog.

Intended uses & limitations

Use dpo trainer to do the RLHF so that the model can be more precise and consistent.

Model performance

Sentiment Score: Shotaro30678/emotion_text_classifier_on_dd_v1

Metric DPO Trained Model SFT Model (Reference)
Accuracy 0.851 0.788
F1-score 0.8564 0.7975

Gibberish Distribution: madhurjindal/autonlp-Gibberish-Detector-492513457

Category DPO Trained Model SFT Model (Reference)
Clean 882 898
Mild Gibberish 94 58
Word Salad 21 33
Noise 3 11

Cut-Off Output:

Output Type DPO Trained Model SFT Model (Reference)
Complete Output 985 975
Incomplete Output 15 25

on hermeschen1116/daily_dialog_for_RG test split.

test on config:

  generation_config = GenerationConfig(
      max_new_tokens=150,
      min_new_tokens=5,
      repetition_penalty=1.1,
      top_k=3,
      top_p=0.9,
      pad_token_id=tokenizer.pad_token_id,
      eos_token_id=tokenizer.eos_token_id,
      temperature=1.0,
      do_sample=True,
      num_beams=1
  )

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • beta=0.1,
  • remove_unused_columns=False,
  • num_train_epochs=3,
  • gradient_checkpointing=True

others remain default

Framework versions

  • Bitsandbytes 0.43.1
  • Datasets 2.20.0
  • PEFT 0.11.1
  • Pytorch 2.3.0+cu121
  • Transformers 4.42.4
  • Tokenizers 0.19.1
  • Trl 0.8.6
  • unsloth 2024.7 0f2e484

Uploaded model

  • Developed by: Shotaro30678
  • Finetuned from model : hermeschen1116/response_generator_for_emotion_chat_bot

This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

Quick sample

  # libs are from github repo
  from libs import ResponseGeneratorPipeline
  from unsloth import FastLanguageModel
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name = "Shotaro30678/response_generator_DPO", # YOUR MODEL YOU USED FOR TRAINING
      load_in_4bit = True,
  )
  FastLanguageModel.for_inference(model) # Enable native 2x faster inference
  
  bot = ResponseGeneratorPipeline(
      model,
      tokenizer,
      framework="pt",
      task="conversation-generation",
      num_workers=16,
      torch_dtype="auto",
      add_special_tokens=True,
      truncation=False,
      padding=True
  )
  
  conversation = [
      {'content': {'dialog': '', 'emotion': ''}, 'role': 'system'},
      {'content': {'dialog': 'Can you do push-ups ?', 'emotion': 'neutral'},
      'role': 'user'},
      {'content': {'dialog': "Of course I can . It's a piece of cake ! Believe it or not , I can do 30 push-ups a minute .",
      'emotion': 'neutral'},
      'role': 'assistant'},
      {'content': {'dialog': "Really ? I think that's impossible !",
      'emotion': 'surprise'},
      'role': 'user'},
      {'content': {'dialog': 'You mean 30 push-ups ?', 'emotion': 'neutral'},
      'role': 'assistant'},
      {'content': {'dialog': 'Yeah !', 'emotion': 'neutral'}, 'role': 'user'},
      {'content': {'dialog': '', 'emotion': 'neutral'}, 'role': 'assistant'}
   ]
  
  generation_config = GenerationConfig(
      max_new_tokens=150,
      min_new_tokens=5,
      repetition_penalty=1.1,
      top_k=3,
      top_p=0.9,
      pad_token_id=tokenizer.pad_token_id,
      eos_token_id=tokenizer.eos_token_id,
      temperature=1.0,
      do_sample=True,
      num_beams=1
  )
  
  print(bot(conversation, generation_config=generation_config)[0]['generated_text'][-1]["content"]["dialog"])

output:

30 push-ups in a row?