---
base_model: 
- unsloth/llama-2-7b-bnb-4bit
- hermeschen1116/response_generator_for_emotion_chat_bot
library_name: peft
license: apache-2.0
datasets:
- Shotaro30678/rlhf-RG-trl-style-v3
tags:
- trl
- unsloth
language:
- en
pipeline_tag: text-generation

---
# Response Generator for [Emotion Chat Bot](https://github.com/hermeschen1116/chat-bot)


## Model description

This model is a dpo fine-tuned version of [hermeschen1116/response_generator_for_emotion_chat_bot](https://huggingface.co/hermeschen1116/response_generator_for_emotion_chat_bot) on [Shotaro30678/rlhf-RG-trl-style-v3](https://huggingface.co/datasets/Shotaro30678/rlhf-RG-trl-style-v3), self modified version of [daily_dialog](li2017dailydialog/daily_dialog).

## Intended uses & limitations

Use dpo trainer to do the RLHF so that the model can be more precise and consistent.

## Model performance

**Sentiment Score:**
**[Shotaro30678/emotion_text_classifier_on_dd_v1](https://huggingface.co/Shotaro30678/emotion_text_classifier_on_dd_v1)**

| **Metric**   | **DPO Trained Model** | **SFT Model (Reference)** |
|--------------|:----------------------:|:--------------------------:|
| **Accuracy** | 0.851                 | 0.788                     |
| **F1-score** | 0.8564                | 0.7975                    |

**Gibberish Distribution:**
**[madhurjindal/autonlp-Gibberish-Detector-492513457](https://huggingface.co/madhurjindal/autonlp-Gibberish-Detector-492513457)**

| **Category**        | **DPO Trained Model** | **SFT Model (Reference)** |
|---------------------|:----------------------:|:--------------------------:|
| **Clean**           | 882                   | 898                       |
| **Mild Gibberish**  | 94                    | 58                        |
| **Word Salad**      | 21                    | 33                        |
| **Noise**           | 3                     | 11                        |

**Cut-Off Output:**

| **Output Type**     | **DPO Trained Model** | **SFT Model (Reference)** |
|---------------------|:----------------------:|:--------------------------:|
| **Complete Output** | 985                   | 975                       |
| **Incomplete Output** | 15                  | 25                        |

on [hermeschen1116/daily_dialog_for_RG](https://huggingface.co/datasets/hermeschen1116/daily_dialog_for_RG) test split.

**test on config:**
```python
  generation_config = GenerationConfig(
      max_new_tokens=150,
      min_new_tokens=5,
      repetition_penalty=1.1,
      top_k=3,
      top_p=0.9,
      pad_token_id=tokenizer.pad_token_id,
      eos_token_id=tokenizer.eos_token_id,
      temperature=1.0,
      do_sample=True,
      num_beams=1
  )
```
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- beta=0.1,
- remove_unused_columns=False,
- num_train_epochs=3,
- gradient_checkpointing=True

others remain default

### Framework versions

- Bitsandbytes 0.43.1
- Datasets 2.20.0
- PEFT 0.11.1
- Pytorch 2.3.0+cu121
- Transformers 4.42.4
- Tokenizers 0.19.1
- Trl 0.8.6
- unsloth 2024.7 0f2e484
  
# Uploaded  model

- **Developed by:** Shotaro30678
- **Finetuned from model :** hermeschen1116/response_generator_for_emotion_chat_bot

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# Quick sample
```python
  # libs are from github repo
  from libs import ResponseGeneratorPipeline
  from unsloth import FastLanguageModel
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name = "Shotaro30678/response_generator_DPO", # YOUR MODEL YOU USED FOR TRAINING
      load_in_4bit = True,
  )
  FastLanguageModel.for_inference(model) # Enable native 2x faster inference
  
  bot = ResponseGeneratorPipeline(
      model,
      tokenizer,
      framework="pt",
      task="conversation-generation",
      num_workers=16,
      torch_dtype="auto",
      add_special_tokens=True,
      truncation=False,
      padding=True
  )
  
  conversation = [
      {'content': {'dialog': '', 'emotion': ''}, 'role': 'system'},
      {'content': {'dialog': 'Can you do push-ups ?', 'emotion': 'neutral'},
      'role': 'user'},
      {'content': {'dialog': "Of course I can . It's a piece of cake ! Believe it or not , I can do 30 push-ups a minute .",
      'emotion': 'neutral'},
      'role': 'assistant'},
      {'content': {'dialog': "Really ? I think that's impossible !",
      'emotion': 'surprise'},
      'role': 'user'},
      {'content': {'dialog': 'You mean 30 push-ups ?', 'emotion': 'neutral'},
      'role': 'assistant'},
      {'content': {'dialog': 'Yeah !', 'emotion': 'neutral'}, 'role': 'user'},
      {'content': {'dialog': '', 'emotion': 'neutral'}, 'role': 'assistant'}
   ]
  
  generation_config = GenerationConfig(
      max_new_tokens=150,
      min_new_tokens=5,
      repetition_penalty=1.1,
      top_k=3,
      top_p=0.9,
      pad_token_id=tokenizer.pad_token_id,
      eos_token_id=tokenizer.eos_token_id,
      temperature=1.0,
      do_sample=True,
      num_beams=1
  )
  
  print(bot(conversation, generation_config=generation_config)[0]['generated_text'][-1]["content"]["dialog"])
```
**output:**
```
30 push-ups in a row? 
```