dpo (#2)

- update: PR dpo (bff58f2726ada4c4f82cd75cf419e786b005acdc)
- update: rename readme to avoid conflict (bfeb2ea98aa100474e1bc6a86d725a477047ad43)
- update: readme (c021e9b2dc72c8022a86c2a0003d30f85436c4aa)

Files changed (8)

README_dpo.md +161 -0
adapter_config.json +0 -37
config.json +43 -0
generation_config.json +10 -0
adapter_model.safetensors → model.safetensors +2 -2
special_tokens_map.json +4 -28
tokenizer_config.json +2 -1
training_args.bin +0 -3

README_dpo.md ADDED

	@@ -0,0 +1,161 @@

+---
+base_model:
+- unsloth/llama-2-7b-bnb-4bit
+- hermeschen1116/response_generator_for_emotion_chat_bot
+library_name: peft
+license: apache-2.0
+datasets:
+- Shotaro30678/rlhf-RG-trl-style-v3
+tags:
+- trl
+- unsloth
+language:
+- en
+pipeline_tag: text-generation
+---
+# Response Generator for [Emotion Chat Bot](https://github.com/hermeschen1116/chat-bot)
+## Model description
+This model is a DPO fine-tuned version of [hermeschen1116/response_generator_for_emotion_chat_bot](https://huggingface.co/hermeschen1116/response_generator_for_emotion_chat_bot), trained on [Shotaro30678/rlhf-RG-trl-style-v3](https://huggingface.co/datasets/Shotaro30678/rlhf-RG-trl-style-v3), a self-modified version of [daily_dialog](https://huggingface.co/datasets/li2017dailydialog/daily_dialog).
+## Intended uses & limitations
+The model was trained with the DPO trainer as the RLHF step so that its responses are more precise and consistent.
+## Model performance
+**Sentiment Score:**
+**[Shotaro30678/emotion_text_classifier_on_dd_v1](https://huggingface.co/Shotaro30678/emotion_text_classifier_on_dd_v1)**
+| **Metric**   | **DPO Trained Model** | **SFT Model (Reference)** |
+|--------------|:----------------------:|:--------------------------:|
+| **Accuracy** | 0.851                 | 0.788                     |
+| **F1-score** | 0.8564                | 0.7975                    |
+**Gibberish Distribution:**
+**[madhurjindal/autonlp-Gibberish-Detector-492513457](https://huggingface.co/madhurjindal/autonlp-Gibberish-Detector-492513457)**
+| **Category**        | **DPO Trained Model** | **SFT Model (Reference)** |
+|---------------------|:----------------------:|:--------------------------:|
+| **Clean**           | 882                   | 898                       |
+| **Mild Gibberish**  | 94                    | 58                        |
+| **Word Salad**      | 21                    | 33                        |
+| **Noise**           | 3                     | 11                        |
+**Cut-Off Output:**
+| **Output Type**     | **DPO Trained Model** | **SFT Model (Reference)** |
+|---------------------|:----------------------:|:--------------------------:|
+| **Complete Output** | 985                   | 975                       |
+| **Incomplete Output** | 15                  | 25                        |
+All results above were measured on the [hermeschen1116/daily_dialog_for_RG](https://huggingface.co/datasets/hermeschen1116/daily_dialog_for_RG) test split.
+**Generation config used for testing:**
+```python
+  generation_config = GenerationConfig(
+      max_new_tokens=150,
+      min_new_tokens=5,
+      repetition_penalty=1.1,
+      top_k=3,
+      top_p=0.9,
+      pad_token_id=tokenizer.pad_token_id,
+      eos_token_id=tokenizer.eos_token_id,
+      temperature=1.0,
+      do_sample=True,
+      num_beams=1
+  )
+```
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- beta=0.1
+- remove_unused_columns=False
+- num_train_epochs=3
+- gradient_checkpointing=True
+All other hyperparameters remain at their defaults.
+### Framework versions
+- Bitsandbytes 0.43.1
+- Datasets 2.20.0
+- PEFT 0.11.1
+- PyTorch 2.3.0+cu121
+- Transformers 4.42.4
+- Tokenizers 0.19.1
+- TRL 0.8.6
+- Unsloth 2024.7 0f2e484
+# Uploaded model
+- **Developed by:** Shotaro30678
+- **Fine-tuned from model:** hermeschen1116/response_generator_for_emotion_chat_bot
+This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+# Quick sample
+```python
+  # `libs` is from the project's GitHub repo
+  from libs import ResponseGeneratorPipeline
+  from unsloth import FastLanguageModel
+  from transformers import GenerationConfig  # needed for generation_config below
+  model, tokenizer = FastLanguageModel.from_pretrained(
+      model_name = "Shotaro30678/response_generator_DPO", # the DPO-trained model
+      load_in_4bit = True,
+  )
+  FastLanguageModel.for_inference(model) # Enable native 2x faster inference
+  bot = ResponseGeneratorPipeline(
+      model,
+      tokenizer,
+      framework="pt",
+      task="conversation-generation",
+      num_workers=16,
+      torch_dtype="auto",
+      add_special_tokens=True,
+      truncation=False,
+      padding=True
+  )
+  conversation = [
+      {'content': {'dialog': '', 'emotion': ''}, 'role': 'system'},
+      {'content': {'dialog': 'Can you do push-ups ?', 'emotion': 'neutral'},
+      'role': 'user'},
+      {'content': {'dialog': "Of course I can . It's a piece of cake ! Believe it or not , I can do 30 push-ups a minute .",
+      'emotion': 'neutral'},
+      'role': 'assistant'},
+      {'content': {'dialog': "Really ? I think that's impossible !",
+      'emotion': 'surprise'},
+      'role': 'user'},
+      {'content': {'dialog': 'You mean 30 push-ups ?', 'emotion': 'neutral'},
+      'role': 'assistant'},
+      {'content': {'dialog': 'Yeah !', 'emotion': 'neutral'}, 'role': 'user'},
+      {'content': {'dialog': '', 'emotion': 'neutral'}, 'role': 'assistant'}
+   ]
+  generation_config = GenerationConfig(
+      max_new_tokens=150,
+      min_new_tokens=5,
+      repetition_penalty=1.1,
+      top_k=3,
+      top_p=0.9,
+      pad_token_id=tokenizer.pad_token_id,
+      eos_token_id=tokenizer.eos_token_id,
+      temperature=1.0,
+      do_sample=True,
+      num_beams=1
+  )
+  print(bot(conversation, generation_config=generation_config)[0]['generated_text'][-1]["content"]["dialog"])
+```
+**output:**
+```
+30 push-ups in a row?
+```
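
The gibberish and cut-off tables in README_dpo.md report raw counts, and each column sums to 1000. A minimal sketch, assuming only the numbers copied from those tables, that turns the counts into rates so the two models can be compared directly (the variable and function names are illustrative, not part of the repository):

```python
# Raw counts copied from the "Gibberish Distribution" and "Cut-Off Output" tables.
gibberish_counts = {
    "DPO": {"Clean": 882, "Mild Gibberish": 94, "Word Salad": 21, "Noise": 3},
    "SFT": {"Clean": 898, "Mild Gibberish": 58, "Word Salad": 33, "Noise": 11},
}
cut_off_counts = {
    "DPO": {"Complete Output": 985, "Incomplete Output": 15},
    "SFT": {"Complete Output": 975, "Incomplete Output": 25},
}


def to_rates(counts):
    """Convert a {category: count} mapping into {category: fraction of total}."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


gibberish_rates = {model: to_rates(c) for model, c in gibberish_counts.items()}
cut_off_rates = {model: to_rates(c) for model, c in cut_off_counts.items()}

for model in ("DPO", "SFT"):
    print(model, gibberish_rates[model], cut_off_rates[model])
```

Read this way, the DPO model has fewer "Word Salad", "Noise", and incomplete outputs than the SFT reference, but also slightly fewer "Clean" outputs and more "Mild Gibberish".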

adapter_config.json DELETED

@@ -1,37 +0,0 @@
-{
-  "alpha_pattern": {},
-  "auto_mapping": null,
-  "base_model_name_or_path": "unsloth/llama-2-7b-bnb-4bit",
-  "bias": "none",
-  "fan_in_fan_out": false,
-  "inference_mode": true,
-  "init_lora_weights": true,
-  "layer_replication": null,
-  "layers_pattern": null,
-  "layers_to_transform": null,
-  "loftq_config": {},
-  "lora_alpha": 16,
-  "lora_dropout": 0.1,
-  "megatron_config": null,
-  "megatron_core": "megatron.core",
-  "modules_to_save": [
-    "lm_head",
-    "embed_tokens"
-  ],
-  "peft_type": "LORA",
-  "r": 8,
-  "rank_pattern": {},
-  "revision": "unsloth",
-  "target_modules": [
-    "up_proj",
-    "v_proj",
-    "o_proj",
-    "down_proj",
-    "k_proj",
-    "q_proj",
-    "gate_proj"
-  ],
-  "task_type": "CAUSAL_LM",
-  "use_dora": false,
-  "use_rslora": true
-}

config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "_name_or_path": "16bit_model_3epo-v3",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 11008,
+  "max_position_embeddings": 4096,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 32,
+  "pad_token_id": 0,
+  "pretraining_tp": 1,
+  "quantization_config": {
+    "bnb_4bit_compute_dtype": "bfloat16",
+    "bnb_4bit_quant_type": "nf4",
+    "bnb_4bit_use_double_quant": true,
+    "llm_int8_enable_fp32_cpu_offload": false,
+    "llm_int8_has_fp16_weight": false,
+    "llm_int8_skip_modules": null,
+    "llm_int8_threshold": 6.0,
+    "load_in_4bit": true,
+    "load_in_8bit": false,
+    "quant_method": "bitsandbytes"
+  },
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 10000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.42.4",
+  "unsloth_version": "2024.7",
+  "use_cache": false,
+  "vocab_size": 32005
+}

generation_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "bos_token_id": 1,
+  "do_sample": true,
+  "eos_token_id": 2,
+  "max_length": 4096,
+  "pad_token_id": 0,
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.42.4"
+}
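
The defaults committed in generation_config.json are not the same as the sampling settings listed in README_dpo.md for evaluation. A small sketch, using plain dictionaries with values copied from the two files (the tokenizer-derived `pad_token_id` and `eos_token_id` from the README are left out because they are resolved at runtime), that lists where the two differ:

```python
# Defaults committed in generation_config.json (transformers_version omitted).
file_defaults = {
    "bos_token_id": 1,
    "do_sample": True,
    "eos_token_id": 2,
    "max_length": 4096,
    "pad_token_id": 0,
    "temperature": 0.6,
    "top_p": 0.9,
}

# Settings listed in README_dpo.md for the reported evaluation.
readme_eval = {
    "max_new_tokens": 150,
    "min_new_tokens": 5,
    "repetition_penalty": 1.1,
    "top_k": 3,
    "top_p": 0.9,
    "temperature": 1.0,
    "do_sample": True,
    "num_beams": 1,
}

# Keys present in both, with different values: (file default, README value).
shared = file_defaults.keys() & readme_eval.keys()
changed = {
    key: (file_defaults[key], readme_eval[key])
    for key in sorted(shared)
    if file_defaults[key] != readme_eval[key]
}

# Keys the README sets that the committed file does not mention.
only_in_readme = sorted(readme_eval.keys() - file_defaults.keys())

print("changed:", changed)
print("only in README:", only_in_readme)
```

The practical consequence is that reproducing the README numbers likely requires passing the README's config explicitly rather than relying on the file defaults, since the temperature differs (0.6 in the file, 1.0 in the README) and `top_k`, `repetition_penalty`, and the token limits appear only in the README.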

adapter_model.safetensors → model.safetensors RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e33740cf840bcfdb116fbeca8b9356c6534619a884b57c85950643e7865e95d9
-size 1653123632
+oid sha256:3dffd8b9dc73074d84ce714f560c842b88f3622ebca65c31534bf38c1304cd86
+size 3866124296
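
The renamed file is a Git LFS pointer, so the diff only shows the object hash and byte size changing. A minimal sketch, assuming the pointer layout visible in the hunk (`version`, `oid`, and `size` lines, each a key and a value separated by one space), that parses both pointers and compares the sizes; the helper name `parse_lfs_pointer` is hypothetical:

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer into a dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object, in bytes
    }


old_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e33740cf840bcfdb116fbeca8b9356c6534619a884b57c85950643e7865e95d9\n"
    "size 1653123632\n"
)
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:3dffd8b9dc73074d84ce714f560c842b88f3622ebca65c31534bf38c1304cd86\n"
    "size 3866124296\n"
)

GIB = 1024 ** 3
ratio = new_pointer["size"] / old_pointer["size"]
print(f"old: {old_pointer['size'] / GIB:.2f} GiB")
print(f"new: {new_pointer['size'] / GIB:.2f} GiB")
print(f"ratio: {ratio:.2f}x")
```

The size more than doubles, which appears consistent with the rest of this PR: adapter_config.json is deleted and a full config.json with a `quantization_config` is added, suggesting the adapter file was replaced by merged model weights.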

special_tokens_map.json CHANGED

@@ -1,33 +1,9 @@
 {
   "additional_special_tokens": [
-    {
-      "content": "[INST]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "[/INST]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "[EMOTION]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "[/EMOTION]",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    }
+    "[INST]",
+    "[/INST]",
+    "[EMOTION]",
+    "[/EMOTION]"
   ],
   "bos_token": {
     "content": "<s>",

tokenizer_config.json CHANGED

@@ -1,6 +1,7 @@
 {
   "add_bos_token": true,
   "add_eos_token": false,
+  "add_prefix_space": null,
   "added_tokens_decoder": {
     "0": {
       "content": "<unk>",
@@ -80,7 +81,7 @@
   "legacy": false,
   "model_max_length": 4096,
   "pad_token": "<pad>",
-  "padding_side": "right",
+  "padding_side": "left",
   "sp_model_kwargs": {},
   "tokenizer_class": "LlamaTokenizer",
   "unk_token": "<unk>",

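The tokenizer_config.json change flips `padding_side` from `right` to `left`. For batched generation with a decoder-only model, left padding keeps each prompt's last real token in the final position, so newly generated tokens follow it directly. A pure-Python sketch of the difference, using made-up token ids and the pad id 0 from config.json (the function `pad_batch` is illustrative and is not the tokenizer's own implementation):

```python
def pad_batch(sequences, pad_id, side):
    """Pad token-id lists to equal length; return (padded ids, attention masks)."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    width = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        fill = [pad_id] * (width - len(seq))
        real = [1] * len(seq)   # 1 marks a real token
        blank = [0] * len(fill)  # 0 marks padding
        if side == "left":
            padded.append(fill + seq)
            masks.append(blank + real)
        else:
            padded.append(seq + fill)
            masks.append(real + blank)
    return padded, masks


# Hypothetical token ids for two prompts of different lengths.
prompts = [[5, 6, 7], [8]]

left_ids, left_mask = pad_batch(prompts, pad_id=0, side="left")
right_ids, right_mask = pad_batch(prompts, pad_id=0, side="right")

print("left: ", left_ids, left_mask)
print("right:", right_ids, right_mask)
```

With right padding the shorter prompt ends in pad tokens, so generation would continue from padding rather than from the prompt's last real token; with left padding every row ends in a real token.
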
training_args.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1c7df05d620f636d89b16a79736c9367ba18b1a8b0a255003afdef713dccd26d
-size 5176