trollek committed on
Commit 0cc5ff4
1 Parent(s): e6de7cb

Update README.md

Files changed (1)
  1. README.md +18 -57
README.md CHANGED
@@ -6,21 +6,24 @@ language:
 - en
 library_name: transformers
 base_model: h2oai/h2o-danube2-1.8b-base
+tags:
+- llama-factory
+- unsloth
 ---
-# danube2-1.8b-ORPO
+# h2o-danube2 with ChatML template
 
-ChatML tokens are added and first fine-tuned with BAdam and then QLoRA+ on mlabonne/orpo-dpo-mix-40k, but as SFT and not DPO, and using LLama-Factory.
+This model was first fine-tuned with [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") on [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k), but as SFT and not DPO, using LLama-Factory.
 
 ## Template
 
 ```jinja
-<|im_start>user
+<|im_start|>user
 {{instruction}}<|im_end|>
-<|im_start>assistant
-{{response}}<|im_end>
+<|im_start|>assistant
+{{response}}<|im_end|>
 ```
 
-## BAdam
+## BAdam config
 
 ```yaml
 ### model
@@ -40,7 +43,7 @@ seed: 314
 
 ### dataset
 dataset: orpo_sft_mix_40k
-template: ninja_chatml
+template: hermes_chatml
 cutoff_len: 8192
 overwrite_cache: false
 preprocessing_num_workers: 12
@@ -70,55 +73,13 @@ eval_strategy: steps
 eval_steps: 1000
 ```
 
-
-### QLoRA+
-
-```yaml
-### model
-model_name_or_path: orpo-chatml-badam
-
-### method
-stage: sft
-do_train: true
-finetuning_type: lora
-lora_target: all
-loraplus_lr_ratio: 16.0
-lora_rank: 8
-lora_alpha: 16
-use_unsloth: true
-quantization_bit: 4
-upcast_layernorm: true
-seed: 31415
-
-### dataset
-dataset: orpo_sft_mix_40k
-template: hermes_chatml
-cutoff_len: 8192
-overwrite_cache: false
-preprocessing_num_workers: 12
-
-### output
-output_dir: orpo-chatml-badam/loraplus
-logging_steps: 1
-save_steps: 1
-save_strategy: epoch
-plot_loss: true
-overwrite_output_dir: false
-
-### train
-per_device_train_batch_size: 4
-gradient_accumulation_steps: 4
-learning_rate: 0.0001
-num_train_epochs: 2.0
-lr_scheduler_type: cosine
-warmup_ratio: 0.01
-bf16: true
-flash_attn: fa2
-
-### eval
-val_size: 0.02
-per_device_eval_batch_size: 1
-eval_strategy: steps
-eval_steps: 1000
-```
+### BAdam training results
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.7474 | 0.3653 | 1000 | 0.8887 |
+| 0.9106 | 0.7306 | 2000 | 0.8681 |
+| 0.8121 | 1.0958 | 3000 | 0.8635 |
+| 0.8636 | 1.4611 | 4000 | 0.8562 |
+| 0.8 | 1.8264 | 5000 | 0.8565 |
 
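A minimal launch sketch for the LLaMA-Factory YAML shown in the diff, not part of the commit itself: it assumes the BAdam config is saved as `badam_sft.yaml` (a placeholder name) and that a working LLaMA-Factory install provides the `llamafactory-cli` entry point on PATH.

```python
# Sketch only: launch the BAdam SFT config from the diff with LLaMA-Factory.
# Assumptions: LLaMA-Factory is installed, and the YAML above was saved as
# badam_sft.yaml (the file name is a placeholder, not taken from the commit).
import subprocess

subprocess.run(["llamafactory-cli", "train", "badam_sft.yaml"], check=True)
```

As a quick read on the results table, assuming the usual natural-log cross-entropy loss, the final validation loss of 0.8565 corresponds to a perplexity of exp(0.8565) ≈ 2.35.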
 
 
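A usage sketch for the ChatML template in the updated card: it formats a single-turn prompt exactly as the Jinja block above and generates with transformers. The repo id is an assumption based on the old card title, and the stop-token lookup presumes the ChatML special tokens were added to the tokenizer, as the previous card text stated.

```python
# Minimal inference sketch following the ChatML template in the card.
# Assumptions: the model id below is a guess (the commit does not state it), and the
# ChatML special tokens exist in the tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trollek/danube2-1.8b-ORPO"  # assumed repo id; replace with the actual one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Build the prompt exactly as the Jinja template: a user turn, then an open assistant turn.
instruction = "Summarise what BAdam fine-tuning does."
prompt = f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")  # stop when the assistant turn closes
output = model.generate(**inputs, max_new_tokens=256, eos_token_id=im_end_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the updated tokenizer config also registers this template, `tokenizer.apply_chat_template` should produce the same formatting.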