Commit 65a6112 by trollek (parent: aa55a51)

Update README.md

Files changed (1): README.md (+114 -0)
---
library_name: transformers
base_model: h2oai/h2o-danube2-1.8b-base
---
# danube2-1.8b-ORPO

ChatML tokens were added to the base model, which was then fine-tuned first with BAdam and afterwards with QLoRA+ on mlabonne/orpo-dpo-mix-40k, using LLaMA-Factory. The dataset was used for plain SFT rather than DPO.
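As a rough sketch of the token-adding step (not the exact script used for this model), the ChatML markers can be registered as special tokens and the embeddings resized as shown below; `danube2-base-chatml` is simply the local path assumed by the BAdam config further down:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "h2oai/h2o-danube2-1.8b-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register the ChatML markers as additional special tokens and grow the embedding matrix to match.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})
model.resize_token_embeddings(len(tokenizer))

# Save the ChatML-ready checkpoint referenced as `danube2-base-chatml` in the config below (assumed path).
tokenizer.save_pretrained("danube2-base-chatml")
model.save_pretrained("danube2-base-chatml")
```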
## Template

```jinja
<|im_start|>user
{{instruction}}<|im_end|>
<|im_start|>assistant
{{response}}<|im_end|>
```
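Filled in, a single training example therefore becomes the string produced below (a hypothetical helper, shown only to make the rendered format concrete):

```python
# Hypothetical helper that mirrors the ChatML template above; for illustration only.
def render_chatml(instruction: str, response: str) -> str:
    return (
        "<|im_start|>user\n"
        f"{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{response}<|im_end|>"
    )

print(render_chatml("Name three prime numbers.", "2, 3, and 5."))
```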
## BAdam

```yaml
### model
model_name_or_path: danube2-base-chatml

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 1
badam_start_block: 12
badam_mask_mode: scatter
seed: 314

### dataset
dataset: orpo_sft_mix_40k
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: orpo-chatml-badam
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.00001
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.01
pure_bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
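Both YAML files on this card are LLaMA-Factory training configs; assuming a reasonably recent LLaMA-Factory install, a run like the one above can be launched with `llamafactory-cli train <config>.yaml`.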
## QLoRA+

```yaml
### model
model_name_or_path: orpo-chatml-badam

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 8
lora_alpha: 16
use_unsloth: true
quantization_bit: 4
upcast_layernorm: true
seed: 31415

### dataset
dataset: orpo_sft_mix_40k
template: hermes_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: orpo-chatml-badam/loraplus
logging_steps: 1
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 0.0001
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
flash_attn: fa2

### eval
val_size: 0.02
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
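The QLoRA+ stage writes a LoRA adapter to `orpo-chatml-badam/loraplus` on top of the BAdam checkpoint. Below is a minimal sketch of merging it back and smoke-testing the result with the ChatML template; the merge step and the `danube2-1.8b-ORPO` output path are assumptions for illustration, not the exact script behind this repo, and it presumes the tokenizer (with its chat template) was saved alongside the adapter.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Merge the QLoRA+ adapter (the output_dir above) back into the checkpoint it was trained on.
adapter_dir = "orpo-chatml-badam/loraplus"
model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir, torch_dtype=torch.bfloat16)
model = model.merge_and_unload()

# Assumes the tokenizer, including the ChatML chat template, was saved next to the adapter.
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

# Quick generation smoke test in the ChatML format shown earlier.
messages = [{"role": "user", "content": "Summarise what this model was trained on in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

# Persist the merged model as a standalone checkpoint (hypothetical output path).
model.save_pretrained("danube2-1.8b-ORPO")
tokenizer.save_pretrained("danube2-1.8b-ORPO")
```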