trollek/danube2-1.8b-Neural

trollek committed on Jul 5

Commit 65a6112 (1 parent: aa55a51)

Update README.md

Files changed (1)

README.md +114 -0

@@ -7,4 +7,118 @@ language:
 library_name: transformers
 base_model: h2oai/h2o-danube2-1.8b-base
 ---

+# danube2-1.8b-ORPO
+ChatML tokens were added to the base model, which was then fine-tuned on mlabonne/orpo-dpo-mix-40k with LLaMA-Factory, first with BAdam and then with QLoRA+. The dataset was used for SFT rather than DPO.
+## Template
+```jinja
+<|im_start|>user
+{{instruction}}<|im_end|>
+<|im_start|>assistant
+{{response}}<|im_end|>
+```
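As an illustration outside the README itself, the template can be rendered with plain string formatting. This is a minimal sketch assuming the standard ChatML delimiters `<|im_start|>` and `<|im_end|>`; the helper name `render_chatml` is hypothetical and does not come from the commit.

```python
from typing import Optional


def render_chatml(instruction: str, response: Optional[str] = None) -> str:
    """Render one user turn in ChatML (hypothetical helper, not from the repo).

    If `response` is None, the string ends right after the assistant header,
    which is the usual shape of a generation prompt.
    """
    prompt = (
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    if response is not None:
        # Close the assistant turn, as in a full training example.
        prompt += f"{response}<|im_end|>"
    return prompt


print(render_chatml("What is the capital of France?", "Paris."))
```

Calling `render_chatml(...)` without a response leaves the assistant turn open so that the model can complete it.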
+## BAdam
+```yaml
+### model
+model_name_or_path: danube2-base-chatml
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+use_badam: true
+badam_switch_mode: ascending
+badam_switch_interval: 50
+badam_verbose: 1
+badam_start_block: 12
+badam_mask_mode: scatter
+seed: 314
+### dataset
+dataset: orpo_sft_mix_40k
+template: ninja_chatml
+cutoff_len: 8192
+overwrite_cache: false
+preprocessing_num_workers: 12
+### output
+output_dir: orpo-chatml-badam
+logging_steps: 5
+save_steps: 1
+save_strategy: epoch
+plot_loss: true
+overwrite_output_dir: false
+### train
+per_device_train_batch_size: 2
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 2
+lr_scheduler_type: cosine
+warmup_ratio: 0.01
+pure_bf16: true
+flash_attn: fa2
+### eval
+val_size: 0.01
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 1000
+```
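The batch and block-switching settings in the BAdam config interact, which a little arithmetic makes explicit. The sketch below assumes a single GPU (the commit does not state the device count) and assumes that `badam_switch_interval` counts optimizer updates rather than micro-batches.

```python
# Values copied from the BAdam config above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
badam_switch_interval = 50

# Assumption: one device; the commit does not say how many were used.
num_devices = 1

# Sequences contributing to each optimizer update.
effective_batch = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Sequences seen while a single block is active, assuming the switch
# interval is measured in optimizer updates.
samples_per_block = effective_batch * badam_switch_interval

print(f"effective batch size: {effective_batch}")
print(f"samples per active block: {samples_per_block}")
```

Under these assumptions each block is trained on 800 sequences before the optimizer moves on to the next block.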
+## QLoRA+
+```yaml
+### model
+model_name_or_path: orpo-chatml-badam
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: all
+loraplus_lr_ratio: 16.0
+lora_rank: 8
+lora_alpha: 16
+use_unsloth: true
+quantization_bit: 4
+upcast_layernorm: true
+seed: 31415
+### dataset
+dataset: orpo_sft_mix_40k
+template: hermes_chatml
+cutoff_len: 8192
+overwrite_cache: false
+preprocessing_num_workers: 12
+### output
+output_dir: orpo-chatml-badam/loraplus
+logging_steps: 1
+save_steps: 1
+save_strategy: epoch
+plot_loss: true
+overwrite_output_dir: false
+### train
+per_device_train_batch_size: 4
+gradient_accumulation_steps: 4
+learning_rate: 0.0001
+num_train_epochs: 2.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.01
+bf16: true
+flash_attn: fa2
+### eval
+val_size: 0.02
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 1000
+```
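Some quantities implied by the QLoRA+ config are easy to derive by hand. The sketch below assumes the usual LoRA scaling of `lora_alpha / lora_rank`, and assumes that `loraplus_lr_ratio` is the ratio between the learning rate of the LoRA B matrices and that of the A matrices, as in LoRA+. It also assumes a single GPU, which the commit does not state.

```python
# Values copied from the QLoRA+ config above.
lora_rank = 8
lora_alpha = 16
learning_rate = 0.0001
loraplus_lr_ratio = 16.0
per_device_train_batch_size = 4
gradient_accumulation_steps = 4

# Assumption: one device; the commit does not say how many were used.
num_devices = 1

# Standard LoRA scaling factor applied to the low-rank update.
lora_scaling = lora_alpha / lora_rank

# Assumption: B matrices are trained at ratio * base learning rate.
lr_a = learning_rate
lr_b = learning_rate * loraplus_lr_ratio

effective_batch = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

print(f"LoRA scaling: {lora_scaling}")
print(f"lr for A matrices: {lr_a:.1e}, lr for B matrices: {lr_b:.1e}")
print(f"effective batch size: {effective_batch}")
```

With one device the effective batch size matches the BAdam stage (16 sequences per update), even though the per-device batch size and accumulation steps differ.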