---
license: apache-2.0
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
library_name: transformers
base_model: h2oai/h2o-danube2-1.8b-base
---

# danube2-1.8b-ORPO

ChatML tokens were added to the base model, which was then fine-tuned in two stages on mlabonne/orpo-dpo-mix-40k: first with BAdam, then with QLoRA+. Both stages were run as SFT, not DPO, using LLaMA-Factory.

## Template

```jinja
<|im_start|>user
{{instruction}}<|im_end|>
<|im_start|>assistant
{{response}}<|im_end|>
```

## BAdam

```yaml
### model
model_name_or_path: danube2-base-chatml

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 1
badam_start_block: 12
badam_mask_mode: scatter
seed: 314

### dataset
dataset: orpo_sft_mix_40k
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: orpo-chatml-badam
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.00001
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.01
pure_bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```

## QLoRA+

```yaml
### model
model_name_or_path: orpo-chatml-badam

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 8
lora_alpha: 16
use_unsloth: true
quantization_bit: 4
upcast_layernorm: true
seed: 31415

### dataset
dataset: orpo_sft_mix_40k
template: hermes_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: orpo-chatml-badam/loraplus
logging_steps: 1
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 0.0001
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
flash_attn: fa2

### eval
val_size: 0.02
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
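
## Prompt formatting example

The layout in the Template section can be reproduced in plain Python when building prompts by hand. This is a minimal sketch, not the code used for training: the helper name `format_chatml` is hypothetical, and it assumes the standard ChatML delimiters `<|im_start|>` and `<|im_end|>` with a newline after each role name and between turns.

```python
from typing import Optional

# Standard ChatML delimiters (assumed to match the tokens added to the model).
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_chatml(instruction: str, response: Optional[str] = None) -> str:
    """Render a single-turn exchange in the ChatML layout.

    Hypothetical helper for illustration. With response=None the string
    ends at the open assistant turn, ready for generation. Newline
    placement is an assumption based on the usual ChatML convention.
    """
    prompt = f"{IM_START}user\n{instruction}{IM_END}\n{IM_START}assistant\n"
    if response is not None:
        # Close the assistant turn for a complete training-style example.
        prompt += f"{response}{IM_END}"
    return prompt


if __name__ == "__main__":
    print(format_chatml("What is the capital of France?"))
```

Calling `format_chatml(instruction)` without a response leaves the assistant turn open, which is the form to pass to the model at inference time. Passing both arguments yields a complete example as in the template.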