trollek committed on
Commit 0cc5ff4
1 Parent(s): e6de7cb

Update README.md

Files changed (1)
  1. README.md +18 -57
README.md CHANGED
@@ -6,21 +6,24 @@ language:
 - en
 library_name: transformers
 base_model: h2oai/h2o-danube2-1.8b-base
+tags:
+- llama-factory
+- unsloth
 ---
-# danube2-1.8b-ORPO
+# h2o-danube2 with ChatML template
 
-ChatML tokens are added and first fine-tuned with BAdam and then QLoRA+ on mlabonne/orpo-dpo-mix-40k, but as SFT and not DPO, and using LLama-Factory.
+This model was first fine-tuned with [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") on [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k), but as SFT and not DPO, using LLama-Factory.
 
 ## Template
 
 ```jinja
-<|im_start>user
+<|im_start|>user
 {{instruction}}<|im_end|>
-<|im_start>assistant
-{{response}}<|im_end>
+<|im_start|>assistant
+{{response}}<|im_end|>
 ```
 
-## BAdam
+## BAdam config
 
 ```yaml
 ### model
@@ -40,7 +43,7 @@ seed: 314
 
 ### dataset
 dataset: orpo_sft_mix_40k
-template: ninja_chatml
+template: hermes_chatml
 cutoff_len: 8192
 overwrite_cache: false
 preprocessing_num_workers: 12
@@ -70,55 +73,13 @@ eval_strategy: steps
 eval_steps: 1000
 ```
 
-
-### QLoRA+
-
-```yaml
-### model
-model_name_or_path: orpo-chatml-badam
-
-### method
-stage: sft
-do_train: true
-finetuning_type: lora
-lora_target: all
-loraplus_lr_ratio: 16.0
-lora_rank: 8
-lora_alpha: 16
-use_unsloth: true
-quantization_bit: 4
-upcast_layernorm: true
-seed: 31415
-
-### dataset
-dataset: orpo_sft_mix_40k
-template: hermes_chatml
-cutoff_len: 8192
-overwrite_cache: false
-preprocessing_num_workers: 12
-
-### output
-output_dir: orpo-chatml-badam/loraplus
-logging_steps: 1
-save_steps: 1
-save_strategy: epoch
-plot_loss: true
-overwrite_output_dir: false
-
-### train
-per_device_train_batch_size: 4
-gradient_accumulation_steps: 4
-learning_rate: 0.0001
-num_train_epochs: 2.0
-lr_scheduler_type: cosine
-warmup_ratio: 0.01
-bf16: true
-flash_attn: fa2
-
-### eval
-val_size: 0.02
-per_device_eval_batch_size: 1
-eval_strategy: steps
-eval_steps: 1000
-```
+### BAdam training results
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.7474 | 0.3653 | 1000 | 0.8887 |
+| 0.9106 | 0.7306 | 2000 | 0.8681 |
+| 0.8121 | 1.0958 | 3000 | 0.8635 |
+| 0.8636 | 1.4611 | 4000 | 0.8562 |
+| 0.8 | 1.8264 | 5000 | 0.8565 |
 
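A minimal launch sketch for the LLaMA-Factory YAML shown in the diff, not part of the commit itself: it assumes the BAdam config is saved as `badam_sft.yaml` (a placeholder name) and that a working LLaMA-Factory install provides the `llamafactory-cli` entry point on PATH.

```python
# Sketch only: launch the BAdam SFT config from the diff with LLaMA-Factory.
# Assumptions: LLaMA-Factory is installed, and the YAML above was saved as
# badam_sft.yaml (the file name is a placeholder, not taken from the commit).
import subprocess

subprocess.run(["llamafactory-cli", "train", "badam_sft.yaml"], check=True)
```

As a quick read on the results table, assuming the usual natural-log cross-entropy loss, the final validation loss of 0.8565 corresponds to a perplexity of exp(0.8565) ≈ 2.35.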
 
 
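A usage sketch for the ChatML template in the updated card: it formats a single-turn prompt exactly as the Jinja block above and generates with transformers. The repo id is an assumption based on the old card title, and the stop-token lookup presumes the ChatML special tokens were added to the tokenizer, as the previous card text stated.

```python
# Minimal inference sketch following the ChatML template in the card.
# Assumptions: the model id below is a guess (the commit does not state it), and the
# ChatML special tokens exist in the tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trollek/danube2-1.8b-ORPO"  # assumed repo id; replace with the actual one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Build the prompt exactly as the Jinja template: a user turn, then an open assistant turn.
instruction = "Summarise what BAdam fine-tuning does."
prompt = f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")  # stop when the assistant turn closes
output = model.generate(**inputs, max_new_tokens=256, eos_token_id=im_end_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the updated tokenizer config also registers this template, `tokenizer.apply_chat_template` should produce the same formatting.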