SystemAdmin123 committed
Commit 2461514 · verified · 1 Parent(s): e61e32d

End of training

Files changed (2):
  1. README.md +47 -27
  2. adapter_model.bin +3 -0
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
-library_name: transformers
-base_model: trl-internal-testing/tiny-random-LlamaForCausalLM
+library_name: peft
+base_model: peft-internal-testing/tiny-dummy-qwen2
 tags:
 - axolotl
 - generated_from_trainer
@@ -19,8 +19,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.6.0`
 ```yaml
-base_model: trl-internal-testing/tiny-random-LlamaForCausalLM
-batch_size: 128
+adapter: lora
+base_model: peft-internal-testing/tiny-dummy-qwen2
+batch_size: 64
 bf16: true
 chat_template: tokenizer_default_fallback_alpaca
 datasets:
@@ -36,36 +37,42 @@ datasets:
 system_prompt: ''
 device_map: auto
 eval_sample_packing: false
-eval_steps: 200
+eval_steps: 0.1
 flash_attention: true
 gradient_checkpointing: true
 group_by_length: true
 hub_model_id: SystemAdmin123/test-repo
 hub_strategy: checkpoint
-learning_rate: 0.0002
+learning_rate: 0.0001
 logging_steps: 10
+lora_alpha: 256
+lora_dropout: 0.1
+lora_r: 128
+lora_target_linear: true
 lr_scheduler: cosine
-max_steps: 10000
-micro_batch_size: 32
+max_steps: 160.0
+micro_batch_size: 7
 model_type: AutoModelForCausalLM
-num_epochs: 100
+num_epochs: 10000
 optimizer: adamw_bnb_8bit
 output_dir: /root/.sn56/axolotl/tmp/test-repo
 pad_to_sequence_len: true
 resize_token_embeddings_to_32x: false
 sample_packing: true
-save_steps: 200
+save_steps: 40
 save_total_limit: 1
 sequence_len: 2048
-tokenizer_type: LlamaTokenizerFast
+tokenizer_type: Qwen2TokenizerFast
 torch_dtype: bf16
 training_args_kwargs:
+  disable_tqdm: true
   hub_private_repo: true
+  save_only_model: true
 trust_remote_code: true
-val_set_size: 0.1
+val_set_size: 0.01
 wandb_entity: ''
 wandb_mode: online
-wandb_name: trl-internal-testing/tiny-random-LlamaForCausalLM-argilla/databricks-dolly-15k-curated-en
+wandb_name: peft-internal-testing/tiny-dummy-qwen2-argilla/databricks-dolly-15k-curated-en
 wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: default
@@ -77,7 +84,9 @@ warmup_ratio: 0.05
 
 # test-repo
 
-This model is a fine-tuned version of [trl-internal-testing/tiny-random-LlamaForCausalLM](https://huggingface.co/trl-internal-testing/tiny-random-LlamaForCausalLM) on the argilla/databricks-dolly-15k-curated-en dataset.
+This model is a fine-tuned version of [peft-internal-testing/tiny-dummy-qwen2](https://huggingface.co/peft-internal-testing/tiny-dummy-qwen2) on the argilla/databricks-dolly-15k-curated-en dataset.
+It achieves the following results on the evaluation set:
+- Loss: 11.9145
 
 ## Model description
 
@@ -96,29 +105,40 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 32
-- eval_batch_size: 32
+- learning_rate: 0.0001
+- train_batch_size: 7
+- eval_batch_size: 7
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
-- total_train_batch_size: 128
-- total_eval_batch_size: 128
-- optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- total_train_batch_size: 28
+- total_eval_batch_size: 28
+- optimizer: Use adamw_bnb_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 5
-- training_steps: 100
+- lr_scheduler_warmup_steps: 8
+- training_steps: 160
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| No log | 0.1667 | 1 | 10.3764 |
+| No log | 0.0345 | 1 | 11.9318 |
+| 11.9309 | 0.5517 | 16 | 11.9279 |
+| 11.9236 | 1.1034 | 32 | 11.9212 |
+| 11.9214 | 1.6552 | 48 | 11.9193 |
+| 11.9198 | 2.2069 | 64 | 11.9178 |
+| 11.9188 | 2.7586 | 80 | 11.9161 |
+| 11.9181 | 3.3103 | 96 | 11.9151 |
+| 11.9175 | 3.8621 | 112 | 11.9147 |
+| 11.9174 | 4.4138 | 128 | 11.9143 |
+| 11.917 | 4.9655 | 144 | 11.9140 |
+| 11.9169 | 5.5172 | 160 | 11.9145 |
 
 
 ### Framework versions
 
-- Transformers 4.48.1
-- Pytorch 2.5.1+cu124
-- Datasets 3.2.0
-- Tokenizers 0.21.0
+- PEFT 0.14.0
+- Transformers 4.47.1
+- Pytorch 2.3.1+cu121
+- Datasets 3.1.0
+- Tokenizers 0.21.0
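The updated card now describes a LoRA adapter (PEFT 0.14.0, lora_r 128) trained on top of `peft-internal-testing/tiny-dummy-qwen2` and pushed to `SystemAdmin123/test-repo`. As a rough, non-authoritative sketch of how such an adapter is usually consumed, the snippet below loads the base model and attaches the adapter with PEFT; it assumes the repo root contains the adapter files (including an `adapter_config.json`, which this commit does not show) and that a token with access to the private repo is already configured.

```python
# Illustrative only: load the LoRA adapter from this repo onto the tiny Qwen2 base model.
# Assumes adapter_config.json sits next to adapter_model.bin at the repo root and that
# credentials for the private repo are available (e.g. via `huggingface-cli login`).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "peft-internal-testing/tiny-dummy-qwen2"
adapter_id = "SystemAdmin123/test-repo"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the LoRA weights

inputs = tokenizer("Below is an instruction.", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Since the base model is a tiny testing checkpoint, the generated text is not expected to be meaningful; the sketch only shows the loading path.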
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:70c7833783fa3fc74e85a17679bf42d7fc78f79ffd5be3a976fc6ea108a186a6
+size 190338
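The committed `adapter_model.bin` is a Git LFS pointer rather than the weights themselves: it records only the spec version, the SHA-256 object id, and the byte size, while the binary is stored in LFS. A minimal, illustrative check of a downloaded copy against these pointer fields might look like the following; fetching via `hf_hub_download` and the private-repo authentication are assumptions about the workflow, and the expected values are copied from the pointer above.

```python
# Illustrative only: verify a downloaded adapter_model.bin against the LFS pointer fields.
import hashlib
import os

from huggingface_hub import hf_hub_download

EXPECTED_OID = "70c7833783fa3fc74e85a17679bf42d7fc78f79ffd5be3a976fc6ea108a186a6"
EXPECTED_SIZE = 190338  # bytes, from the pointer's "size" line

# Requires access to the private repo (token configured in the environment).
path = hf_hub_download(repo_id="SystemAdmin123/test-repo", filename="adapter_model.bin")

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        digest.update(chunk)

assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch"
assert digest.hexdigest() == EXPECTED_OID, "sha256 mismatch"
print("adapter_model.bin matches the LFS pointer")
```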