justinj92 committed
Commit e42e013
1 Parent(s): 2f21647

End of training

Files changed (3)
  1. README.md +188 -0
  2. adapter_model.bin +3 -0
  3. adapter_model.safetensors +1 -1
README.md ADDED
@@ -0,0 +1,188 @@
+ ---
+ license: mit
+ library_name: peft
+ tags:
+ - axolotl
+ - generated_from_trainer
+ base_model: microsoft/phi-2
+ model-index:
+ - name: phi2-bunny
+   results: []
+ ---
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: microsoft/phi-2
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+ is_llama_derived_model: false
+ # trust_remote_code: true
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+   - path: WhiteRabbitNeo/WRN-Chapter-1
+     type:
+       system_prompt: ""
+       field_system: system
+       field_instruction: instruction
+       field_output: response
+       prompt_style: chatml
+   - path: WhiteRabbitNeo/WRN-Chapter-2
+     type:
+       system_prompt: ""
+       field_system: system
+       field_instruction: instruction
+       field_output: response
+       prompt_style: chatml
+
+ dataset_prepared_path: ./phi2-bunny/last-run-prepared
+ val_set_size: 0.05
+ output_dir: ./phi2-bunny/
+
+ sequence_len: 2048
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ adapter: lora
+ lora_model_dir:
+ lora_r: 64
+ lora_alpha: 32
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+ lora_modules_to_save:
+   - embed_tokens
+   - lm_head
+
+ hub_model_id: justinj92/phi2-bunny
+
+ wandb_project: phi2-bunny
+ wandb_entity: justinjoy-5
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 2
+ num_epochs: 5
+ optimizer: paged_adamw_8bit
+ adam_beta1: 0.9
+ adam_beta2: 0.999
+ adam_epsilon: 0.00001
+ max_grad_norm: 1000.0
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+
+ train_on_inputs: false
+ group_by_length: true
+ bf16: true
+ fp16: false
+ tf32: true
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ auto_resume_from_checkpoints:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+ chat_template: chatml
+
+ warmup_steps: 100
+ evals_per_epoch: 4
+ save_steps: 0.01
+ save_total_limit: 2
+ debug:
+ deepspeed:
+ weight_decay: 0.01
+ fsdp:
+ fsdp_config:
+ resize_token_embeddings_to_32x: true
+ special_tokens:
+   eos_token: "<|im_end|>"
+   pad_token: "<|endoftext|>"
+ tokens:
+   - "<|im_start|>"
+
+ ```
+
+ </details><br>
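+
+ Assuming a working axolotl 0.4.0 install, a run like this is typically launched with `accelerate launch -m axolotl.cli.train config.yml` after saving the config above; the exact entry point may differ across axolotl versions.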
+
+ # phi2-bunny
+
+ This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the WhiteRabbitNeo/WRN-Chapter-1 and WhiteRabbitNeo/WRN-Chapter-2 datasets.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5347
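+
+ A minimal inference sketch follows (untested; it assumes the Hub repo contains the tokenizer with the added ChatML tokens and that the embedding resize below matches `resize_token_embeddings_to_32x` from the config):
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Tokenizer from the adapter repo, which should carry the added <|im_start|> token.
+ tokenizer = AutoTokenizer.from_pretrained("justinj92/phi2-bunny")
+
+ base = AutoModelForCausalLM.from_pretrained(
+     "microsoft/phi-2", torch_dtype=torch.bfloat16, device_map="auto"
+ )
+ # Mirror resize_token_embeddings_to_32x so the saved embed_tokens/lm_head
+ # weights line up with the base model before the adapter is applied.
+ base.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=32)
+
+ model = PeftModel.from_pretrained(base, "justinj92/phi2-bunny")
+
+ # The config trains on ChatML-formatted turns and sets <|im_end|> as EOS.
+ prompt = "<|im_start|>user\nWhat is a LoRA adapter?<|im_end|>\n<|im_start|>assistant\n"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ out = model.generate(**inputs, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id)
+ print(tokenizer.decode(out[0], skip_special_tokens=True))
+ ```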
+
+ ## Model description
+
+ phi2-bunny is a LoRA adapter (r=64, alpha=32, dropout 0.05, all linear layers targeted) trained on top of microsoft/phi-2 with Axolotl. The training config adds the ChatML `<|im_start|>` token and uses `<|im_end|>` as the end-of-turn token, so the `embed_tokens` and `lm_head` weights are saved alongside the adapter.
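+
+ If the uploaded tokenizer also carries the ChatML chat template set by `chat_template: chatml` (an assumption, not verified against the repo), prompts can be built with `apply_chat_template`:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tok = AutoTokenizer.from_pretrained("justinj92/phi2-bunny")
+ messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
+ # With tokenize=False this returns the raw ChatML string:
+ #   <|im_start|>user\nExplain LoRA in one sentence.<|im_end|>\n<|im_start|>assistant\n
+ print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
+ ```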
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ Training used the WhiteRabbitNeo/WRN-Chapter-1 and WhiteRabbitNeo/WRN-Chapter-2 instruction/response datasets in ChatML format, with 5% of the data (`val_set_size: 0.05`) held out for evaluation.
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-05
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 5
+
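+ The `total_train_batch_size` of 16 follows from the config: `micro_batch_size` (2) × `gradient_accumulation_steps` (8) = 16 samples per optimizer step.
+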
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.8645 | 0.0 | 1 | 0.7932 |
+ | 0.6246 | 0.25 | 228 | 0.6771 |
+ | 0.6449 | 0.5 | 456 | 0.6186 |
+ | 0.6658 | 0.75 | 684 | 0.6073 |
+ | 0.5419 | 1.0 | 912 | 0.5911 |
+ | 0.5477 | 1.24 | 1140 | 0.5878 |
+ | 0.612 | 1.49 | 1368 | 0.5715 |
+ | 0.6328 | 1.74 | 1596 | 0.5632 |
+ | 0.5082 | 1.99 | 1824 | 0.5534 |
+ | 0.5807 | 2.24 | 2052 | 0.5513 |
+ | 0.4775 | 2.49 | 2280 | 0.5448 |
+ | 0.514 | 2.74 | 2508 | 0.5430 |
+ | 0.4943 | 2.99 | 2736 | 0.5398 |
+ | 0.5012 | 3.22 | 2964 | 0.5396 |
+ | 0.5203 | 3.48 | 3192 | 0.5371 |
+ | 0.5112 | 3.73 | 3420 | 0.5356 |
+ | 0.4978 | 3.98 | 3648 | 0.5351 |
+ | 0.5642 | 4.22 | 3876 | 0.5348 |
+ | 0.5383 | 4.47 | 4104 | 0.5348 |
+ | 0.4679 | 4.72 | 4332 | 0.5347 |
+
+ ### Framework versions
+
+ - PEFT 0.8.1.dev0
+ - Transformers 4.37.0
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.16.1
+ - Tokenizers 0.15.0
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bbacf6cea80132a5062734e8ca0ab079bb429cf8fc25f9bb3ccf88f2f9460f6a
+ size 713273302
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5138a9a038c69808a83a116532a61b363b90b66dc49ea8d03c26eb9b03c1b3a4
+ oid sha256:9a0d73dac9ffc07bcdc67f73f41aae5372826874ab3866663761ab06f3320ad1
  size 713186160
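
Both adapter files are stored via Git LFS: each entry above is a pointer recording the LFS spec version, the sha256 oid of the weight blob, and its size in bytes, rather than the weights themselves.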