minpeter committed (verified)
Commit 676aeca · 1 parent: 7939468

End of training

Files changed (2)
  1. README.md +68 -61
  2. pytorch_model.bin +3 -0
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
  - axolotl
  - generated_from_trainer
  datasets:
- - alpaca_data.json
+ - tatsu-lab/alpaca
  model-index:
  - name: Alpaca-Llama-3.2-1B-Instruct
    results: []
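
For anyone checking the metadata change above, the referenced Hub dataset can be inspected directly; a minimal sketch with the `datasets` library (the column names are assumed from the standard tatsu-lab/alpaca release, not stated in this commit):

```python
# Sketch: peek at the Hub dataset the updated card metadata points to.
# Column names (instruction/input/output) are assumed from the standard Alpaca release.
from datasets import load_dataset

ds = load_dataset("tatsu-lab/alpaca", split="train")
print(ds)                    # features and row count
print(ds[0]["instruction"])  # first instruction
print(ds[0]["output"])       # its reference answer
```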
@@ -20,71 +20,76 @@ should probably proofread and complete it, then remove this comment. -->

  axolotl version: `0.6.0`
  ```yaml
- base_model: meta-llama/Llama-3.2-1B
- model_type: LlamaForCausalLM
- tokenizer_type: PreTrainedTokenizerFast
-
- strict: false
-
- save_safetensors: true
- flash_attention: true
-
- auto_resume_from_checkpoints: true
- save_steps: 100

- learning_rate: 5e-4
- num_epochs: 3
- micro_batch_size: 8
- gradient_accumulation_steps: 4
- optimizer: adamw_bnb_8bit
- lr_scheduler: cosine

+ base_model: meta-llama/Llama-3.2-1B
  hub_model_id: minpeter/Alpaca-Llama-3.2-1B-Instruct

- dataset_processes: 5000
-
- chat_template: jinja
- chat_template_jinja: |-
-   {%- for message in messages %}
-   {%- if message['role'] in ['user', 'assistant'] %}
-   {{- '<|' + message['role'] + '|>\n' }}
-   {{- message['content'] + '\n' }}
-   {%- else %}
-   {{- raise_exception('Invalid role: ' + message['role']) }}
-   {%- endif %}
-   {%- endfor %}
-   {%- if add_generation_prompt %}
-   {{- '<|assistant|>\n' }}
-   {%- endif %}
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false

  datasets:
-   - path: alpaca_data.json
-     type:
-       field_instruction: instruction
-       field_input: input
-       field_output: output
-       format: |
-         <|user|>
-         {instruction} {input}
-         <|assistant|>
-
-       no_input_format: |
-         <|user|>
-         {instruction}
-         <|assistant|>
+   - path: tatsu-lab/alpaca
+     type: alpaca
+ dataset_prepared_path: last_run_prepared
+ dataset_processes: 1000
+ val_set_size: 0.05
+ output_dir: ./outputs/out

- special_tokens:
-   pad_token: <pad>
+ sequence_len: 8192
+ sample_packing: true
+ pad_to_sequence_len: true

  wandb_project: "axolotl"
  wandb_entity: "kasfiekfs-e"
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 1
+ num_epochs: 1
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 2e-5
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 100
+ evals_per_epoch: 2
+ eval_table_size:
+ saves_per_epoch: 1
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   pad_token: <|end_of_text|>
+
  ```

  </details><br>

  # Alpaca-Llama-3.2-1B-Instruct

- This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on the alpaca_data.json dataset.
+ This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on the tatsu-lab/alpaca dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.3881

  ## Model description
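The config diff above replaces the hand-written `<|user|>`/`<|assistant|>` Jinja template and per-field `format` strings with axolotl's built-in `alpaca` dataset type. A rough sketch of how one example would be rendered under each scheme; the new-style template below is an assumption based on the standard Alpaca prompt, and axolotl's exact rendering may differ:

```python
# Sketch: how one (instruction, input) pair becomes a training prompt under the
# removed config versus the new `type: alpaca` config. The Alpaca template here
# is the standard tatsu-lab prompt and is an assumption about axolotl's output.

def old_prompt(instruction: str, inp: str = "") -> str:
    """Mirrors the removed `format` / `no_input_format` strings."""
    user = f"{instruction} {inp}" if inp else instruction
    return f"<|user|>\n{user}\n<|assistant|>\n"

def new_prompt(instruction: str, inp: str = "") -> str:
    """Assumed rendering of axolotl's built-in `alpaca` dataset type."""
    if inp:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{inp}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

print(old_prompt("Name three primary colors."))
print(new_prompt("Name three primary colors."))
```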
 
@@ -103,22 +108,24 @@
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0005
- - train_batch_size: 8
- - eval_batch_size: 8
+ - learning_rate: 2e-05
+ - train_batch_size: 1
+ - eval_batch_size: 1
  - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 256
- - total_eval_batch_size: 64
- - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 8
+ - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 18
- - num_epochs: 3
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 1

  ### Training results

+ | Training Loss | Epoch  | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.5628        | 0.0127 | 1    | 1.5941          |
+ | 1.4085        | 0.4960 | 39   | 1.4333          |
+ | 1.3727        | 0.9921 | 78   | 1.3881          |


  ### Framework versions
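
With `micro_batch_size: 1` and `gradient_accumulation_steps: 8`, the effective per-step batch is 1 × 8 = 8, consistent with the reported `total_train_batch_size`. For completeness, a minimal inference sketch against the published checkpoint; the Alpaca-style prompt is an assumption mirroring the training format, and the generation settings are illustrative:

```python
# Sketch: load the fine-tuned checkpoint from the Hub and generate a response.
# The prompt mirrors the assumed Alpaca training format; sampling settings are arbitrary.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minpeter/Alpaca-Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nName three primary colors.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```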
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:547a2ac73e4b7254f8b1ce78c65d9fd6bf777565ee77965b4fb5ff67e56ba14e
+ size 2471678226
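
The added `pytorch_model.bin` is a Git LFS pointer: the roughly 2.5 GB checkpoint itself is stored by LFS and identified by the SHA-256 digest above. A minimal sketch of verifying a downloaded copy against that digest, assuming the file is fetched from this repo with `huggingface_hub`:

```python
# Sketch: download pytorch_model.bin from the Hub and check its SHA-256
# against the oid recorded in the LFS pointer above.
import hashlib
from huggingface_hub import hf_hub_download

EXPECTED = "547a2ac73e4b7254f8b1ce78c65d9fd6bf777565ee77965b4fb5ff67e56ba14e"

path = hf_hub_download("minpeter/Alpaca-Llama-3.2-1B-Instruct", "pytorch_model.bin")

sha = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        sha.update(chunk)

print("match" if sha.hexdigest() == EXPECTED else "mismatch")
```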