---
library_name: transformers
license: other
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
tags:
- generated_from_trainer
model-index:
- name: outputs/out
  results: []
---
### exl2 quant (measurement.json in main branch)
---
### check revisions for quants
---
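
Each quant lives in its own revision (branch) of this repository, while `main` carries the `measurement.json` used to produce them. Below is a minimal download sketch using `huggingface_hub`; the repo id and revision name are placeholders, so substitute the actual repository and the quant branch you want from the revision list.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id and branch name -- replace them with the actual
# repository and the bpw branch you picked from the revision list.
local_path = snapshot_download(
    repo_id="your-namespace/your-exl2-quant",
    revision="6.0bpw",            # hypothetical quant branch
    local_dir="./model-exl2",     # where the quantized weights end up
)
print("quant downloaded to", local_path)
```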


<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: MangoHQ/Gryphe-3.5-16k-Subset
    type: sharegpt
    conversation: chatml
  - path: Epiculous/Synthstruct-Gens-v1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/Stheno-Data-Filtered
    type: sharegpt
    conversation: chatml
  - path: Epiculous/SynthRP-Gens-v1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: lodrick-the-lafted/NopmWritingStruct
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/kalo-opus-instruct-22k-no-refusal
    type: sharegpt
    conversation: chatml

chat_template: chatml

val_set_size: 0.01
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 16384
# sequence_len: 32768
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: tinymagnumv2
wandb_entity:
wandb_watch:
wandb_name: tinymagnumv2
wandb_log_model:

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00002
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>

```

</details><br>

# outputs/out

This model is a fine-tuned version of [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml) on the ShareGPT-format datasets listed in the axolotl config above.
It achieves the following results on the evaluation set:
- Loss: 1.2014
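
The axolotl config above sets `chat_template: chatml`, so inference prompts should use the same ChatML formatting. The following is a minimal, hedged sketch with Transformers: the repo id is a placeholder for whichever checkpoint you actually load, it assumes the saved tokenizer carries the ChatML chat template from training, and the sampling settings are illustrative rather than recommended.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- point this at the checkpoint you want to run.
repo_id = "your-namespace/your-finetuned-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype="auto", device_map="auto"
)

# apply_chat_template formats the conversation with <|im_start|>/<|im_end|>
# markers, matching the ChatML formatting used during fine-tuning.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short scene set in a rainy harbor town."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.8
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```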

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 36
- num_epochs: 2
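
The total_train_batch_size above is not a separate knob; it is the per-device batch size times the gradient accumulation steps times the number of devices (a single device is implied by the values listed). A small sanity-check sketch:

```python
# Effective (total) train batch size = micro batch * grad accumulation * devices.
micro_batch_size = 1             # train_batch_size above
gradient_accumulation_steps = 32
num_devices = 1                  # implied by total_train_batch_size == 32

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 32
print(f"effective batch size: {total_train_batch_size}")
```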

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.6733 | 0.0051 | 1 | 1.6590 |
| 1.4425 | 0.2523 | 49 | 1.3040 |
| 1.3564 | 0.5047 | 98 | 1.2451 |
| 1.333 | 0.7570 | 147 | 1.2201 |
| 1.2936 | 1.0093 | 196 | 1.2077 |
| 1.2235 | 1.2462 | 245 | 1.2041 |
| 1.2651 | 1.4986 | 294 | 1.2018 |
| 1.238 | 1.7509 | 343 | 1.2014 |


### Framework versions

- Transformers 4.45.0.dev0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
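
To compare a local environment against the versions listed above, a quick convenience check (exact matches are not required just to load the model):

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions next to the ones recorded in this card.
for name, module in [
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```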