SADATO committed on
Commit 85c17f1
1 Parent(s): e3ecda1

Upload 11 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,204 @@
+ ---
+ library_name: peft
+ base_model: pythainlp/wangchanglm-7.5B-sft-enth
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+
+ ### Framework versions
+
+ - PEFT 0.8.1
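The model card's "How to Get Started with the Model" section above is still a placeholder. A minimal sketch of the usual PEFT loading pattern for this kind of adapter: the base model id is taken from adapter_config.json below, while the adapter path, dtype, and device placement are assumptions rather than settings confirmed by this repository.

```python
# Minimal sketch, not the author's confirmed usage. Assumes a local clone of this
# adapter repo, the accelerate package, and enough GPU memory for the 7.5B base model in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "pythainlp/wangchanglm-7.5B-sft-enth"  # from adapter_config.json
adapter_path = "path/to/this/adapter"            # placeholder: local clone of this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)

inputs = tokenizer("...", return_tensors="pt").to(base_model.device)  # placeholder prompt
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```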
adapter_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "pythainlp/wangchanglm-7.5B-sft-enth",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 32,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "fc2",
+ "out_proj",
+ "fc1",
+ "k_proj",
+ "q_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_rslora": false
+ }
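For readers reconstructing the setup, the fields above map directly onto a `peft.LoraConfig`. The sketch below simply mirrors the JSON; it is illustrative, not the original training script.

```python
# Illustrative reconstruction of the LoRA settings in adapter_config.json; not the original training code.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["v_proj", "fc2", "out_proj", "fc1", "k_proj", "q_proj"],
)
# get_peft_model(base_model, lora_config) would produce adapters of the same shape.
```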
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b1835a295d9cb7eb5d011a1f32fbbb0f56ce90cc9d9443c620d84c86534cbfc5
+ size 151045824
all_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "epoch": 10.0,
+ "train_loss": 1.371388260229553,
+ "train_runtime": 75147.4617,
+ "train_samples_per_second": 5.989,
+ "train_steps_per_second": 0.187
+ }
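As a rough sanity check (assuming the standard Hugging Face Trainer meaning of these fields), the throughput figures are self-consistent with the step count recorded later in trainer_state.json:

```python
# Back-of-the-envelope check of all_results.json (assumes standard Trainer field semantics).
train_runtime = 75147.4617          # seconds
train_steps_per_second = 0.187
train_samples_per_second = 5.989
epochs = 10.0

total_steps = train_runtime * train_steps_per_second      # ~14,053 vs. global_step 14,060
total_samples = train_runtime * train_samples_per_second  # ~450,058 examples seen in total
samples_per_epoch = total_samples / epochs                # ~45,006 per epoch
print(round(total_steps), round(total_samples), round(samples_per_epoch))
```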
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c49dc7e82c10227af764e518924cf2f9d50c00462750d184fa74697bba65eef8
+ size 4920706
special_tokens_map.json ADDED
@@ -0,0 +1,47 @@
+ {
+ "additional_special_tokens": [
+ "<madeupword0>",
+ "<madeupword1>",
+ "<madeupword2>",
+ "<madeupword3>",
+ "<madeupword4>",
+ "<madeupword5>",
+ "<madeupword6>"
+ ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "</s>",
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cce87967248ba9d76846907a93fd07dcf8661877eb8f56efd5ed1295e5826506
+ size 17210232
tokenizer_config.json ADDED
@@ -0,0 +1,111 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "256001": {
+ "content": "<madeupword0>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "256002": {
+ "content": "<madeupword1>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "256003": {
+ "content": "<madeupword2>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "256004": {
+ "content": "<madeupword3>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "256005": {
+ "content": "<madeupword4>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "256006": {
+ "content": "<madeupword5>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "256007": {
+ "content": "<madeupword6>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<madeupword0>",
+ "<madeupword1>",
+ "<madeupword2>",
+ "<madeupword3>",
+ "<madeupword4>",
+ "<madeupword5>",
+ "<madeupword6>"
+ ],
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "</s>",
+ "sep_token": "</s>",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "XGLMTokenizer",
+ "unk_token": "<unk>"
+ }
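A small sketch (the path is a placeholder) for checking that the tokenizer files above load with the special-token layout they describe:

```python
# Sketch only: load the tokenizer shipped in this repo and inspect its special tokens.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/this/adapter")  # placeholder local path
print(type(tok).__name__)                                    # an XGLM tokenizer class
print(tok.bos_token, tok.eos_token, tok.pad_token, tok.unk_token)
print(tok.additional_special_tokens)                         # <madeupword0> ... <madeupword6>
```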
train_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "epoch": 10.0,
+ "train_loss": 1.371388260229553,
+ "train_runtime": 75147.4617,
+ "train_samples_per_second": 5.989,
+ "train_steps_per_second": 0.187
+ }
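The file rendered next, trainer_state.json, carries the full per-step log_history (training loss and learning rate roughly every 20 steps, plus an eval_loss entry once per epoch). A minimal sketch for plotting those curves, assuming matplotlib is available:

```python
# Sketch: plot training and eval loss from trainer_state.json (assumes matplotlib is installed).
import json
import matplotlib.pyplot as plt

with open("trainer_state.json") as f:
    state = json.load(f)

train = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

plt.plot([s for s, _ in train], [l for _, l in train], label="train loss")
plt.plot([s for s, _ in evals], [l for _, l in evals], marker="o", label="eval loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```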
trainer_state.json ADDED
@@ -0,0 +1,4328 @@
+ {
+ "best_metric": 1.214709758758545,
+ "best_model_checkpoint": "model/E5/wangchanglm_E5_wangchanglm_shuffle_augment_gpt4/checkpoint-12658",
+ "epoch": 9.99666718510454,
+ "eval_steps": 500,
+ "global_step": 14060,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
11
+ {
12
+ "epoch": 0.01,
13
+ "learning_rate": 2.777777777777778e-05,
14
+ "loss": 3.5505,
15
+ "step": 20
16
+ },
17
+ {
18
+ "epoch": 0.03,
19
+ "learning_rate": 4.999998996341642e-05,
20
+ "loss": 2.8999,
21
+ "step": 40
22
+ },
23
+ {
24
+ "epoch": 0.04,
25
+ "learning_rate": 4.999963868383706e-05,
26
+ "loss": 2.1883,
27
+ "step": 60
28
+ },
29
+ {
30
+ "epoch": 0.06,
31
+ "learning_rate": 4.9998785583137e-05,
32
+ "loss": 2.045,
33
+ "step": 80
34
+ },
35
+ {
36
+ "epoch": 0.07,
37
+ "learning_rate": 4.999743067844064e-05,
38
+ "loss": 2.0015,
39
+ "step": 100
40
+ },
41
+ {
42
+ "epoch": 0.09,
43
+ "learning_rate": 4.999557399694518e-05,
44
+ "loss": 1.9891,
45
+ "step": 120
46
+ },
47
+ {
48
+ "epoch": 0.1,
49
+ "learning_rate": 4.9993215575920024e-05,
50
+ "loss": 1.9474,
51
+ "step": 140
52
+ },
53
+ {
54
+ "epoch": 0.11,
55
+ "learning_rate": 4.999035546270608e-05,
56
+ "loss": 1.9158,
57
+ "step": 160
58
+ },
59
+ {
60
+ "epoch": 0.13,
61
+ "learning_rate": 4.998699371471479e-05,
62
+ "loss": 1.9258,
63
+ "step": 180
64
+ },
65
+ {
66
+ "epoch": 0.14,
67
+ "learning_rate": 4.9983130399426966e-05,
68
+ "loss": 1.8942,
69
+ "step": 200
70
+ },
71
+ {
72
+ "epoch": 0.16,
73
+ "learning_rate": 4.9978765594391474e-05,
74
+ "loss": 1.8836,
75
+ "step": 220
76
+ },
77
+ {
78
+ "epoch": 0.17,
79
+ "learning_rate": 4.9973899387223616e-05,
80
+ "loss": 1.8792,
81
+ "step": 240
82
+ },
83
+ {
84
+ "epoch": 0.18,
85
+ "learning_rate": 4.996853187560343e-05,
86
+ "loss": 1.8481,
87
+ "step": 260
88
+ },
89
+ {
90
+ "epoch": 0.2,
91
+ "learning_rate": 4.996266316727371e-05,
92
+ "loss": 1.8589,
93
+ "step": 280
94
+ },
95
+ {
96
+ "epoch": 0.21,
97
+ "learning_rate": 4.995629338003782e-05,
98
+ "loss": 1.8073,
99
+ "step": 300
100
+ },
101
+ {
102
+ "epoch": 0.23,
103
+ "learning_rate": 4.994942264175737e-05,
104
+ "loss": 1.8313,
105
+ "step": 320
106
+ },
107
+ {
108
+ "epoch": 0.24,
109
+ "learning_rate": 4.9942051090349606e-05,
110
+ "loss": 1.7953,
111
+ "step": 340
112
+ },
113
+ {
114
+ "epoch": 0.26,
115
+ "learning_rate": 4.9934178873784674e-05,
116
+ "loss": 1.7668,
117
+ "step": 360
118
+ },
119
+ {
120
+ "epoch": 0.27,
121
+ "learning_rate": 4.992580615008264e-05,
122
+ "loss": 1.778,
123
+ "step": 380
124
+ },
125
+ {
126
+ "epoch": 0.28,
127
+ "learning_rate": 4.991693308731033e-05,
128
+ "loss": 1.785,
129
+ "step": 400
130
+ },
131
+ {
132
+ "epoch": 0.3,
133
+ "learning_rate": 4.990755986357791e-05,
134
+ "loss": 1.7609,
135
+ "step": 420
136
+ },
137
+ {
138
+ "epoch": 0.31,
139
+ "learning_rate": 4.989768666703538e-05,
140
+ "loss": 1.7345,
141
+ "step": 440
142
+ },
143
+ {
144
+ "epoch": 0.33,
145
+ "learning_rate": 4.988731369586874e-05,
146
+ "loss": 1.7622,
147
+ "step": 460
148
+ },
149
+ {
150
+ "epoch": 0.34,
151
+ "learning_rate": 4.987644115829604e-05,
152
+ "loss": 1.7128,
153
+ "step": 480
154
+ },
155
+ {
156
+ "epoch": 0.36,
157
+ "learning_rate": 4.9865069272563195e-05,
158
+ "loss": 1.7156,
159
+ "step": 500
160
+ },
161
+ {
162
+ "epoch": 0.37,
163
+ "learning_rate": 4.98531982669396e-05,
164
+ "loss": 1.7354,
165
+ "step": 520
166
+ },
167
+ {
168
+ "epoch": 0.38,
169
+ "learning_rate": 4.9840828379713556e-05,
170
+ "loss": 1.7011,
171
+ "step": 540
172
+ },
173
+ {
174
+ "epoch": 0.4,
175
+ "learning_rate": 4.9827959859187476e-05,
176
+ "loss": 1.6927,
177
+ "step": 560
178
+ },
179
+ {
180
+ "epoch": 0.41,
181
+ "learning_rate": 4.9814592963672915e-05,
182
+ "loss": 1.6968,
183
+ "step": 580
184
+ },
185
+ {
186
+ "epoch": 0.43,
187
+ "learning_rate": 4.980072796148535e-05,
188
+ "loss": 1.682,
189
+ "step": 600
190
+ },
191
+ {
192
+ "epoch": 0.44,
193
+ "learning_rate": 4.978636513093887e-05,
194
+ "loss": 1.674,
195
+ "step": 620
196
+ },
197
+ {
198
+ "epoch": 0.46,
199
+ "learning_rate": 4.9771504760340494e-05,
200
+ "loss": 1.6618,
201
+ "step": 640
202
+ },
203
+ {
204
+ "epoch": 0.47,
205
+ "learning_rate": 4.975614714798445e-05,
206
+ "loss": 1.6696,
207
+ "step": 660
208
+ },
209
+ {
210
+ "epoch": 0.48,
211
+ "learning_rate": 4.9740292602146154e-05,
212
+ "loss": 1.6715,
213
+ "step": 680
214
+ },
215
+ {
216
+ "epoch": 0.5,
217
+ "learning_rate": 4.972394144107606e-05,
218
+ "loss": 1.6455,
219
+ "step": 700
220
+ },
221
+ {
222
+ "epoch": 0.51,
223
+ "learning_rate": 4.970709399299322e-05,
224
+ "loss": 1.6459,
225
+ "step": 720
226
+ },
227
+ {
228
+ "epoch": 0.53,
229
+ "learning_rate": 4.968975059607874e-05,
230
+ "loss": 1.6497,
231
+ "step": 740
232
+ },
233
+ {
234
+ "epoch": 0.54,
235
+ "learning_rate": 4.967191159846896e-05,
236
+ "loss": 1.6411,
237
+ "step": 760
238
+ },
239
+ {
240
+ "epoch": 0.55,
241
+ "learning_rate": 4.9653577358248484e-05,
242
+ "loss": 1.6355,
243
+ "step": 780
244
+ },
245
+ {
246
+ "epoch": 0.57,
247
+ "learning_rate": 4.9634748243442994e-05,
248
+ "loss": 1.6149,
249
+ "step": 800
250
+ },
251
+ {
252
+ "epoch": 0.58,
253
+ "learning_rate": 4.9615424632011857e-05,
254
+ "loss": 1.6107,
255
+ "step": 820
256
+ },
257
+ {
258
+ "epoch": 0.6,
259
+ "learning_rate": 4.959560691184052e-05,
260
+ "loss": 1.6145,
261
+ "step": 840
262
+ },
263
+ {
264
+ "epoch": 0.61,
265
+ "learning_rate": 4.957529548073276e-05,
266
+ "loss": 1.6122,
267
+ "step": 860
268
+ },
269
+ {
270
+ "epoch": 0.63,
271
+ "learning_rate": 4.9554490746402696e-05,
272
+ "loss": 1.6051,
273
+ "step": 880
274
+ },
275
+ {
276
+ "epoch": 0.64,
277
+ "learning_rate": 4.953319312646653e-05,
278
+ "loss": 1.5917,
279
+ "step": 900
280
+ },
281
+ {
282
+ "epoch": 0.65,
283
+ "learning_rate": 4.951140304843428e-05,
284
+ "loss": 1.59,
285
+ "step": 920
286
+ },
287
+ {
288
+ "epoch": 0.67,
289
+ "learning_rate": 4.948912094970113e-05,
290
+ "loss": 1.5809,
291
+ "step": 940
292
+ },
293
+ {
294
+ "epoch": 0.68,
295
+ "learning_rate": 4.946634727753864e-05,
296
+ "loss": 1.5851,
297
+ "step": 960
298
+ },
299
+ {
300
+ "epoch": 0.7,
301
+ "learning_rate": 4.9443082489085814e-05,
302
+ "loss": 1.5698,
303
+ "step": 980
304
+ },
305
+ {
306
+ "epoch": 0.71,
307
+ "learning_rate": 4.9419327051339883e-05,
308
+ "loss": 1.5785,
309
+ "step": 1000
310
+ },
311
+ {
312
+ "epoch": 0.73,
313
+ "learning_rate": 4.939508144114696e-05,
314
+ "loss": 1.5971,
315
+ "step": 1020
316
+ },
317
+ {
318
+ "epoch": 0.74,
319
+ "learning_rate": 4.937034614519245e-05,
320
+ "loss": 1.5689,
321
+ "step": 1040
322
+ },
323
+ {
324
+ "epoch": 0.75,
325
+ "learning_rate": 4.934512165999128e-05,
326
+ "loss": 1.5784,
327
+ "step": 1060
328
+ },
329
+ {
330
+ "epoch": 0.77,
331
+ "learning_rate": 4.931940849187795e-05,
332
+ "loss": 1.5687,
333
+ "step": 1080
334
+ },
335
+ {
336
+ "epoch": 0.78,
337
+ "learning_rate": 4.9293207156996354e-05,
338
+ "loss": 1.5659,
339
+ "step": 1100
340
+ },
341
+ {
342
+ "epoch": 0.8,
343
+ "learning_rate": 4.9266518181289414e-05,
344
+ "loss": 1.5564,
345
+ "step": 1120
346
+ },
347
+ {
348
+ "epoch": 0.81,
349
+ "learning_rate": 4.923934210048856e-05,
350
+ "loss": 1.5564,
351
+ "step": 1140
352
+ },
353
+ {
354
+ "epoch": 0.82,
355
+ "learning_rate": 4.921167946010291e-05,
356
+ "loss": 1.5437,
357
+ "step": 1160
358
+ },
359
+ {
360
+ "epoch": 0.84,
361
+ "learning_rate": 4.9183530815408386e-05,
362
+ "loss": 1.5503,
363
+ "step": 1180
364
+ },
365
+ {
366
+ "epoch": 0.85,
367
+ "learning_rate": 4.9154896731436526e-05,
368
+ "loss": 1.586,
369
+ "step": 1200
370
+ },
371
+ {
372
+ "epoch": 0.87,
373
+ "learning_rate": 4.9125777782963165e-05,
374
+ "loss": 1.5578,
375
+ "step": 1220
376
+ },
377
+ {
378
+ "epoch": 0.88,
379
+ "learning_rate": 4.909617455449689e-05,
380
+ "loss": 1.5513,
381
+ "step": 1240
382
+ },
383
+ {
384
+ "epoch": 0.9,
385
+ "learning_rate": 4.906608764026729e-05,
386
+ "loss": 1.5724,
387
+ "step": 1260
388
+ },
389
+ {
390
+ "epoch": 0.91,
391
+ "learning_rate": 4.903551764421307e-05,
392
+ "loss": 1.541,
393
+ "step": 1280
394
+ },
395
+ {
396
+ "epoch": 0.92,
397
+ "learning_rate": 4.900446517996987e-05,
398
+ "loss": 1.5477,
399
+ "step": 1300
400
+ },
401
+ {
402
+ "epoch": 0.94,
403
+ "learning_rate": 4.8972930870857994e-05,
404
+ "loss": 1.542,
405
+ "step": 1320
406
+ },
407
+ {
408
+ "epoch": 0.95,
409
+ "learning_rate": 4.89409153498699e-05,
410
+ "loss": 1.5279,
411
+ "step": 1340
412
+ },
413
+ {
414
+ "epoch": 0.97,
415
+ "learning_rate": 4.890841925965744e-05,
416
+ "loss": 1.5528,
417
+ "step": 1360
418
+ },
419
+ {
420
+ "epoch": 0.98,
421
+ "learning_rate": 4.8875443252519035e-05,
422
+ "loss": 1.5381,
423
+ "step": 1380
424
+ },
425
+ {
426
+ "epoch": 1.0,
427
+ "learning_rate": 4.884198799038652e-05,
428
+ "loss": 1.5313,
429
+ "step": 1400
430
+ },
431
+ {
432
+ "epoch": 1.0,
433
+ "eval_loss": 1.4408559799194336,
434
+ "eval_runtime": 293.7341,
435
+ "eval_samples_per_second": 18.939,
436
+ "eval_steps_per_second": 18.939,
437
+ "step": 1406
438
+ },
439
+ {
440
+ "epoch": 1.01,
441
+ "learning_rate": 4.880805414481189e-05,
442
+ "loss": 1.5191,
443
+ "step": 1420
444
+ },
445
+ {
446
+ "epoch": 1.02,
447
+ "learning_rate": 4.8773642396953796e-05,
448
+ "loss": 1.5047,
449
+ "step": 1440
450
+ },
451
+ {
452
+ "epoch": 1.04,
453
+ "learning_rate": 4.87387534375639e-05,
454
+ "loss": 1.5096,
455
+ "step": 1460
456
+ },
457
+ {
458
+ "epoch": 1.05,
459
+ "learning_rate": 4.8703387966973e-05,
460
+ "loss": 1.5209,
461
+ "step": 1480
462
+ },
463
+ {
464
+ "epoch": 1.07,
465
+ "learning_rate": 4.866754669507696e-05,
466
+ "loss": 1.4774,
467
+ "step": 1500
468
+ },
469
+ {
470
+ "epoch": 1.08,
471
+ "learning_rate": 4.8631230341322455e-05,
472
+ "loss": 1.5105,
473
+ "step": 1520
474
+ },
475
+ {
476
+ "epoch": 1.09,
477
+ "learning_rate": 4.859443963469256e-05,
478
+ "loss": 1.4994,
479
+ "step": 1540
480
+ },
481
+ {
482
+ "epoch": 1.11,
483
+ "learning_rate": 4.855717531369208e-05,
484
+ "loss": 1.4887,
485
+ "step": 1560
486
+ },
487
+ {
488
+ "epoch": 1.12,
489
+ "learning_rate": 4.851943812633279e-05,
490
+ "loss": 1.5248,
491
+ "step": 1580
492
+ },
493
+ {
494
+ "epoch": 1.14,
495
+ "learning_rate": 4.848122883011832e-05,
496
+ "loss": 1.4611,
497
+ "step": 1600
498
+ },
499
+ {
500
+ "epoch": 1.15,
501
+ "learning_rate": 4.844254819202904e-05,
502
+ "loss": 1.4756,
503
+ "step": 1620
504
+ },
505
+ {
506
+ "epoch": 1.17,
507
+ "learning_rate": 4.840339698850661e-05,
508
+ "loss": 1.4817,
509
+ "step": 1640
510
+ },
511
+ {
512
+ "epoch": 1.18,
513
+ "learning_rate": 4.836377600543842e-05,
514
+ "loss": 1.4862,
515
+ "step": 1660
516
+ },
517
+ {
518
+ "epoch": 1.19,
519
+ "learning_rate": 4.832368603814182e-05,
520
+ "loss": 1.4828,
521
+ "step": 1680
522
+ },
523
+ {
524
+ "epoch": 1.21,
525
+ "learning_rate": 4.8283127891348124e-05,
526
+ "loss": 1.4767,
527
+ "step": 1700
528
+ },
529
+ {
530
+ "epoch": 1.22,
531
+ "learning_rate": 4.824210237918649e-05,
532
+ "loss": 1.497,
533
+ "step": 1720
534
+ },
535
+ {
536
+ "epoch": 1.24,
537
+ "learning_rate": 4.820061032516756e-05,
538
+ "loss": 1.4843,
539
+ "step": 1740
540
+ },
541
+ {
542
+ "epoch": 1.25,
543
+ "learning_rate": 4.815865256216693e-05,
544
+ "loss": 1.4694,
545
+ "step": 1760
546
+ },
547
+ {
548
+ "epoch": 1.27,
549
+ "learning_rate": 4.811622993240844e-05,
550
+ "loss": 1.478,
551
+ "step": 1780
552
+ },
553
+ {
554
+ "epoch": 1.28,
555
+ "learning_rate": 4.807334328744726e-05,
556
+ "loss": 1.4594,
557
+ "step": 1800
558
+ },
559
+ {
560
+ "epoch": 1.29,
561
+ "learning_rate": 4.8029993488152806e-05,
562
+ "loss": 1.5019,
563
+ "step": 1820
564
+ },
565
+ {
566
+ "epoch": 1.31,
567
+ "learning_rate": 4.798618140469143e-05,
568
+ "loss": 1.4801,
569
+ "step": 1840
570
+ },
571
+ {
572
+ "epoch": 1.32,
573
+ "learning_rate": 4.794190791650903e-05,
574
+ "loss": 1.4906,
575
+ "step": 1860
576
+ },
577
+ {
578
+ "epoch": 1.34,
579
+ "learning_rate": 4.789717391231328e-05,
580
+ "loss": 1.4914,
581
+ "step": 1880
582
+ },
583
+ {
584
+ "epoch": 1.35,
585
+ "learning_rate": 4.7851980290055896e-05,
586
+ "loss": 1.4578,
587
+ "step": 1900
588
+ },
589
+ {
590
+ "epoch": 1.37,
591
+ "learning_rate": 4.7806327956914544e-05,
592
+ "loss": 1.4613,
593
+ "step": 1920
594
+ },
595
+ {
596
+ "epoch": 1.38,
597
+ "learning_rate": 4.7760217829274675e-05,
598
+ "loss": 1.46,
599
+ "step": 1940
600
+ },
601
+ {
602
+ "epoch": 1.39,
603
+ "learning_rate": 4.771365083271112e-05,
604
+ "loss": 1.4609,
605
+ "step": 1960
606
+ },
607
+ {
608
+ "epoch": 1.41,
609
+ "learning_rate": 4.7666627901969454e-05,
610
+ "loss": 1.4684,
611
+ "step": 1980
612
+ },
613
+ {
614
+ "epoch": 1.42,
615
+ "learning_rate": 4.761914998094732e-05,
616
+ "loss": 1.4534,
617
+ "step": 2000
618
+ },
619
+ {
620
+ "epoch": 1.44,
621
+ "learning_rate": 4.7571218022675443e-05,
622
+ "loss": 1.4674,
623
+ "step": 2020
624
+ },
625
+ {
626
+ "epoch": 1.45,
627
+ "learning_rate": 4.7522832989298486e-05,
628
+ "loss": 1.4783,
629
+ "step": 2040
630
+ },
631
+ {
632
+ "epoch": 1.46,
633
+ "learning_rate": 4.747399585205575e-05,
634
+ "loss": 1.4671,
635
+ "step": 2060
636
+ },
637
+ {
638
+ "epoch": 1.48,
639
+ "learning_rate": 4.7424707591261685e-05,
640
+ "loss": 1.4704,
641
+ "step": 2080
642
+ },
643
+ {
644
+ "epoch": 1.49,
645
+ "learning_rate": 4.737496919628619e-05,
646
+ "loss": 1.4554,
647
+ "step": 2100
648
+ },
649
+ {
650
+ "epoch": 1.51,
651
+ "learning_rate": 4.732478166553479e-05,
652
+ "loss": 1.4309,
653
+ "step": 2120
654
+ },
655
+ {
656
+ "epoch": 1.52,
657
+ "learning_rate": 4.727414600642857e-05,
658
+ "loss": 1.4581,
659
+ "step": 2140
660
+ },
661
+ {
662
+ "epoch": 1.54,
663
+ "learning_rate": 4.722306323538392e-05,
664
+ "loss": 1.4518,
665
+ "step": 2160
666
+ },
667
+ {
668
+ "epoch": 1.55,
669
+ "learning_rate": 4.717153437779221e-05,
670
+ "loss": 1.4419,
671
+ "step": 2180
672
+ },
673
+ {
674
+ "epoch": 1.56,
675
+ "learning_rate": 4.711956046799917e-05,
676
+ "loss": 1.4509,
677
+ "step": 2200
678
+ },
679
+ {
680
+ "epoch": 1.58,
681
+ "learning_rate": 4.7067142549284085e-05,
682
+ "loss": 1.4339,
683
+ "step": 2220
684
+ },
685
+ {
686
+ "epoch": 1.59,
687
+ "learning_rate": 4.7014281673838904e-05,
688
+ "loss": 1.4433,
689
+ "step": 2240
690
+ },
691
+ {
692
+ "epoch": 1.61,
693
+ "learning_rate": 4.6960978902747135e-05,
694
+ "loss": 1.4317,
695
+ "step": 2260
696
+ },
697
+ {
698
+ "epoch": 1.62,
699
+ "learning_rate": 4.6907235305962476e-05,
700
+ "loss": 1.4491,
701
+ "step": 2280
702
+ },
703
+ {
704
+ "epoch": 1.64,
705
+ "learning_rate": 4.6853051962287405e-05,
706
+ "loss": 1.4265,
707
+ "step": 2300
708
+ },
709
+ {
710
+ "epoch": 1.65,
711
+ "learning_rate": 4.679842995935149e-05,
712
+ "loss": 1.4216,
713
+ "step": 2320
714
+ },
715
+ {
716
+ "epoch": 1.66,
717
+ "learning_rate": 4.674337039358957e-05,
718
+ "loss": 1.4212,
719
+ "step": 2340
720
+ },
721
+ {
722
+ "epoch": 1.68,
723
+ "learning_rate": 4.668787437021973e-05,
724
+ "loss": 1.428,
725
+ "step": 2360
726
+ },
727
+ {
728
+ "epoch": 1.69,
729
+ "learning_rate": 4.6631943003221145e-05,
730
+ "loss": 1.449,
731
+ "step": 2380
732
+ },
733
+ {
734
+ "epoch": 1.71,
735
+ "learning_rate": 4.6575577415311684e-05,
736
+ "loss": 1.439,
737
+ "step": 2400
738
+ },
739
+ {
740
+ "epoch": 1.72,
741
+ "learning_rate": 4.6518778737925406e-05,
742
+ "loss": 1.4642,
743
+ "step": 2420
744
+ },
745
+ {
746
+ "epoch": 1.73,
747
+ "learning_rate": 4.646154811118982e-05,
748
+ "loss": 1.4386,
749
+ "step": 2440
750
+ },
751
+ {
752
+ "epoch": 1.75,
753
+ "learning_rate": 4.640388668390302e-05,
754
+ "loss": 1.4141,
755
+ "step": 2460
756
+ },
757
+ {
758
+ "epoch": 1.76,
759
+ "learning_rate": 4.6345795613510625e-05,
760
+ "loss": 1.4119,
761
+ "step": 2480
762
+ },
763
+ {
764
+ "epoch": 1.78,
765
+ "learning_rate": 4.6287276066082516e-05,
766
+ "loss": 1.4375,
767
+ "step": 2500
768
+ },
769
+ {
770
+ "epoch": 1.79,
771
+ "learning_rate": 4.6228329216289475e-05,
772
+ "loss": 1.4006,
773
+ "step": 2520
774
+ },
775
+ {
776
+ "epoch": 1.81,
777
+ "learning_rate": 4.616895624737957e-05,
778
+ "loss": 1.4223,
779
+ "step": 2540
780
+ },
781
+ {
782
+ "epoch": 1.82,
783
+ "learning_rate": 4.6109158351154416e-05,
784
+ "loss": 1.427,
785
+ "step": 2560
786
+ },
787
+ {
788
+ "epoch": 1.83,
789
+ "learning_rate": 4.6048936727945255e-05,
790
+ "loss": 1.4096,
791
+ "step": 2580
792
+ },
793
+ {
794
+ "epoch": 1.85,
795
+ "learning_rate": 4.598829258658885e-05,
796
+ "loss": 1.4169,
797
+ "step": 2600
798
+ },
799
+ {
800
+ "epoch": 1.86,
801
+ "learning_rate": 4.592722714440324e-05,
802
+ "loss": 1.3977,
803
+ "step": 2620
804
+ },
805
+ {
806
+ "epoch": 1.88,
807
+ "learning_rate": 4.586574162716328e-05,
808
+ "loss": 1.4158,
809
+ "step": 2640
810
+ },
811
+ {
812
+ "epoch": 1.89,
813
+ "learning_rate": 4.5803837269076073e-05,
814
+ "loss": 1.384,
815
+ "step": 2660
816
+ },
817
+ {
818
+ "epoch": 1.91,
819
+ "learning_rate": 4.5741515312756125e-05,
820
+ "loss": 1.4171,
821
+ "step": 2680
822
+ },
823
+ {
824
+ "epoch": 1.92,
825
+ "learning_rate": 4.567877700920049e-05,
826
+ "loss": 1.4323,
827
+ "step": 2700
828
+ },
829
+ {
830
+ "epoch": 1.93,
831
+ "learning_rate": 4.5615623617763606e-05,
832
+ "loss": 1.4126,
833
+ "step": 2720
834
+ },
835
+ {
836
+ "epoch": 1.95,
837
+ "learning_rate": 4.5552056406132003e-05,
838
+ "loss": 1.4159,
839
+ "step": 2740
840
+ },
841
+ {
842
+ "epoch": 1.96,
843
+ "learning_rate": 4.548807665029892e-05,
844
+ "loss": 1.4213,
845
+ "step": 2760
846
+ },
847
+ {
848
+ "epoch": 1.98,
849
+ "learning_rate": 4.542368563453861e-05,
850
+ "loss": 1.4013,
851
+ "step": 2780
852
+ },
853
+ {
854
+ "epoch": 1.99,
855
+ "learning_rate": 4.535888465138063e-05,
856
+ "loss": 1.4365,
857
+ "step": 2800
858
+ },
859
+ {
860
+ "epoch": 2.0,
861
+ "eval_loss": 1.322860598564148,
862
+ "eval_runtime": 293.5153,
863
+ "eval_samples_per_second": 18.953,
864
+ "eval_steps_per_second": 18.953,
865
+ "step": 2812
866
+ },
867
+ {
868
+ "epoch": 2.01,
869
+ "learning_rate": 4.529367500158386e-05,
870
+ "loss": 1.4108,
871
+ "step": 2820
872
+ },
873
+ {
874
+ "epoch": 2.02,
875
+ "learning_rate": 4.522805799411039e-05,
876
+ "loss": 1.3878,
877
+ "step": 2840
878
+ },
879
+ {
880
+ "epoch": 2.03,
881
+ "learning_rate": 4.5162034946099277e-05,
882
+ "loss": 1.3769,
883
+ "step": 2860
884
+ },
885
+ {
886
+ "epoch": 2.05,
887
+ "learning_rate": 4.509560718284007e-05,
888
+ "loss": 1.3847,
889
+ "step": 2880
890
+ },
891
+ {
892
+ "epoch": 2.06,
893
+ "learning_rate": 4.502877603774622e-05,
894
+ "loss": 1.3739,
895
+ "step": 2900
896
+ },
897
+ {
898
+ "epoch": 2.08,
899
+ "learning_rate": 4.496154285232833e-05,
900
+ "loss": 1.3955,
901
+ "step": 2920
902
+ },
903
+ {
904
+ "epoch": 2.09,
905
+ "learning_rate": 4.489390897616719e-05,
906
+ "loss": 1.38,
907
+ "step": 2940
908
+ },
909
+ {
910
+ "epoch": 2.1,
911
+ "learning_rate": 4.482587576688673e-05,
912
+ "loss": 1.4026,
913
+ "step": 2960
914
+ },
915
+ {
916
+ "epoch": 2.12,
917
+ "learning_rate": 4.4757444590126736e-05,
918
+ "loss": 1.4067,
919
+ "step": 2980
920
+ },
921
+ {
922
+ "epoch": 2.13,
923
+ "learning_rate": 4.4688616819515464e-05,
924
+ "loss": 1.39,
925
+ "step": 3000
926
+ },
927
+ {
928
+ "epoch": 2.15,
929
+ "learning_rate": 4.461939383664202e-05,
930
+ "loss": 1.3842,
931
+ "step": 3020
932
+ },
933
+ {
934
+ "epoch": 2.16,
935
+ "learning_rate": 4.45497770310287e-05,
936
+ "loss": 1.3987,
937
+ "step": 3040
938
+ },
939
+ {
940
+ "epoch": 2.18,
941
+ "learning_rate": 4.4479767800103036e-05,
942
+ "loss": 1.4043,
943
+ "step": 3060
944
+ },
945
+ {
946
+ "epoch": 2.19,
947
+ "learning_rate": 4.4409367549169764e-05,
948
+ "loss": 1.4022,
949
+ "step": 3080
950
+ },
951
+ {
952
+ "epoch": 2.2,
953
+ "learning_rate": 4.433857769138261e-05,
954
+ "loss": 1.3969,
955
+ "step": 3100
956
+ },
957
+ {
958
+ "epoch": 2.22,
959
+ "learning_rate": 4.426739964771595e-05,
960
+ "loss": 1.3893,
961
+ "step": 3120
962
+ },
963
+ {
964
+ "epoch": 2.23,
965
+ "learning_rate": 4.4195834846936264e-05,
966
+ "loss": 1.3609,
967
+ "step": 3140
968
+ },
969
+ {
970
+ "epoch": 2.25,
971
+ "learning_rate": 4.4123884725573446e-05,
972
+ "loss": 1.3734,
973
+ "step": 3160
974
+ },
975
+ {
976
+ "epoch": 2.26,
977
+ "learning_rate": 4.4051550727892e-05,
978
+ "loss": 1.3975,
979
+ "step": 3180
980
+ },
981
+ {
982
+ "epoch": 2.28,
983
+ "learning_rate": 4.3978834305862004e-05,
984
+ "loss": 1.4096,
985
+ "step": 3200
986
+ },
987
+ {
988
+ "epoch": 2.29,
989
+ "learning_rate": 4.3905736919130034e-05,
990
+ "loss": 1.3754,
991
+ "step": 3220
992
+ },
993
+ {
994
+ "epoch": 2.3,
995
+ "learning_rate": 4.383226003498978e-05,
996
+ "loss": 1.3734,
997
+ "step": 3240
998
+ },
999
+ {
1000
+ "epoch": 2.32,
1001
+ "learning_rate": 4.375840512835266e-05,
1002
+ "loss": 1.3869,
1003
+ "step": 3260
1004
+ },
1005
+ {
1006
+ "epoch": 2.33,
1007
+ "learning_rate": 4.368417368171819e-05,
1008
+ "loss": 1.3934,
1009
+ "step": 3280
1010
+ },
1011
+ {
1012
+ "epoch": 2.35,
1013
+ "learning_rate": 4.3609567185144184e-05,
1014
+ "loss": 1.3855,
1015
+ "step": 3300
1016
+ },
1017
+ {
1018
+ "epoch": 2.36,
1019
+ "learning_rate": 4.3534587136216944e-05,
1020
+ "loss": 1.369,
1021
+ "step": 3320
1022
+ },
1023
+ {
1024
+ "epoch": 2.37,
1025
+ "learning_rate": 4.345923504002111e-05,
1026
+ "loss": 1.3785,
1027
+ "step": 3340
1028
+ },
1029
+ {
1030
+ "epoch": 2.39,
1031
+ "learning_rate": 4.338351240910945e-05,
1032
+ "loss": 1.393,
1033
+ "step": 3360
1034
+ },
1035
+ {
1036
+ "epoch": 2.4,
1037
+ "learning_rate": 4.330742076347258e-05,
1038
+ "loss": 1.3925,
1039
+ "step": 3380
1040
+ },
1041
+ {
1042
+ "epoch": 2.42,
1043
+ "learning_rate": 4.3230961630508354e-05,
1044
+ "loss": 1.3671,
1045
+ "step": 3400
1046
+ },
1047
+ {
1048
+ "epoch": 2.43,
1049
+ "learning_rate": 4.315413654499128e-05,
1050
+ "loss": 1.3622,
1051
+ "step": 3420
1052
+ },
1053
+ {
1054
+ "epoch": 2.45,
1055
+ "learning_rate": 4.307694704904165e-05,
1056
+ "loss": 1.4037,
1057
+ "step": 3440
1058
+ },
1059
+ {
1060
+ "epoch": 2.46,
1061
+ "learning_rate": 4.299939469209463e-05,
1062
+ "loss": 1.3907,
1063
+ "step": 3460
1064
+ },
1065
+ {
1066
+ "epoch": 2.47,
1067
+ "learning_rate": 4.292148103086917e-05,
1068
+ "loss": 1.3804,
1069
+ "step": 3480
1070
+ },
1071
+ {
1072
+ "epoch": 2.49,
1073
+ "learning_rate": 4.2843207629336694e-05,
1074
+ "loss": 1.3576,
1075
+ "step": 3500
1076
+ },
1077
+ {
1078
+ "epoch": 2.5,
1079
+ "learning_rate": 4.2764576058689735e-05,
1080
+ "loss": 1.3648,
1081
+ "step": 3520
1082
+ },
1083
+ {
1084
+ "epoch": 2.52,
1085
+ "learning_rate": 4.268558789731044e-05,
1086
+ "loss": 1.3788,
1087
+ "step": 3540
1088
+ },
1089
+ {
1090
+ "epoch": 2.53,
1091
+ "learning_rate": 4.260624473073883e-05,
1092
+ "loss": 1.3793,
1093
+ "step": 3560
1094
+ },
1095
+ {
1096
+ "epoch": 2.55,
1097
+ "learning_rate": 4.2526548151640986e-05,
1098
+ "loss": 1.369,
1099
+ "step": 3580
1100
+ },
1101
+ {
1102
+ "epoch": 2.56,
1103
+ "learning_rate": 4.24464997597771e-05,
1104
+ "loss": 1.3906,
1105
+ "step": 3600
1106
+ },
1107
+ {
1108
+ "epoch": 2.57,
1109
+ "learning_rate": 4.236610116196934e-05,
1110
+ "loss": 1.372,
1111
+ "step": 3620
1112
+ },
1113
+ {
1114
+ "epoch": 2.59,
1115
+ "learning_rate": 4.228535397206962e-05,
1116
+ "loss": 1.3862,
1117
+ "step": 3640
1118
+ },
1119
+ {
1120
+ "epoch": 2.6,
1121
+ "learning_rate": 4.220425981092716e-05,
1122
+ "loss": 1.3766,
1123
+ "step": 3660
1124
+ },
1125
+ {
1126
+ "epoch": 2.62,
1127
+ "learning_rate": 4.212282030635601e-05,
1128
+ "loss": 1.3562,
1129
+ "step": 3680
1130
+ },
1131
+ {
1132
+ "epoch": 2.63,
1133
+ "learning_rate": 4.204103709310234e-05,
1134
+ "loss": 1.3607,
1135
+ "step": 3700
1136
+ },
1137
+ {
1138
+ "epoch": 2.64,
1139
+ "learning_rate": 4.195891181281161e-05,
1140
+ "loss": 1.3606,
1141
+ "step": 3720
1142
+ },
1143
+ {
1144
+ "epoch": 2.66,
1145
+ "learning_rate": 4.187644611399566e-05,
1146
+ "loss": 1.3515,
1147
+ "step": 3740
1148
+ },
1149
+ {
1150
+ "epoch": 2.67,
1151
+ "learning_rate": 4.17936416519996e-05,
1152
+ "loss": 1.3694,
1153
+ "step": 3760
1154
+ },
1155
+ {
1156
+ "epoch": 2.69,
1157
+ "learning_rate": 4.171050008896855e-05,
1158
+ "loss": 1.3653,
1159
+ "step": 3780
1160
+ },
1161
+ {
1162
+ "epoch": 2.7,
1163
+ "learning_rate": 4.162702309381434e-05,
1164
+ "loss": 1.3715,
1165
+ "step": 3800
1166
+ },
1167
+ {
1168
+ "epoch": 2.72,
1169
+ "learning_rate": 4.1543212342181956e-05,
1170
+ "loss": 1.3815,
1171
+ "step": 3820
1172
+ },
1173
+ {
1174
+ "epoch": 2.73,
1175
+ "learning_rate": 4.1459069516415916e-05,
1176
+ "loss": 1.3878,
1177
+ "step": 3840
1178
+ },
1179
+ {
1180
+ "epoch": 2.74,
1181
+ "learning_rate": 4.137459630552652e-05,
1182
+ "loss": 1.3602,
1183
+ "step": 3860
1184
+ },
1185
+ {
1186
+ "epoch": 2.76,
1187
+ "learning_rate": 4.128979440515594e-05,
1188
+ "loss": 1.3957,
1189
+ "step": 3880
1190
+ },
1191
+ {
1192
+ "epoch": 2.77,
1193
+ "learning_rate": 4.1204665517544144e-05,
1194
+ "loss": 1.378,
1195
+ "step": 3900
1196
+ },
1197
+ {
1198
+ "epoch": 2.79,
1199
+ "learning_rate": 4.1119211351494795e-05,
1200
+ "loss": 1.3614,
1201
+ "step": 3920
1202
+ },
1203
+ {
1204
+ "epoch": 2.8,
1205
+ "learning_rate": 4.103343362234089e-05,
1206
+ "loss": 1.3419,
1207
+ "step": 3940
1208
+ },
1209
+ {
1210
+ "epoch": 2.82,
1211
+ "learning_rate": 4.0947334051910367e-05,
1212
+ "loss": 1.3703,
1213
+ "step": 3960
1214
+ },
1215
+ {
1216
+ "epoch": 2.83,
1217
+ "learning_rate": 4.086091436849153e-05,
1218
+ "loss": 1.3718,
1219
+ "step": 3980
1220
+ },
1221
+ {
1222
+ "epoch": 2.84,
1223
+ "learning_rate": 4.077417630679833e-05,
1224
+ "loss": 1.3641,
1225
+ "step": 4000
1226
+ },
1227
+ {
1228
+ "epoch": 2.86,
1229
+ "learning_rate": 4.068712160793559e-05,
1230
+ "loss": 1.3822,
1231
+ "step": 4020
1232
+ },
1233
+ {
1234
+ "epoch": 2.87,
1235
+ "learning_rate": 4.0599752019364026e-05,
1236
+ "loss": 1.3544,
1237
+ "step": 4040
1238
+ },
1239
+ {
1240
+ "epoch": 2.89,
1241
+ "learning_rate": 4.0512069294865176e-05,
1242
+ "loss": 1.3798,
1243
+ "step": 4060
1244
+ },
1245
+ {
1246
+ "epoch": 2.9,
1247
+ "learning_rate": 4.042407519450619e-05,
1248
+ "loss": 1.3541,
1249
+ "step": 4080
1250
+ },
1251
+ {
1252
+ "epoch": 2.92,
1253
+ "learning_rate": 4.033577148460456e-05,
1254
+ "loss": 1.3515,
1255
+ "step": 4100
1256
+ },
1257
+ {
1258
+ "epoch": 2.93,
1259
+ "learning_rate": 4.024715993769253e-05,
1260
+ "loss": 1.3719,
1261
+ "step": 4120
1262
+ },
1263
+ {
1264
+ "epoch": 2.94,
1265
+ "learning_rate": 4.0158242332481654e-05,
1266
+ "loss": 1.3501,
1267
+ "step": 4140
1268
+ },
1269
+ {
1270
+ "epoch": 2.96,
1271
+ "learning_rate": 4.006902045382701e-05,
1272
+ "loss": 1.3602,
1273
+ "step": 4160
1274
+ },
1275
+ {
1276
+ "epoch": 2.97,
1277
+ "learning_rate": 3.997949609269143e-05,
1278
+ "loss": 1.3868,
1279
+ "step": 4180
1280
+ },
1281
+ {
1282
+ "epoch": 2.99,
1283
+ "learning_rate": 3.9889671046109464e-05,
1284
+ "loss": 1.3339,
1285
+ "step": 4200
1286
+ },
1287
+ {
1288
+ "epoch": 3.0,
1289
+ "eval_loss": 1.2688919305801392,
1290
+ "eval_runtime": 293.871,
1291
+ "eval_samples_per_second": 18.93,
1292
+ "eval_steps_per_second": 18.93,
1293
+ "step": 4219
1294
+ },
1295
+ {
1296
+ "epoch": 3.0,
1297
+ "learning_rate": 3.979954711715141e-05,
1298
+ "loss": 1.3718,
1299
+ "step": 4220
1300
+ },
1301
+ {
1302
+ "epoch": 3.01,
1303
+ "learning_rate": 3.9709126114887056e-05,
1304
+ "loss": 1.3372,
1305
+ "step": 4240
1306
+ },
1307
+ {
1308
+ "epoch": 3.03,
1309
+ "learning_rate": 3.961840985434938e-05,
1310
+ "loss": 1.3405,
1311
+ "step": 4260
1312
+ },
1313
+ {
1314
+ "epoch": 3.04,
1315
+ "learning_rate": 3.952740015649812e-05,
1316
+ "loss": 1.3063,
1317
+ "step": 4280
1318
+ },
1319
+ {
1320
+ "epoch": 3.06,
1321
+ "learning_rate": 3.9436098848183226e-05,
1322
+ "loss": 1.3516,
1323
+ "step": 4300
1324
+ },
1325
+ {
1326
+ "epoch": 3.07,
1327
+ "learning_rate": 3.9344507762108165e-05,
1328
+ "loss": 1.3381,
1329
+ "step": 4320
1330
+ },
1331
+ {
1332
+ "epoch": 3.09,
1333
+ "learning_rate": 3.925262873679319e-05,
1334
+ "loss": 1.3372,
1335
+ "step": 4340
1336
+ },
1337
+ {
1338
+ "epoch": 3.1,
1339
+ "learning_rate": 3.916046361653836e-05,
1340
+ "loss": 1.3479,
1341
+ "step": 4360
1342
+ },
1343
+ {
1344
+ "epoch": 3.11,
1345
+ "learning_rate": 3.906801425138656e-05,
1346
+ "loss": 1.3488,
1347
+ "step": 4380
1348
+ },
1349
+ {
1350
+ "epoch": 3.13,
1351
+ "learning_rate": 3.89752824970864e-05,
1352
+ "loss": 1.3341,
1353
+ "step": 4400
1354
+ },
1355
+ {
1356
+ "epoch": 3.14,
1357
+ "learning_rate": 3.888227021505486e-05,
1358
+ "loss": 1.3329,
1359
+ "step": 4420
1360
+ },
1361
+ {
1362
+ "epoch": 3.16,
1363
+ "learning_rate": 3.8788979272340066e-05,
1364
+ "loss": 1.3277,
1365
+ "step": 4440
1366
+ },
1367
+ {
1368
+ "epoch": 3.17,
1369
+ "learning_rate": 3.869541154158368e-05,
1370
+ "loss": 1.3255,
1371
+ "step": 4460
1372
+ },
1373
+ {
1374
+ "epoch": 3.19,
1375
+ "learning_rate": 3.860156890098339e-05,
1376
+ "loss": 1.3301,
1377
+ "step": 4480
1378
+ },
1379
+ {
1380
+ "epoch": 3.2,
1381
+ "learning_rate": 3.8507453234255176e-05,
1382
+ "loss": 1.3635,
1383
+ "step": 4500
1384
+ },
1385
+ {
1386
+ "epoch": 3.21,
1387
+ "learning_rate": 3.841306643059552e-05,
1388
+ "loss": 1.352,
1389
+ "step": 4520
1390
+ },
1391
+ {
1392
+ "epoch": 3.23,
1393
+ "learning_rate": 3.8318410384643485e-05,
1394
+ "loss": 1.3335,
1395
+ "step": 4540
1396
+ },
1397
+ {
1398
+ "epoch": 3.24,
1399
+ "learning_rate": 3.822348699644264e-05,
1400
+ "loss": 1.347,
1401
+ "step": 4560
1402
+ },
1403
+ {
1404
+ "epoch": 3.26,
1405
+ "learning_rate": 3.812829817140295e-05,
1406
+ "loss": 1.3573,
1407
+ "step": 4580
1408
+ },
1409
+ {
1410
+ "epoch": 3.27,
1411
+ "learning_rate": 3.8032845820262575e-05,
1412
+ "loss": 1.3265,
1413
+ "step": 4600
1414
+ },
1415
+ {
1416
+ "epoch": 3.28,
1417
+ "learning_rate": 3.793713185904942e-05,
1418
+ "loss": 1.3245,
1419
+ "step": 4620
1420
+ },
1421
+ {
1422
+ "epoch": 3.3,
1423
+ "learning_rate": 3.7841158209042756e-05,
1424
+ "loss": 1.3562,
1425
+ "step": 4640
1426
+ },
1427
+ {
1428
+ "epoch": 3.31,
1429
+ "learning_rate": 3.7744926796734596e-05,
1430
+ "loss": 1.3456,
1431
+ "step": 4660
1432
+ },
1433
+ {
1434
+ "epoch": 3.33,
1435
+ "learning_rate": 3.764843955379107e-05,
1436
+ "loss": 1.3481,
1437
+ "step": 4680
1438
+ },
1439
+ {
1440
+ "epoch": 3.34,
1441
+ "learning_rate": 3.7551698417013635e-05,
1442
+ "loss": 1.3365,
1443
+ "step": 4700
1444
+ },
1445
+ {
1446
+ "epoch": 3.36,
1447
+ "learning_rate": 3.7454705328300164e-05,
1448
+ "loss": 1.3182,
1449
+ "step": 4720
1450
+ },
1451
+ {
1452
+ "epoch": 3.37,
1453
+ "learning_rate": 3.735746223460604e-05,
1454
+ "loss": 1.3323,
1455
+ "step": 4740
1456
+ },
1457
+ {
1458
+ "epoch": 3.38,
1459
+ "learning_rate": 3.7259971087904984e-05,
1460
+ "loss": 1.3532,
1461
+ "step": 4760
1462
+ },
1463
+ {
1464
+ "epoch": 3.4,
1465
+ "learning_rate": 3.7162233845149944e-05,
1466
+ "loss": 1.3528,
1467
+ "step": 4780
1468
+ },
1469
+ {
1470
+ "epoch": 3.41,
1471
+ "learning_rate": 3.706425246823378e-05,
1472
+ "loss": 1.3237,
1473
+ "step": 4800
1474
+ },
1475
+ {
1476
+ "epoch": 3.43,
1477
+ "learning_rate": 3.69660289239499e-05,
1478
+ "loss": 1.3385,
1479
+ "step": 4820
1480
+ },
1481
+ {
1482
+ "epoch": 3.44,
1483
+ "learning_rate": 3.6867565183952764e-05,
1484
+ "loss": 1.3365,
1485
+ "step": 4840
1486
+ },
1487
+ {
1488
+ "epoch": 3.46,
1489
+ "learning_rate": 3.67688632247183e-05,
1490
+ "loss": 1.3413,
1491
+ "step": 4860
1492
+ },
1493
+ {
1494
+ "epoch": 3.47,
1495
+ "learning_rate": 3.666992502750426e-05,
1496
+ "loss": 1.3248,
1497
+ "step": 4880
1498
+ },
1499
+ {
1500
+ "epoch": 3.48,
1501
+ "learning_rate": 3.657075257831043e-05,
1502
+ "loss": 1.3233,
1503
+ "step": 4900
1504
+ },
1505
+ {
1506
+ "epoch": 3.5,
1507
+ "learning_rate": 3.6471347867838766e-05,
1508
+ "loss": 1.3589,
1509
+ "step": 4920
1510
+ },
1511
+ {
1512
+ "epoch": 3.51,
1513
+ "learning_rate": 3.6371712891453424e-05,
1514
+ "loss": 1.3205,
1515
+ "step": 4940
1516
+ },
1517
+ {
1518
+ "epoch": 3.53,
1519
+ "learning_rate": 3.627184964914074e-05,
1520
+ "loss": 1.348,
1521
+ "step": 4960
1522
+ },
1523
+ {
1524
+ "epoch": 3.54,
1525
+ "learning_rate": 3.617176014546906e-05,
1526
+ "loss": 1.3263,
1527
+ "step": 4980
1528
+ },
1529
+ {
1530
+ "epoch": 3.56,
1531
+ "learning_rate": 3.607144638954847e-05,
1532
+ "loss": 1.3229,
1533
+ "step": 5000
1534
+ },
1535
+ {
1536
+ "epoch": 3.57,
1537
+ "learning_rate": 3.597091039499055e-05,
1538
+ "loss": 1.3347,
1539
+ "step": 5020
1540
+ },
1541
+ {
1542
+ "epoch": 3.58,
1543
+ "learning_rate": 3.587015417986788e-05,
1544
+ "loss": 1.3557,
1545
+ "step": 5040
1546
+ },
1547
+ {
1548
+ "epoch": 3.6,
1549
+ "learning_rate": 3.576917976667357e-05,
1550
+ "loss": 1.3575,
1551
+ "step": 5060
1552
+ },
1553
+ {
1554
+ "epoch": 3.61,
1555
+ "learning_rate": 3.566798918228062e-05,
1556
+ "loss": 1.3521,
1557
+ "step": 5080
1558
+ },
1559
+ {
1560
+ "epoch": 3.63,
1561
+ "learning_rate": 3.5566584457901304e-05,
1562
+ "loss": 1.3316,
1563
+ "step": 5100
1564
+ },
1565
+ {
1566
+ "epoch": 3.64,
1567
+ "learning_rate": 3.546496762904633e-05,
1568
+ "loss": 1.3566,
1569
+ "step": 5120
1570
+ },
1571
+ {
1572
+ "epoch": 3.65,
1573
+ "learning_rate": 3.536314073548402e-05,
1574
+ "loss": 1.3319,
1575
+ "step": 5140
1576
+ },
1577
+ {
1578
+ "epoch": 3.67,
1579
+ "learning_rate": 3.5261105821199344e-05,
1580
+ "loss": 1.3249,
1581
+ "step": 5160
1582
+ },
1583
+ {
1584
+ "epoch": 3.68,
1585
+ "learning_rate": 3.515886493435291e-05,
1586
+ "loss": 1.3314,
1587
+ "step": 5180
1588
+ },
1589
+ {
1590
+ "epoch": 3.7,
1591
+ "learning_rate": 3.505642012723983e-05,
1592
+ "loss": 1.332,
1593
+ "step": 5200
1594
+ },
1595
+ {
1596
+ "epoch": 3.71,
1597
+ "learning_rate": 3.495377345624854e-05,
1598
+ "loss": 1.3011,
1599
+ "step": 5220
1600
+ },
1601
+ {
1602
+ "epoch": 3.73,
1603
+ "learning_rate": 3.4850926981819525e-05,
1604
+ "loss": 1.3297,
1605
+ "step": 5240
1606
+ },
1607
+ {
1608
+ "epoch": 3.74,
1609
+ "learning_rate": 3.4747882768403947e-05,
1610
+ "loss": 1.3344,
1611
+ "step": 5260
1612
+ },
1613
+ {
1614
+ "epoch": 3.75,
1615
+ "learning_rate": 3.464464288442219e-05,
1616
+ "loss": 1.3394,
1617
+ "step": 5280
1618
+ },
1619
+ {
1620
+ "epoch": 3.77,
1621
+ "learning_rate": 3.4541209402222396e-05,
1622
+ "loss": 1.3508,
1623
+ "step": 5300
1624
+ },
1625
+ {
1626
+ "epoch": 3.78,
1627
+ "learning_rate": 3.443758439803879e-05,
1628
+ "loss": 1.3295,
1629
+ "step": 5320
1630
+ },
1631
+ {
1632
+ "epoch": 3.8,
1633
+ "learning_rate": 3.433376995195008e-05,
1634
+ "loss": 1.3462,
1635
+ "step": 5340
1636
+ },
1637
+ {
1638
+ "epoch": 3.81,
1639
+ "learning_rate": 3.422976814783765e-05,
1640
+ "loss": 1.3473,
1641
+ "step": 5360
1642
+ },
1643
+ {
1644
+ "epoch": 3.83,
1645
+ "learning_rate": 3.4125581073343735e-05,
1646
+ "loss": 1.3127,
1647
+ "step": 5380
1648
+ },
1649
+ {
1650
+ "epoch": 3.84,
1651
+ "learning_rate": 3.4021210819829555e-05,
1652
+ "loss": 1.3589,
1653
+ "step": 5400
1654
+ },
1655
+ {
1656
+ "epoch": 3.85,
1657
+ "learning_rate": 3.391665948233328e-05,
1658
+ "loss": 1.3337,
1659
+ "step": 5420
1660
+ },
1661
+ {
1662
+ "epoch": 3.87,
1663
+ "learning_rate": 3.3811929159528024e-05,
1664
+ "loss": 1.3286,
1665
+ "step": 5440
1666
+ },
1667
+ {
1668
+ "epoch": 3.88,
1669
+ "learning_rate": 3.370702195367967e-05,
1670
+ "loss": 1.3482,
1671
+ "step": 5460
1672
+ },
1673
+ {
1674
+ "epoch": 3.9,
1675
+ "learning_rate": 3.360193997060475e-05,
1676
+ "loss": 1.3791,
1677
+ "step": 5480
1678
+ },
1679
+ {
1680
+ "epoch": 3.91,
1681
+ "learning_rate": 3.349668531962807e-05,
1682
+ "loss": 1.3573,
1683
+ "step": 5500
1684
+ },
1685
+ {
1686
+ "epoch": 3.92,
1687
+ "learning_rate": 3.339126011354044e-05,
1688
+ "loss": 1.3294,
1689
+ "step": 5520
1690
+ },
1691
+ {
1692
+ "epoch": 3.94,
1693
+ "learning_rate": 3.328566646855625e-05,
1694
+ "loss": 1.3581,
1695
+ "step": 5540
1696
+ },
1697
+ {
1698
+ "epoch": 3.95,
1699
+ "learning_rate": 3.3179906504270996e-05,
1700
+ "loss": 1.3494,
1701
+ "step": 5560
1702
+ },
1703
+ {
1704
+ "epoch": 3.97,
1705
+ "learning_rate": 3.30739823436187e-05,
1706
+ "loss": 1.3135,
1707
+ "step": 5580
1708
+ },
1709
+ {
1710
+ "epoch": 3.98,
1711
+ "learning_rate": 3.2967896112829324e-05,
1712
+ "loss": 1.3276,
1713
+ "step": 5600
1714
+ },
1715
+ {
1716
+ "epoch": 4.0,
1717
+ "learning_rate": 3.286164994138612e-05,
1718
+ "loss": 1.2934,
1719
+ "step": 5620
1720
+ },
1721
+ {
1722
+ "epoch": 4.0,
1723
+ "eval_loss": 1.2395237684249878,
1724
+ "eval_runtime": 294.4154,
1725
+ "eval_samples_per_second": 18.895,
1726
+ "eval_steps_per_second": 18.895,
1727
+ "step": 5625
1728
+ },
1729
+ {
1730
+ "epoch": 4.01,
1731
+ "learning_rate": 3.27552459619828e-05,
1732
+ "loss": 1.3163,
1733
+ "step": 5640
1734
+ },
1735
+ {
1736
+ "epoch": 4.02,
1737
+ "learning_rate": 3.26486863104808e-05,
1738
+ "loss": 1.3012,
1739
+ "step": 5660
1740
+ },
1741
+ {
1742
+ "epoch": 4.04,
1743
+ "learning_rate": 3.25419731258664e-05,
1744
+ "loss": 1.3086,
1745
+ "step": 5680
1746
+ },
1747
+ {
1748
+ "epoch": 4.05,
1749
+ "learning_rate": 3.2435108550207746e-05,
1750
+ "loss": 1.3261,
1751
+ "step": 5700
1752
+ },
1753
+ {
1754
+ "epoch": 4.07,
1755
+ "learning_rate": 3.232809472861189e-05,
1756
+ "loss": 1.2945,
1757
+ "step": 5720
1758
+ },
1759
+ {
1760
+ "epoch": 4.08,
1761
+ "learning_rate": 3.22209338091817e-05,
1762
+ "loss": 1.3301,
1763
+ "step": 5740
1764
+ },
1765
+ {
1766
+ "epoch": 4.1,
1767
+ "learning_rate": 3.211362794297278e-05,
1768
+ "loss": 1.3291,
1769
+ "step": 5760
1770
+ },
1771
+ {
1772
+ "epoch": 4.11,
1773
+ "learning_rate": 3.200617928395028e-05,
1774
+ "loss": 1.3276,
1775
+ "step": 5780
1776
+ },
1777
+ {
1778
+ "epoch": 4.12,
1779
+ "learning_rate": 3.1898589988945596e-05,
1780
+ "loss": 1.3336,
1781
+ "step": 5800
1782
+ },
1783
+ {
1784
+ "epoch": 4.14,
1785
+ "learning_rate": 3.179086221761319e-05,
1786
+ "loss": 1.3275,
1787
+ "step": 5820
1788
+ },
1789
+ {
1790
+ "epoch": 4.15,
1791
+ "learning_rate": 3.1682998132387146e-05,
1792
+ "loss": 1.3114,
1793
+ "step": 5840
1794
+ },
1795
+ {
1796
+ "epoch": 4.17,
1797
+ "learning_rate": 3.15749998984378e-05,
1798
+ "loss": 1.3157,
1799
+ "step": 5860
1800
+ },
1801
+ {
1802
+ "epoch": 4.18,
1803
+ "learning_rate": 3.146686968362827e-05,
1804
+ "loss": 1.3239,
1805
+ "step": 5880
1806
+ },
1807
+ {
1808
+ "epoch": 4.19,
1809
+ "learning_rate": 3.135860965847096e-05,
1810
+ "loss": 1.3204,
1811
+ "step": 5900
1812
+ },
1813
+ {
1814
+ "epoch": 4.21,
1815
+ "learning_rate": 3.125022199608396e-05,
1816
+ "loss": 1.3188,
1817
+ "step": 5920
1818
+ },
1819
+ {
1820
+ "epoch": 4.22,
1821
+ "learning_rate": 3.114170887214744e-05,
1822
+ "loss": 1.3278,
1823
+ "step": 5940
1824
+ },
1825
+ {
1826
+ "epoch": 4.24,
1827
+ "learning_rate": 3.103307246485997e-05,
1828
+ "loss": 1.3028,
1829
+ "step": 5960
1830
+ },
1831
+ {
1832
+ "epoch": 4.25,
1833
+ "learning_rate": 3.092431495489484e-05,
1834
+ "loss": 1.3337,
1835
+ "step": 5980
1836
+ },
1837
+ {
1838
+ "epoch": 4.27,
1839
+ "learning_rate": 3.0815438525356194e-05,
1840
+ "loss": 1.3049,
1841
+ "step": 6000
1842
+ },
1843
+ {
1844
+ "epoch": 4.28,
1845
+ "learning_rate": 3.070644536173531e-05,
1846
+ "loss": 1.2932,
1847
+ "step": 6020
1848
+ },
1849
+ {
1850
+ "epoch": 4.29,
1851
+ "learning_rate": 3.059733765186666e-05,
1852
+ "loss": 1.3111,
1853
+ "step": 6040
1854
+ },
1855
+ {
1856
+ "epoch": 4.31,
1857
+ "learning_rate": 3.0488117585884037e-05,
1858
+ "loss": 1.3193,
1859
+ "step": 6060
1860
+ },
1861
+ {
1862
+ "epoch": 4.32,
1863
+ "learning_rate": 3.0378787356176557e-05,
1864
+ "loss": 1.3272,
1865
+ "step": 6080
1866
+ },
1867
+ {
1868
+ "epoch": 4.34,
1869
+ "learning_rate": 3.0269349157344667e-05,
1870
+ "loss": 1.3487,
1871
+ "step": 6100
1872
+ },
1873
+ {
1874
+ "epoch": 4.35,
1875
+ "learning_rate": 3.015980518615611e-05,
1876
+ "loss": 1.3073,
1877
+ "step": 6120
1878
+ },
1879
+ {
1880
+ "epoch": 4.37,
1881
+ "learning_rate": 3.0050157641501803e-05,
1882
+ "loss": 1.3302,
1883
+ "step": 6140
1884
+ },
1885
+ {
1886
+ "epoch": 4.38,
1887
+ "learning_rate": 2.9940408724351694e-05,
1888
+ "loss": 1.3362,
1889
+ "step": 6160
1890
+ },
1891
+ {
1892
+ "epoch": 4.39,
1893
+ "learning_rate": 2.9830560637710614e-05,
1894
+ "loss": 1.3129,
1895
+ "step": 6180
1896
+ },
1897
+ {
1898
+ "epoch": 4.41,
1899
+ "learning_rate": 2.972061558657403e-05,
1900
+ "loss": 1.3349,
1901
+ "step": 6200
1902
+ },
1903
+ {
1904
+ "epoch": 4.42,
1905
+ "learning_rate": 2.9610575777883785e-05,
1906
+ "loss": 1.3177,
1907
+ "step": 6220
1908
+ },
1909
+ {
1910
+ "epoch": 4.44,
1911
+ "learning_rate": 2.9500443420483815e-05,
1912
+ "loss": 1.2792,
1913
+ "step": 6240
1914
+ },
1915
+ {
1916
+ "epoch": 4.45,
1917
+ "learning_rate": 2.9390220725075778e-05,
1918
+ "loss": 1.3178,
1919
+ "step": 6260
1920
+ },
1921
+ {
1922
+ "epoch": 4.47,
1923
+ "learning_rate": 2.9279909904174717e-05,
1924
+ "loss": 1.3395,
1925
+ "step": 6280
1926
+ },
1927
+ {
1928
+ "epoch": 4.48,
1929
+ "learning_rate": 2.9169513172064634e-05,
1930
+ "loss": 1.3145,
1931
+ "step": 6300
1932
+ },
1933
+ {
1934
+ "epoch": 4.49,
1935
+ "learning_rate": 2.9059032744754022e-05,
1936
+ "loss": 1.3135,
1937
+ "step": 6320
1938
+ },
1939
+ {
1940
+ "epoch": 4.51,
1941
+ "learning_rate": 2.8948470839931403e-05,
1942
+ "loss": 1.3209,
1943
+ "step": 6340
1944
+ },
1945
+ {
1946
+ "epoch": 4.52,
1947
+ "learning_rate": 2.883782967692082e-05,
1948
+ "loss": 1.3101,
1949
+ "step": 6360
1950
+ },
1951
+ {
1952
+ "epoch": 4.54,
1953
+ "learning_rate": 2.872711147663726e-05,
1954
+ "loss": 1.2866,
1955
+ "step": 6380
1956
+ },
1957
+ {
1958
+ "epoch": 4.55,
1959
+ "learning_rate": 2.8616318461542102e-05,
1960
+ "loss": 1.3182,
1961
+ "step": 6400
1962
+ },
1963
+ {
1964
+ "epoch": 4.56,
1965
+ "learning_rate": 2.8505452855598492e-05,
1966
+ "loss": 1.3071,
1967
+ "step": 6420
1968
+ },
1969
+ {
1970
+ "epoch": 4.58,
1971
+ "learning_rate": 2.8394516884226683e-05,
1972
+ "loss": 1.327,
1973
+ "step": 6440
1974
+ },
1975
+ {
1976
+ "epoch": 4.59,
1977
+ "learning_rate": 2.8283512774259414e-05,
1978
+ "loss": 1.3216,
1979
+ "step": 6460
1980
+ },
1981
+ {
1982
+ "epoch": 4.61,
1983
+ "learning_rate": 2.817244275389716e-05,
1984
+ "loss": 1.3102,
1985
+ "step": 6480
1986
+ },
1987
+ {
1988
+ "epoch": 4.62,
1989
+ "learning_rate": 2.806130905266342e-05,
1990
+ "loss": 1.3132,
1991
+ "step": 6500
1992
+ },
1993
+ {
1994
+ "epoch": 4.64,
1995
+ "learning_rate": 2.7950113901359974e-05,
1996
+ "loss": 1.3149,
1997
+ "step": 6520
1998
+ },
1999
+ {
2000
+ "epoch": 4.65,
2001
+ "learning_rate": 2.7838859532022116e-05,
2002
+ "loss": 1.3245,
2003
+ "step": 6540
2004
+ },
2005
+ {
2006
+ "epoch": 4.66,
2007
+ "learning_rate": 2.7727548177873798e-05,
2008
+ "loss": 1.3162,
2009
+ "step": 6560
2010
+ },
2011
+ {
2012
+ "epoch": 4.68,
2013
+ "learning_rate": 2.7616182073282854e-05,
2014
+ "loss": 1.3013,
2015
+ "step": 6580
2016
+ },
2017
+ {
2018
+ "epoch": 4.69,
2019
+ "learning_rate": 2.7504763453716132e-05,
2020
+ "loss": 1.2989,
2021
+ "step": 6600
2022
+ },
2023
+ {
2024
+ "epoch": 4.71,
2025
+ "learning_rate": 2.7393294555694614e-05,
2026
+ "loss": 1.3003,
2027
+ "step": 6620
2028
+ },
2029
+ {
2030
+ "epoch": 4.72,
2031
+ "learning_rate": 2.728177761674854e-05,
2032
+ "loss": 1.3119,
2033
+ "step": 6640
2034
+ },
2035
+ {
2036
+ "epoch": 4.74,
2037
+ "learning_rate": 2.717021487537246e-05,
2038
+ "loss": 1.3289,
2039
+ "step": 6660
2040
+ },
2041
+ {
2042
+ "epoch": 4.75,
2043
+ "learning_rate": 2.7058608570980343e-05,
2044
+ "loss": 1.3347,
2045
+ "step": 6680
2046
+ },
2047
+ {
2048
+ "epoch": 4.76,
2049
+ "learning_rate": 2.6946960943860596e-05,
2050
+ "loss": 1.3238,
2051
+ "step": 6700
2052
+ },
2053
+ {
2054
+ "epoch": 4.78,
2055
+ "learning_rate": 2.6835274235131107e-05,
2056
+ "loss": 1.3368,
2057
+ "step": 6720
2058
+ },
2059
+ {
2060
+ "epoch": 4.79,
2061
+ "learning_rate": 2.6723550686694245e-05,
2062
+ "loss": 1.3092,
2063
+ "step": 6740
2064
+ },
2065
+ {
2066
+ "epoch": 4.81,
2067
+ "learning_rate": 2.661179254119187e-05,
2068
+ "loss": 1.3458,
2069
+ "step": 6760
2070
+ },
2071
+ {
2072
+ "epoch": 4.82,
2073
+ "learning_rate": 2.6500002041960338e-05,
2074
+ "loss": 1.3271,
2075
+ "step": 6780
2076
+ },
2077
+ {
2078
+ "epoch": 4.83,
2079
+ "learning_rate": 2.6388181432985405e-05,
2080
+ "loss": 1.3315,
2081
+ "step": 6800
2082
+ },
2083
+ {
2084
+ "epoch": 4.85,
2085
+ "learning_rate": 2.6276332958857246e-05,
2086
+ "loss": 1.2831,
2087
+ "step": 6820
2088
+ },
2089
+ {
2090
+ "epoch": 4.86,
2091
+ "learning_rate": 2.6164458864725384e-05,
2092
+ "loss": 1.3122,
2093
+ "step": 6840
2094
+ },
2095
+ {
2096
+ "epoch": 4.88,
2097
+ "learning_rate": 2.6052561396253595e-05,
2098
+ "loss": 1.3207,
2099
+ "step": 6860
2100
+ },
2101
+ {
2102
+ "epoch": 4.89,
2103
+ "learning_rate": 2.5940642799574876e-05,
2104
+ "loss": 1.3178,
2105
+ "step": 6880
2106
+ },
2107
+ {
2108
+ "epoch": 4.91,
2109
+ "learning_rate": 2.5828705321246304e-05,
2110
+ "loss": 1.2852,
2111
+ "step": 6900
2112
+ },
2113
+ {
2114
+ "epoch": 4.92,
2115
+ "learning_rate": 2.5716751208204e-05,
2116
+ "loss": 1.292,
2117
+ "step": 6920
2118
+ },
2119
+ {
2120
+ "epoch": 4.93,
2121
+ "learning_rate": 2.560478270771798e-05,
2122
+ "loss": 1.3149,
2123
+ "step": 6940
2124
+ },
2125
+ {
2126
+ "epoch": 4.95,
2127
+ "learning_rate": 2.549280206734705e-05,
2128
+ "loss": 1.2975,
2129
+ "step": 6960
2130
+ },
2131
+ {
2132
+ "epoch": 4.96,
2133
+ "learning_rate": 2.538081153489373e-05,
2134
+ "loss": 1.3232,
2135
+ "step": 6980
2136
+ },
2137
+ {
2138
+ "epoch": 4.98,
2139
+ "learning_rate": 2.5268813358359084e-05,
2140
+ "loss": 1.3329,
2141
+ "step": 7000
2142
+ },
2143
+ {
2144
+ "epoch": 4.99,
2145
+ "learning_rate": 2.5156809785897623e-05,
2146
+ "loss": 1.3202,
2147
+ "step": 7020
2148
+ },
2149
+ {
2150
+ "epoch": 5.0,
2151
+ "eval_loss": 1.2252917289733887,
2152
+ "eval_runtime": 294.3042,
2153
+ "eval_samples_per_second": 18.902,
2154
+ "eval_steps_per_second": 18.902,
2155
+ "step": 7032
2156
+ },
2157
+ {
2158
+ "epoch": 5.01,
2159
+ "learning_rate": 2.5044803065772165e-05,
2160
+ "loss": 1.3016,
2161
+ "step": 7040
2162
+ },
2163
+ {
2164
+ "epoch": 5.02,
2165
+ "learning_rate": 2.4932795446308734e-05,
2166
+ "loss": 1.2747,
2167
+ "step": 7060
2168
+ },
2169
+ {
2170
+ "epoch": 5.03,
2171
+ "learning_rate": 2.482078917585136e-05,
2172
+ "loss": 1.3207,
2173
+ "step": 7080
2174
+ },
2175
+ {
2176
+ "epoch": 5.05,
2177
+ "learning_rate": 2.4708786502717054e-05,
2178
+ "loss": 1.2924,
2179
+ "step": 7100
2180
+ },
2181
+ {
2182
+ "epoch": 5.06,
2183
+ "learning_rate": 2.4596789675150577e-05,
2184
+ "loss": 1.3039,
2185
+ "step": 7120
2186
+ },
2187
+ {
2188
+ "epoch": 5.08,
2189
+ "learning_rate": 2.4484800941279355e-05,
2190
+ "loss": 1.2891,
2191
+ "step": 7140
2192
+ },
2193
+ {
2194
+ "epoch": 5.09,
2195
+ "learning_rate": 2.4372822549068354e-05,
2196
+ "loss": 1.3055,
2197
+ "step": 7160
2198
+ },
2199
+ {
2200
+ "epoch": 5.1,
2201
+ "learning_rate": 2.4260856746274963e-05,
2202
+ "loss": 1.3284,
2203
+ "step": 7180
2204
+ },
2205
+ {
2206
+ "epoch": 5.12,
2207
+ "learning_rate": 2.4148905780403844e-05,
2208
+ "loss": 1.31,
2209
+ "step": 7200
2210
+ },
2211
+ {
2212
+ "epoch": 5.13,
2213
+ "learning_rate": 2.4036971898661832e-05,
2214
+ "loss": 1.2969,
2215
+ "step": 7220
2216
+ },
2217
+ {
2218
+ "epoch": 5.15,
2219
+ "learning_rate": 2.392505734791285e-05,
2220
+ "loss": 1.2862,
2221
+ "step": 7240
2222
+ },
2223
+ {
2224
+ "epoch": 5.16,
2225
+ "learning_rate": 2.3813164374632775e-05,
2226
+ "loss": 1.2984,
2227
+ "step": 7260
2228
+ },
2229
+ {
2230
+ "epoch": 5.18,
2231
+ "learning_rate": 2.3701295224864356e-05,
2232
+ "loss": 1.2816,
2233
+ "step": 7280
2234
+ },
2235
+ {
2236
+ "epoch": 5.19,
2237
+ "learning_rate": 2.3589452144172137e-05,
2238
+ "loss": 1.3104,
2239
+ "step": 7300
2240
+ },
2241
+ {
2242
+ "epoch": 5.2,
2243
+ "learning_rate": 2.347763737759736e-05,
2244
+ "loss": 1.3174,
2245
+ "step": 7320
2246
+ },
2247
+ {
2248
+ "epoch": 5.22,
2249
+ "learning_rate": 2.336585316961292e-05,
2250
+ "loss": 1.2857,
2251
+ "step": 7340
2252
+ },
2253
+ {
2254
+ "epoch": 5.23,
2255
+ "learning_rate": 2.325410176407833e-05,
2256
+ "loss": 1.3064,
2257
+ "step": 7360
2258
+ },
2259
+ {
2260
+ "epoch": 5.25,
2261
+ "learning_rate": 2.314238540419461e-05,
2262
+ "loss": 1.3106,
2263
+ "step": 7380
2264
+ },
2265
+ {
2266
+ "epoch": 5.26,
2267
+ "learning_rate": 2.303070633245933e-05,
2268
+ "loss": 1.286,
2269
+ "step": 7400
2270
+ },
2271
+ {
2272
+ "epoch": 5.28,
2273
+ "learning_rate": 2.2919066790621575e-05,
2274
+ "loss": 1.3003,
2275
+ "step": 7420
2276
+ },
2277
+ {
2278
+ "epoch": 5.29,
2279
+ "learning_rate": 2.280746901963693e-05,
2280
+ "loss": 1.3026,
2281
+ "step": 7440
2282
+ },
2283
+ {
2284
+ "epoch": 5.3,
2285
+ "learning_rate": 2.26959152596225e-05,
2286
+ "loss": 1.3179,
2287
+ "step": 7460
2288
+ },
2289
+ {
2290
+ "epoch": 5.32,
2291
+ "learning_rate": 2.2584407749811985e-05,
2292
+ "loss": 1.3108,
2293
+ "step": 7480
2294
+ },
2295
+ {
2296
+ "epoch": 5.33,
2297
+ "learning_rate": 2.2472948728510664e-05,
2298
+ "loss": 1.2946,
2299
+ "step": 7500
2300
+ },
2301
+ {
2302
+ "epoch": 5.35,
2303
+ "learning_rate": 2.2361540433050492e-05,
2304
+ "loss": 1.2609,
2305
+ "step": 7520
2306
+ },
2307
+ {
2308
+ "epoch": 5.36,
2309
+ "learning_rate": 2.2250185099745253e-05,
2310
+ "loss": 1.3279,
2311
+ "step": 7540
2312
+ },
2313
+ {
2314
+ "epoch": 5.38,
2315
+ "learning_rate": 2.213888496384556e-05,
2316
+ "loss": 1.3078,
2317
+ "step": 7560
2318
+ },
2319
+ {
2320
+ "epoch": 5.39,
2321
+ "learning_rate": 2.2027642259494046e-05,
2322
+ "loss": 1.3185,
2323
+ "step": 7580
2324
+ },
2325
+ {
2326
+ "epoch": 5.4,
2327
+ "learning_rate": 2.1916459219680557e-05,
2328
+ "loss": 1.3063,
2329
+ "step": 7600
2330
+ },
2331
+ {
2332
+ "epoch": 5.42,
2333
+ "learning_rate": 2.1805338076197234e-05,
2334
+ "loss": 1.3001,
2335
+ "step": 7620
2336
+ },
2337
+ {
2338
+ "epoch": 5.43,
2339
+ "learning_rate": 2.169428105959378e-05,
2340
+ "loss": 1.3317,
2341
+ "step": 7640
2342
+ },
2343
+ {
2344
+ "epoch": 5.45,
2345
+ "learning_rate": 2.1583290399132695e-05,
2346
+ "loss": 1.3007,
2347
+ "step": 7660
2348
+ },
2349
+ {
2350
+ "epoch": 5.46,
2351
+ "learning_rate": 2.147236832274447e-05,
2352
+ "loss": 1.3081,
2353
+ "step": 7680
2354
+ },
2355
+ {
2356
+ "epoch": 5.47,
2357
+ "learning_rate": 2.1361517056982903e-05,
2358
+ "loss": 1.2867,
2359
+ "step": 7700
2360
+ },
2361
+ {
2362
+ "epoch": 5.49,
2363
+ "learning_rate": 2.1250738826980432e-05,
2364
+ "loss": 1.3427,
2365
+ "step": 7720
2366
+ },
2367
+ {
2368
+ "epoch": 5.5,
2369
+ "learning_rate": 2.1140035856403405e-05,
2370
+ "loss": 1.2951,
2371
+ "step": 7740
2372
+ },
2373
+ {
2374
+ "epoch": 5.52,
2375
+ "learning_rate": 2.1029410367407476e-05,
2376
+ "loss": 1.3178,
2377
+ "step": 7760
2378
+ },
2379
+ {
2380
+ "epoch": 5.53,
2381
+ "learning_rate": 2.0918864580593034e-05,
2382
+ "loss": 1.3031,
2383
+ "step": 7780
2384
+ },
2385
+ {
2386
+ "epoch": 5.55,
2387
+ "learning_rate": 2.0808400714960567e-05,
2388
+ "loss": 1.2934,
2389
+ "step": 7800
2390
+ },
2391
+ {
2392
+ "epoch": 5.56,
2393
+ "learning_rate": 2.0698020987866153e-05,
2394
+ "loss": 1.3317,
2395
+ "step": 7820
2396
+ },
2397
+ {
2398
+ "epoch": 5.57,
2399
+ "learning_rate": 2.058772761497694e-05,
2400
+ "loss": 1.2851,
2401
+ "step": 7840
2402
+ },
2403
+ {
2404
+ "epoch": 5.59,
2405
+ "learning_rate": 2.047752281022671e-05,
2406
+ "loss": 1.3029,
2407
+ "step": 7860
2408
+ },
2409
+ {
2410
+ "epoch": 5.6,
2411
+ "learning_rate": 2.0367408785771353e-05,
2412
+ "loss": 1.3084,
2413
+ "step": 7880
2414
+ },
2415
+ {
2416
+ "epoch": 5.62,
2417
+ "learning_rate": 2.0257387751944556e-05,
2418
+ "loss": 1.3114,
2419
+ "step": 7900
2420
+ },
2421
+ {
2422
+ "epoch": 5.63,
2423
+ "learning_rate": 2.014746191721337e-05,
2424
+ "loss": 1.3015,
2425
+ "step": 7920
2426
+ },
2427
+ {
2428
+ "epoch": 5.65,
2429
+ "learning_rate": 2.003763348813391e-05,
2430
+ "loss": 1.2976,
2431
+ "step": 7940
2432
+ },
2433
+ {
2434
+ "epoch": 5.66,
2435
+ "learning_rate": 1.992790466930706e-05,
2436
+ "loss": 1.3081,
2437
+ "step": 7960
2438
+ },
2439
+ {
2440
+ "epoch": 5.67,
2441
+ "learning_rate": 1.98182776633342e-05,
2442
+ "loss": 1.3105,
2443
+ "step": 7980
2444
+ },
2445
+ {
2446
+ "epoch": 5.69,
2447
+ "learning_rate": 1.9708754670773005e-05,
2448
+ "loss": 1.3172,
2449
+ "step": 8000
2450
+ },
2451
+ {
2452
+ "epoch": 5.7,
2453
+ "learning_rate": 1.9599337890093302e-05,
2454
+ "loss": 1.3233,
2455
+ "step": 8020
2456
+ },
2457
+ {
2458
+ "epoch": 5.72,
2459
+ "learning_rate": 1.9490029517632884e-05,
2460
+ "loss": 1.3026,
2461
+ "step": 8040
2462
+ },
2463
+ {
2464
+ "epoch": 5.73,
2465
+ "learning_rate": 1.9380831747553458e-05,
2466
+ "loss": 1.2666,
2467
+ "step": 8060
2468
+ },
2469
+ {
2470
+ "epoch": 5.74,
2471
+ "learning_rate": 1.9271746771796607e-05,
2472
+ "loss": 1.3369,
2473
+ "step": 8080
2474
+ },
2475
+ {
2476
+ "epoch": 5.76,
2477
+ "learning_rate": 1.9162776780039766e-05,
2478
+ "loss": 1.307,
2479
+ "step": 8100
2480
+ },
2481
+ {
2482
+ "epoch": 5.77,
2483
+ "learning_rate": 1.905392395965227e-05,
2484
+ "loss": 1.2926,
2485
+ "step": 8120
2486
+ },
2487
+ {
2488
+ "epoch": 5.79,
2489
+ "learning_rate": 1.8945190495651492e-05,
2490
+ "loss": 1.2972,
2491
+ "step": 8140
2492
+ },
2493
+ {
2494
+ "epoch": 5.8,
2495
+ "learning_rate": 1.8836578570658926e-05,
2496
+ "loss": 1.3041,
2497
+ "step": 8160
2498
+ },
2499
+ {
2500
+ "epoch": 5.82,
2501
+ "learning_rate": 1.872809036485637e-05,
2502
+ "loss": 1.3299,
2503
+ "step": 8180
2504
+ },
2505
+ {
2506
+ "epoch": 5.83,
2507
+ "learning_rate": 1.8619728055942254e-05,
2508
+ "loss": 1.2817,
2509
+ "step": 8200
2510
+ },
2511
+ {
2512
+ "epoch": 5.84,
2513
+ "learning_rate": 1.851149381908781e-05,
2514
+ "loss": 1.3078,
2515
+ "step": 8220
2516
+ },
2517
+ {
2518
+ "epoch": 5.86,
2519
+ "learning_rate": 1.8403389826893476e-05,
2520
+ "loss": 1.3133,
2521
+ "step": 8240
2522
+ },
2523
+ {
2524
+ "epoch": 5.87,
2525
+ "learning_rate": 1.8295418249345283e-05,
2526
+ "loss": 1.2672,
2527
+ "step": 8260
2528
+ },
2529
+ {
2530
+ "epoch": 5.89,
2531
+ "learning_rate": 1.8187581253771274e-05,
2532
+ "loss": 1.3216,
2533
+ "step": 8280
2534
+ },
2535
+ {
2536
+ "epoch": 5.9,
2537
+ "learning_rate": 1.8079881004798005e-05,
2538
+ "loss": 1.3328,
2539
+ "step": 8300
2540
+ },
2541
+ {
2542
+ "epoch": 5.92,
2543
+ "learning_rate": 1.797231966430712e-05,
2544
+ "loss": 1.2957,
2545
+ "step": 8320
2546
+ },
2547
+ {
2548
+ "epoch": 5.93,
2549
+ "learning_rate": 1.7864899391391915e-05,
2550
+ "loss": 1.3012,
2551
+ "step": 8340
2552
+ },
2553
+ {
2554
+ "epoch": 5.94,
2555
+ "learning_rate": 1.775762234231401e-05,
2556
+ "loss": 1.3287,
2557
+ "step": 8360
2558
+ },
2559
+ {
2560
+ "epoch": 5.96,
2561
+ "learning_rate": 1.7650490670460113e-05,
2562
+ "loss": 1.3065,
2563
+ "step": 8380
2564
+ },
2565
+ {
2566
+ "epoch": 5.97,
2567
+ "learning_rate": 1.7543506526298713e-05,
2568
+ "loss": 1.3226,
2569
+ "step": 8400
2570
+ },
2571
+ {
2572
+ "epoch": 5.99,
2573
+ "learning_rate": 1.7436672057336967e-05,
2574
+ "loss": 1.3222,
2575
+ "step": 8420
2576
+ },
2577
+ {
2578
+ "epoch": 6.0,
2579
+ "eval_loss": 1.2183001041412354,
2580
+ "eval_runtime": 291.6367,
2581
+ "eval_samples_per_second": 19.075,
2582
+ "eval_steps_per_second": 19.075,
2583
+ "step": 8438
2584
+ },
2585
+ {
2586
+ "epoch": 6.0,
2587
+ "learning_rate": 1.7329989408077596e-05,
2588
+ "loss": 1.3026,
2589
+ "step": 8440
2590
+ },
2591
+ {
2592
+ "epoch": 6.02,
2593
+ "learning_rate": 1.722346071997582e-05,
2594
+ "loss": 1.2731,
2595
+ "step": 8460
2596
+ },
2597
+ {
2598
+ "epoch": 6.03,
2599
+ "learning_rate": 1.7117088131396355e-05,
2600
+ "loss": 1.3217,
2601
+ "step": 8480
2602
+ },
2603
+ {
2604
+ "epoch": 6.04,
2605
+ "learning_rate": 1.701087377757053e-05,
2606
+ "loss": 1.2966,
2607
+ "step": 8500
2608
+ },
2609
+ {
2610
+ "epoch": 6.06,
2611
+ "learning_rate": 1.6904819790553407e-05,
2612
+ "loss": 1.3306,
2613
+ "step": 8520
2614
+ },
2615
+ {
2616
+ "epoch": 6.07,
2617
+ "learning_rate": 1.6798928299180978e-05,
2618
+ "loss": 1.3007,
2619
+ "step": 8540
2620
+ },
2621
+ {
2622
+ "epoch": 6.09,
2623
+ "learning_rate": 1.6693201429027427e-05,
2624
+ "loss": 1.3155,
2625
+ "step": 8560
2626
+ },
2627
+ {
2628
+ "epoch": 6.1,
2629
+ "learning_rate": 1.65876413023625e-05,
2630
+ "loss": 1.309,
2631
+ "step": 8580
2632
+ },
2633
+ {
2634
+ "epoch": 6.11,
2635
+ "learning_rate": 1.6482250038108852e-05,
2636
+ "loss": 1.2694,
2637
+ "step": 8600
2638
+ },
2639
+ {
2640
+ "epoch": 6.13,
2641
+ "learning_rate": 1.6377029751799554e-05,
2642
+ "loss": 1.3119,
2643
+ "step": 8620
2644
+ },
2645
+ {
2646
+ "epoch": 6.14,
2647
+ "learning_rate": 1.627198255553562e-05,
2648
+ "loss": 1.2952,
2649
+ "step": 8640
2650
+ },
2651
+ {
2652
+ "epoch": 6.16,
2653
+ "learning_rate": 1.6167110557943588e-05,
2654
+ "loss": 1.29,
2655
+ "step": 8660
2656
+ },
2657
+ {
2658
+ "epoch": 6.17,
2659
+ "learning_rate": 1.6062415864133213e-05,
2660
+ "loss": 1.312,
2661
+ "step": 8680
2662
+ },
2663
+ {
2664
+ "epoch": 6.19,
2665
+ "learning_rate": 1.595790057565522e-05,
2666
+ "loss": 1.312,
2667
+ "step": 8700
2668
+ },
2669
+ {
2670
+ "epoch": 6.2,
2671
+ "learning_rate": 1.5853566790459102e-05,
2672
+ "loss": 1.2913,
2673
+ "step": 8720
2674
+ },
2675
+ {
2676
+ "epoch": 6.21,
2677
+ "learning_rate": 1.574941660285098e-05,
2678
+ "loss": 1.3096,
2679
+ "step": 8740
2680
+ },
2681
+ {
2682
+ "epoch": 6.23,
2683
+ "learning_rate": 1.5645452103451657e-05,
2684
+ "loss": 1.2909,
2685
+ "step": 8760
2686
+ },
2687
+ {
2688
+ "epoch": 6.24,
2689
+ "learning_rate": 1.5541675379154548e-05,
2690
+ "loss": 1.3167,
2691
+ "step": 8780
2692
+ },
2693
+ {
2694
+ "epoch": 6.26,
2695
+ "learning_rate": 1.5438088513083826e-05,
2696
+ "loss": 1.2911,
2697
+ "step": 8800
2698
+ },
2699
+ {
2700
+ "epoch": 6.27,
2701
+ "learning_rate": 1.5334693584552655e-05,
2702
+ "loss": 1.277,
2703
+ "step": 8820
2704
+ },
2705
+ {
2706
+ "epoch": 6.29,
2707
+ "learning_rate": 1.523149266902138e-05,
2708
+ "loss": 1.2932,
2709
+ "step": 8840
2710
+ },
2711
+ {
2712
+ "epoch": 6.3,
2713
+ "learning_rate": 1.5128487838055887e-05,
2714
+ "loss": 1.2876,
2715
+ "step": 8860
2716
+ },
2717
+ {
2718
+ "epoch": 6.31,
2719
+ "learning_rate": 1.5025681159286076e-05,
2720
+ "loss": 1.3119,
2721
+ "step": 8880
2722
+ },
2723
+ {
2724
+ "epoch": 6.33,
2725
+ "learning_rate": 1.4923074696364265e-05,
2726
+ "loss": 1.2912,
2727
+ "step": 8900
2728
+ },
2729
+ {
2730
+ "epoch": 6.34,
2731
+ "learning_rate": 1.4820670508923825e-05,
2732
+ "loss": 1.2965,
2733
+ "step": 8920
2734
+ },
2735
+ {
2736
+ "epoch": 6.36,
2737
+ "learning_rate": 1.4718470652537846e-05,
2738
+ "loss": 1.3191,
2739
+ "step": 8940
2740
+ },
2741
+ {
2742
+ "epoch": 6.37,
2743
+ "learning_rate": 1.461647717867783e-05,
2744
+ "loss": 1.3124,
2745
+ "step": 8960
2746
+ },
2747
+ {
2748
+ "epoch": 6.38,
2749
+ "learning_rate": 1.4514692134672523e-05,
2750
+ "loss": 1.3195,
2751
+ "step": 8980
2752
+ },
2753
+ {
2754
+ "epoch": 6.4,
2755
+ "learning_rate": 1.4413117563666873e-05,
2756
+ "loss": 1.2738,
2757
+ "step": 9000
2758
+ },
2759
+ {
2760
+ "epoch": 6.41,
2761
+ "learning_rate": 1.431175550458094e-05,
2762
+ "loss": 1.3316,
2763
+ "step": 9020
2764
+ },
2765
+ {
2766
+ "epoch": 6.43,
2767
+ "learning_rate": 1.4210607992069003e-05,
2768
+ "loss": 1.2999,
2769
+ "step": 9040
2770
+ },
2771
+ {
2772
+ "epoch": 6.44,
2773
+ "learning_rate": 1.4109677056478748e-05,
2774
+ "loss": 1.2916,
2775
+ "step": 9060
2776
+ },
2777
+ {
2778
+ "epoch": 6.46,
2779
+ "learning_rate": 1.4008964723810459e-05,
2780
+ "loss": 1.3161,
2781
+ "step": 9080
2782
+ },
2783
+ {
2784
+ "epoch": 6.47,
2785
+ "learning_rate": 1.3908473015676359e-05,
2786
+ "loss": 1.3043,
2787
+ "step": 9100
2788
+ },
2789
+ {
2790
+ "epoch": 6.48,
2791
+ "learning_rate": 1.3808203949260098e-05,
2792
+ "loss": 1.3031,
2793
+ "step": 9120
2794
+ },
2795
+ {
2796
+ "epoch": 6.5,
2797
+ "learning_rate": 1.3708159537276161e-05,
2798
+ "loss": 1.281,
2799
+ "step": 9140
2800
+ },
2801
+ {
2802
+ "epoch": 6.51,
2803
+ "learning_rate": 1.3608341787929518e-05,
2804
+ "loss": 1.3082,
2805
+ "step": 9160
2806
+ },
2807
+ {
2808
+ "epoch": 6.53,
2809
+ "learning_rate": 1.3508752704875344e-05,
2810
+ "loss": 1.299,
2811
+ "step": 9180
2812
+ },
2813
+ {
2814
+ "epoch": 6.54,
2815
+ "learning_rate": 1.3409394287178727e-05,
2816
+ "loss": 1.3043,
2817
+ "step": 9200
2818
+ },
2819
+ {
2820
+ "epoch": 6.56,
2821
+ "learning_rate": 1.331026852927459e-05,
2822
+ "loss": 1.2931,
2823
+ "step": 9220
2824
+ },
2825
+ {
2826
+ "epoch": 6.57,
2827
+ "learning_rate": 1.3211377420927657e-05,
2828
+ "loss": 1.3138,
2829
+ "step": 9240
2830
+ },
2831
+ {
2832
+ "epoch": 6.58,
2833
+ "learning_rate": 1.311272294719249e-05,
2834
+ "loss": 1.3022,
2835
+ "step": 9260
2836
+ },
2837
+ {
2838
+ "epoch": 6.6,
2839
+ "learning_rate": 1.3014307088373637e-05,
2840
+ "loss": 1.3095,
2841
+ "step": 9280
2842
+ },
2843
+ {
2844
+ "epoch": 6.61,
2845
+ "learning_rate": 1.2916131819985933e-05,
2846
+ "loss": 1.2889,
2847
+ "step": 9300
2848
+ },
2849
+ {
2850
+ "epoch": 6.63,
2851
+ "learning_rate": 1.2818199112714779e-05,
2852
+ "loss": 1.3317,
2853
+ "step": 9320
2854
+ },
2855
+ {
2856
+ "epoch": 6.64,
2857
+ "learning_rate": 1.2720510932376611e-05,
2858
+ "loss": 1.315,
2859
+ "step": 9340
2860
+ },
2861
+ {
2862
+ "epoch": 6.65,
2863
+ "learning_rate": 1.2623069239879476e-05,
2864
+ "loss": 1.2964,
2865
+ "step": 9360
2866
+ },
2867
+ {
2868
+ "epoch": 6.67,
2869
+ "learning_rate": 1.2525875991183606e-05,
2870
+ "loss": 1.3068,
2871
+ "step": 9380
2872
+ },
2873
+ {
2874
+ "epoch": 6.68,
2875
+ "learning_rate": 1.2428933137262196e-05,
2876
+ "loss": 1.2965,
2877
+ "step": 9400
2878
+ },
2879
+ {
2880
+ "epoch": 6.7,
2881
+ "learning_rate": 1.2332242624062225e-05,
2882
+ "loss": 1.2759,
2883
+ "step": 9420
2884
+ },
2885
+ {
2886
+ "epoch": 6.71,
2887
+ "learning_rate": 1.2235806392465435e-05,
2888
+ "loss": 1.3054,
2889
+ "step": 9440
2890
+ },
2891
+ {
2892
+ "epoch": 6.73,
2893
+ "learning_rate": 1.2139626378249299e-05,
2894
+ "loss": 1.2885,
2895
+ "step": 9460
2896
+ },
2897
+ {
2898
+ "epoch": 6.74,
2899
+ "learning_rate": 1.2043704512048217e-05,
2900
+ "loss": 1.3241,
2901
+ "step": 9480
2902
+ },
2903
+ {
2904
+ "epoch": 6.75,
2905
+ "learning_rate": 1.194804271931477e-05,
2906
+ "loss": 1.3053,
2907
+ "step": 9500
2908
+ },
2909
+ {
2910
+ "epoch": 6.77,
2911
+ "learning_rate": 1.1852642920281021e-05,
2912
+ "loss": 1.2822,
2913
+ "step": 9520
2914
+ },
2915
+ {
2916
+ "epoch": 6.78,
2917
+ "learning_rate": 1.1757507029920009e-05,
2918
+ "loss": 1.3165,
2919
+ "step": 9540
2920
+ },
2921
+ {
2922
+ "epoch": 6.8,
2923
+ "learning_rate": 1.1662636957907291e-05,
2924
+ "loss": 1.2796,
2925
+ "step": 9560
2926
+ },
2927
+ {
2928
+ "epoch": 6.81,
2929
+ "learning_rate": 1.1568034608582642e-05,
2930
+ "loss": 1.2647,
2931
+ "step": 9580
2932
+ },
2933
+ {
2934
+ "epoch": 6.83,
2935
+ "learning_rate": 1.1473701880911774e-05,
2936
+ "loss": 1.285,
2937
+ "step": 9600
2938
+ },
2939
+ {
2940
+ "epoch": 6.84,
2941
+ "learning_rate": 1.1379640668448263e-05,
2942
+ "loss": 1.3066,
2943
+ "step": 9620
2944
+ },
2945
+ {
2946
+ "epoch": 6.85,
2947
+ "learning_rate": 1.1285852859295506e-05,
2948
+ "loss": 1.2901,
2949
+ "step": 9640
2950
+ },
2951
+ {
2952
+ "epoch": 6.87,
2953
+ "learning_rate": 1.1192340336068874e-05,
2954
+ "loss": 1.2976,
2955
+ "step": 9660
2956
+ },
2957
+ {
2958
+ "epoch": 6.88,
2959
+ "learning_rate": 1.1099104975857852e-05,
2960
+ "loss": 1.2805,
2961
+ "step": 9680
2962
+ },
2963
+ {
2964
+ "epoch": 6.9,
2965
+ "learning_rate": 1.1006148650188409e-05,
2966
+ "loss": 1.3113,
2967
+ "step": 9700
2968
+ },
2969
+ {
2970
+ "epoch": 6.91,
2971
+ "learning_rate": 1.09134732249854e-05,
2972
+ "loss": 1.3353,
2973
+ "step": 9720
2974
+ },
2975
+ {
2976
+ "epoch": 6.93,
2977
+ "learning_rate": 1.082108056053516e-05,
2978
+ "loss": 1.3107,
2979
+ "step": 9740
2980
+ },
2981
+ {
2982
+ "epoch": 6.94,
2983
+ "learning_rate": 1.0728972511448104e-05,
2984
+ "loss": 1.2788,
2985
+ "step": 9760
2986
+ },
2987
+ {
2988
+ "epoch": 6.95,
2989
+ "learning_rate": 1.063715092662152e-05,
2990
+ "loss": 1.2946,
2991
+ "step": 9780
2992
+ },
2993
+ {
2994
+ "epoch": 6.97,
2995
+ "learning_rate": 1.0545617649202486e-05,
2996
+ "loss": 1.2785,
2997
+ "step": 9800
2998
+ },
2999
+ {
3000
+ "epoch": 6.98,
3001
+ "learning_rate": 1.0454374516550825e-05,
3002
+ "loss": 1.3007,
3003
+ "step": 9820
3004
+ },
3005
+ {
3006
+ "epoch": 7.0,
3007
+ "learning_rate": 1.036342336020224e-05,
3008
+ "loss": 1.2976,
3009
+ "step": 9840
3010
+ },
3011
+ {
3012
+ "epoch": 7.0,
3013
+ "eval_loss": 1.2157546281814575,
3014
+ "eval_runtime": 286.1833,
3015
+ "eval_samples_per_second": 19.439,
3016
+ "eval_steps_per_second": 19.439,
3017
+ "step": 9845
3018
+ },
3019
+ {
3020
+ "epoch": 7.01,
3021
+ "learning_rate": 1.0272766005831583e-05,
3022
+ "loss": 1.2946,
3023
+ "step": 9860
3024
+ },
3025
+ {
3026
+ "epoch": 7.02,
3027
+ "learning_rate": 1.0182404273216154e-05,
3028
+ "loss": 1.3138,
3029
+ "step": 9880
3030
+ },
3031
+ {
3032
+ "epoch": 7.04,
3033
+ "learning_rate": 1.0092339976199192e-05,
3034
+ "loss": 1.2854,
3035
+ "step": 9900
3036
+ },
3037
+ {
3038
+ "epoch": 7.05,
3039
+ "learning_rate": 1.0002574922653506e-05,
3040
+ "loss": 1.3105,
3041
+ "step": 9920
3042
+ },
3043
+ {
3044
+ "epoch": 7.07,
3045
+ "learning_rate": 9.91311091444512e-06,
3046
+ "loss": 1.3052,
3047
+ "step": 9940
3048
+ },
3049
+ {
3050
+ "epoch": 7.08,
3051
+ "learning_rate": 9.823949747397134e-06,
3052
+ "loss": 1.3018,
3053
+ "step": 9960
3054
+ },
3055
+ {
3056
+ "epoch": 7.1,
3057
+ "learning_rate": 9.735093211253698e-06,
3058
+ "loss": 1.3138,
3059
+ "step": 9980
3060
+ },
3061
+ {
3062
+ "epoch": 7.11,
3063
+ "learning_rate": 9.64654308964405e-06,
3064
+ "loss": 1.281,
3065
+ "step": 10000
3066
+ },
3067
+ {
3068
+ "epoch": 7.12,
3069
+ "learning_rate": 9.558301160046717e-06,
3070
+ "loss": 1.2824,
3071
+ "step": 10020
3072
+ },
3073
+ {
3074
+ "epoch": 7.14,
3075
+ "learning_rate": 9.470369193753877e-06,
3076
+ "loss": 1.301,
3077
+ "step": 10040
3078
+ },
3079
+ {
3080
+ "epoch": 7.15,
3081
+ "learning_rate": 9.38274895583575e-06,
3082
+ "loss": 1.2919,
3083
+ "step": 10060
3084
+ },
3085
+ {
3086
+ "epoch": 7.17,
3087
+ "learning_rate": 9.295442205105178e-06,
3088
+ "loss": 1.2813,
3089
+ "step": 10080
3090
+ },
3091
+ {
3092
+ "epoch": 7.18,
3093
+ "learning_rate": 9.208450694082373e-06,
3094
+ "loss": 1.323,
3095
+ "step": 10100
3096
+ },
3097
+ {
3098
+ "epoch": 7.2,
3099
+ "learning_rate": 9.121776168959667e-06,
3100
+ "loss": 1.2836,
3101
+ "step": 10120
3102
+ },
3103
+ {
3104
+ "epoch": 7.21,
3105
+ "learning_rate": 9.035420369566485e-06,
3106
+ "loss": 1.3184,
3107
+ "step": 10140
3108
+ },
3109
+ {
3110
+ "epoch": 7.22,
3111
+ "learning_rate": 8.949385029334459e-06,
3112
+ "loss": 1.2973,
3113
+ "step": 10160
3114
+ },
3115
+ {
3116
+ "epoch": 7.24,
3117
+ "learning_rate": 8.863671875262577e-06,
3118
+ "loss": 1.2943,
3119
+ "step": 10180
3120
+ },
3121
+ {
3122
+ "epoch": 7.25,
3123
+ "learning_rate": 8.778282627882536e-06,
3124
+ "loss": 1.289,
3125
+ "step": 10200
3126
+ },
3127
+ {
3128
+ "epoch": 7.27,
3129
+ "learning_rate": 8.693219001224239e-06,
3130
+ "loss": 1.302,
3131
+ "step": 10220
3132
+ },
3133
+ {
3134
+ "epoch": 7.28,
3135
+ "learning_rate": 8.608482702781332e-06,
3136
+ "loss": 1.2783,
3137
+ "step": 10240
3138
+ },
3139
+ {
3140
+ "epoch": 7.29,
3141
+ "learning_rate": 8.524075433476963e-06,
3142
+ "loss": 1.3028,
3143
+ "step": 10260
3144
+ },
3145
+ {
3146
+ "epoch": 7.31,
3147
+ "learning_rate": 8.439998887629649e-06,
3148
+ "loss": 1.3119,
3149
+ "step": 10280
3150
+ },
3151
+ {
3152
+ "epoch": 7.32,
3153
+ "learning_rate": 8.356254752919241e-06,
3154
+ "loss": 1.3063,
3155
+ "step": 10300
3156
+ },
3157
+ {
3158
+ "epoch": 7.34,
3159
+ "learning_rate": 8.272844710353036e-06,
3160
+ "loss": 1.2968,
3161
+ "step": 10320
3162
+ },
3163
+ {
3164
+ "epoch": 7.35,
3165
+ "learning_rate": 8.189770434232096e-06,
3166
+ "loss": 1.2923,
3167
+ "step": 10340
3168
+ },
3169
+ {
3170
+ "epoch": 7.37,
3171
+ "learning_rate": 8.10703359211757e-06,
3172
+ "loss": 1.2744,
3173
+ "step": 10360
3174
+ },
3175
+ {
3176
+ "epoch": 7.38,
3177
+ "learning_rate": 8.02463584479724e-06,
3178
+ "loss": 1.3022,
3179
+ "step": 10380
3180
+ },
3181
+ {
3182
+ "epoch": 7.39,
3183
+ "learning_rate": 7.942578846252227e-06,
3184
+ "loss": 1.2802,
3185
+ "step": 10400
3186
+ },
3187
+ {
3188
+ "epoch": 7.41,
3189
+ "learning_rate": 7.860864243623726e-06,
3190
+ "loss": 1.2981,
3191
+ "step": 10420
3192
+ },
3193
+ {
3194
+ "epoch": 7.42,
3195
+ "learning_rate": 7.779493677179971e-06,
3196
+ "loss": 1.3192,
3197
+ "step": 10440
3198
+ },
3199
+ {
3200
+ "epoch": 7.44,
3201
+ "learning_rate": 7.698468780283344e-06,
3202
+ "loss": 1.3113,
3203
+ "step": 10460
3204
+ },
3205
+ {
3206
+ "epoch": 7.45,
3207
+ "learning_rate": 7.617791179357522e-06,
3208
+ "loss": 1.2951,
3209
+ "step": 10480
3210
+ },
3211
+ {
3212
+ "epoch": 7.47,
3213
+ "learning_rate": 7.537462493854866e-06,
3214
+ "loss": 1.2936,
3215
+ "step": 10500
3216
+ },
3217
+ {
3218
+ "epoch": 7.48,
3219
+ "learning_rate": 7.457484336223939e-06,
3220
+ "loss": 1.3059,
3221
+ "step": 10520
3222
+ },
3223
+ {
3224
+ "epoch": 7.49,
3225
+ "learning_rate": 7.377858311877081e-06,
3226
+ "loss": 1.2771,
3227
+ "step": 10540
3228
+ },
3229
+ {
3230
+ "epoch": 7.51,
3231
+ "learning_rate": 7.298586019158216e-06,
3232
+ "loss": 1.2919,
3233
+ "step": 10560
3234
+ },
3235
+ {
3236
+ "epoch": 7.52,
3237
+ "learning_rate": 7.219669049310784e-06,
3238
+ "loss": 1.3138,
3239
+ "step": 10580
3240
+ },
3241
+ {
3242
+ "epoch": 7.54,
3243
+ "learning_rate": 7.141108986445768e-06,
3244
+ "loss": 1.3031,
3245
+ "step": 10600
3246
+ },
3247
+ {
3248
+ "epoch": 7.55,
3249
+ "learning_rate": 7.062907407509903e-06,
3250
+ "loss": 1.2819,
3251
+ "step": 10620
3252
+ },
3253
+ {
3254
+ "epoch": 7.57,
3255
+ "learning_rate": 6.985065882254046e-06,
3256
+ "loss": 1.2704,
3257
+ "step": 10640
3258
+ },
3259
+ {
3260
+ "epoch": 7.58,
3261
+ "learning_rate": 6.907585973201633e-06,
3262
+ "loss": 1.2916,
3263
+ "step": 10660
3264
+ },
3265
+ {
3266
+ "epoch": 7.59,
3267
+ "learning_rate": 6.830469235617323e-06,
3268
+ "loss": 1.2754,
3269
+ "step": 10680
3270
+ },
3271
+ {
3272
+ "epoch": 7.61,
3273
+ "learning_rate": 6.7537172174758135e-06,
3274
+ "loss": 1.2972,
3275
+ "step": 10700
3276
+ },
3277
+ {
3278
+ "epoch": 7.62,
3279
+ "learning_rate": 6.677331459430713e-06,
3280
+ "loss": 1.2689,
3281
+ "step": 10720
3282
+ },
3283
+ {
3284
+ "epoch": 7.64,
3285
+ "learning_rate": 6.601313494783648e-06,
3286
+ "loss": 1.3081,
3287
+ "step": 10740
3288
+ },
3289
+ {
3290
+ "epoch": 7.65,
3291
+ "learning_rate": 6.525664849453478e-06,
3292
+ "loss": 1.3015,
3293
+ "step": 10760
3294
+ },
3295
+ {
3296
+ "epoch": 7.66,
3297
+ "learning_rate": 6.450387041945677e-06,
3298
+ "loss": 1.2883,
3299
+ "step": 10780
3300
+ },
3301
+ {
3302
+ "epoch": 7.68,
3303
+ "learning_rate": 6.375481583321829e-06,
3304
+ "loss": 1.3173,
3305
+ "step": 10800
3306
+ },
3307
+ {
3308
+ "epoch": 7.69,
3309
+ "learning_rate": 6.3009499771693156e-06,
3310
+ "loss": 1.2894,
3311
+ "step": 10820
3312
+ },
3313
+ {
3314
+ "epoch": 7.71,
3315
+ "learning_rate": 6.226793719571111e-06,
3316
+ "loss": 1.265,
3317
+ "step": 10840
3318
+ },
3319
+ {
3320
+ "epoch": 7.72,
3321
+ "learning_rate": 6.153014299075799e-06,
3322
+ "loss": 1.3319,
3323
+ "step": 10860
3324
+ },
3325
+ {
3326
+ "epoch": 7.74,
3327
+ "learning_rate": 6.0796131966676324e-06,
3328
+ "loss": 1.2988,
3329
+ "step": 10880
3330
+ },
3331
+ {
3332
+ "epoch": 7.75,
3333
+ "learning_rate": 6.006591885736851e-06,
3334
+ "loss": 1.3037,
3335
+ "step": 10900
3336
+ },
3337
+ {
3338
+ "epoch": 7.76,
3339
+ "learning_rate": 5.9339518320500665e-06,
3340
+ "loss": 1.2874,
3341
+ "step": 10920
3342
+ },
3343
+ {
3344
+ "epoch": 7.78,
3345
+ "learning_rate": 5.861694493720898e-06,
3346
+ "loss": 1.3183,
3347
+ "step": 10940
3348
+ },
3349
+ {
3350
+ "epoch": 7.79,
3351
+ "learning_rate": 5.789821321180639e-06,
3352
+ "loss": 1.2894,
3353
+ "step": 10960
3354
+ },
3355
+ {
3356
+ "epoch": 7.81,
3357
+ "learning_rate": 5.718333757149183e-06,
3358
+ "loss": 1.2751,
3359
+ "step": 10980
3360
+ },
3361
+ {
3362
+ "epoch": 7.82,
3363
+ "learning_rate": 5.647233236606037e-06,
3364
+ "loss": 1.3128,
3365
+ "step": 11000
3366
+ },
3367
+ {
3368
+ "epoch": 7.84,
3369
+ "learning_rate": 5.576521186761563e-06,
3370
+ "loss": 1.2951,
3371
+ "step": 11020
3372
+ },
3373
+ {
3374
+ "epoch": 7.85,
3375
+ "learning_rate": 5.506199027028272e-06,
3376
+ "loss": 1.2995,
3377
+ "step": 11040
3378
+ },
3379
+ {
3380
+ "epoch": 7.86,
3381
+ "learning_rate": 5.436268168992356e-06,
3382
+ "loss": 1.2975,
3383
+ "step": 11060
3384
+ },
3385
+ {
3386
+ "epoch": 7.88,
3387
+ "learning_rate": 5.36673001638538e-06,
3388
+ "loss": 1.2766,
3389
+ "step": 11080
3390
+ },
3391
+ {
3392
+ "epoch": 7.89,
3393
+ "learning_rate": 5.297585965056056e-06,
3394
+ "loss": 1.3,
3395
+ "step": 11100
3396
+ },
3397
+ {
3398
+ "epoch": 7.91,
3399
+ "learning_rate": 5.228837402942252e-06,
3400
+ "loss": 1.2957,
3401
+ "step": 11120
3402
+ },
3403
+ {
3404
+ "epoch": 7.92,
3405
+ "learning_rate": 5.1604857100431445e-06,
3406
+ "loss": 1.321,
3407
+ "step": 11140
3408
+ },
3409
+ {
3410
+ "epoch": 7.93,
3411
+ "learning_rate": 5.092532258391483e-06,
3412
+ "loss": 1.2783,
3413
+ "step": 11160
3414
+ },
3415
+ {
3416
+ "epoch": 7.95,
3417
+ "learning_rate": 5.0249784120260626e-06,
3418
+ "loss": 1.3086,
3419
+ "step": 11180
3420
+ },
3421
+ {
3422
+ "epoch": 7.96,
3423
+ "learning_rate": 4.957825526964371e-06,
3424
+ "loss": 1.3213,
3425
+ "step": 11200
3426
+ },
3427
+ {
3428
+ "epoch": 7.98,
3429
+ "learning_rate": 4.891074951175328e-06,
3430
+ "loss": 1.306,
3431
+ "step": 11220
3432
+ },
3433
+ {
3434
+ "epoch": 7.99,
3435
+ "learning_rate": 4.824728024552239e-06,
3436
+ "loss": 1.3074,
3437
+ "step": 11240
3438
+ },
3439
+ {
3440
+ "epoch": 8.0,
3441
+ "eval_loss": 1.2149200439453125,
3442
+ "eval_runtime": 285.6193,
3443
+ "eval_samples_per_second": 19.477,
3444
+ "eval_steps_per_second": 19.477,
3445
+ "step": 11251
3446
+ },
3447
+ {
3448
+ "epoch": 8.01,
3449
+ "learning_rate": 4.758786078885927e-06,
3450
+ "loss": 1.2998,
3451
+ "step": 11260
3452
+ },
3453
+ {
3454
+ "epoch": 8.02,
3455
+ "learning_rate": 4.69325043783796e-06,
3456
+ "loss": 1.289,
3457
+ "step": 11280
3458
+ },
3459
+ {
3460
+ "epoch": 8.03,
3461
+ "learning_rate": 4.628122416914099e-06,
3462
+ "loss": 1.284,
3463
+ "step": 11300
3464
+ },
3465
+ {
3466
+ "epoch": 8.05,
3467
+ "learning_rate": 4.563403323437909e-06,
3468
+ "loss": 1.2929,
3469
+ "step": 11320
3470
+ },
3471
+ {
3472
+ "epoch": 8.06,
3473
+ "learning_rate": 4.499094456524478e-06,
3474
+ "loss": 1.3024,
3475
+ "step": 11340
3476
+ },
3477
+ {
3478
+ "epoch": 8.08,
3479
+ "learning_rate": 4.435197107054364e-06,
3480
+ "loss": 1.2752,
3481
+ "step": 11360
3482
+ },
3483
+ {
3484
+ "epoch": 8.09,
3485
+ "learning_rate": 4.371712557647698e-06,
3486
+ "loss": 1.294,
3487
+ "step": 11380
3488
+ },
3489
+ {
3490
+ "epoch": 8.11,
3491
+ "learning_rate": 4.308642082638401e-06,
3492
+ "loss": 1.2755,
3493
+ "step": 11400
3494
+ },
3495
+ {
3496
+ "epoch": 8.12,
3497
+ "learning_rate": 4.245986948048619e-06,
3498
+ "loss": 1.2902,
3499
+ "step": 11420
3500
+ },
3501
+ {
3502
+ "epoch": 8.13,
3503
+ "learning_rate": 4.18374841156334e-06,
3504
+ "loss": 1.2984,
3505
+ "step": 11440
3506
+ },
3507
+ {
3508
+ "epoch": 8.15,
3509
+ "learning_rate": 4.121927722505095e-06,
3510
+ "loss": 1.3091,
3511
+ "step": 11460
3512
+ },
3513
+ {
3514
+ "epoch": 8.16,
3515
+ "learning_rate": 4.060526121808916e-06,
3516
+ "loss": 1.2879,
3517
+ "step": 11480
3518
+ },
3519
+ {
3520
+ "epoch": 8.18,
3521
+ "learning_rate": 3.999544841997427e-06,
3522
+ "loss": 1.2826,
3523
+ "step": 11500
3524
+ },
3525
+ {
3526
+ "epoch": 8.19,
3527
+ "learning_rate": 3.938985107156082e-06,
3528
+ "loss": 1.2919,
3529
+ "step": 11520
3530
+ },
3531
+ {
3532
+ "epoch": 8.2,
3533
+ "learning_rate": 3.878848132908605e-06,
3534
+ "loss": 1.3118,
3535
+ "step": 11540
3536
+ },
3537
+ {
3538
+ "epoch": 8.22,
3539
+ "learning_rate": 3.819135126392606e-06,
3540
+ "loss": 1.2758,
3541
+ "step": 11560
3542
+ },
3543
+ {
3544
+ "epoch": 8.23,
3545
+ "learning_rate": 3.7598472862353157e-06,
3546
+ "loss": 1.3126,
3547
+ "step": 11580
3548
+ },
3549
+ {
3550
+ "epoch": 8.25,
3551
+ "learning_rate": 3.700985802529544e-06,
3552
+ "loss": 1.2734,
3553
+ "step": 11600
3554
+ },
3555
+ {
3556
+ "epoch": 8.26,
3557
+ "learning_rate": 3.6425518568098087e-06,
3558
+ "loss": 1.3097,
3559
+ "step": 11620
3560
+ },
3561
+ {
3562
+ "epoch": 8.28,
3563
+ "learning_rate": 3.584546622028581e-06,
3564
+ "loss": 1.2896,
3565
+ "step": 11640
3566
+ },
3567
+ {
3568
+ "epoch": 8.29,
3569
+ "learning_rate": 3.526971262532758e-06,
3570
+ "loss": 1.2976,
3571
+ "step": 11660
3572
+ },
3573
+ {
3574
+ "epoch": 8.3,
3575
+ "learning_rate": 3.4698269340403157e-06,
3576
+ "loss": 1.2882,
3577
+ "step": 11680
3578
+ },
3579
+ {
3580
+ "epoch": 8.32,
3581
+ "learning_rate": 3.4131147836170634e-06,
3582
+ "loss": 1.2946,
3583
+ "step": 11700
3584
+ },
3585
+ {
3586
+ "epoch": 8.33,
3587
+ "learning_rate": 3.356835949653642e-06,
3588
+ "loss": 1.3031,
3589
+ "step": 11720
3590
+ },
3591
+ {
3592
+ "epoch": 8.35,
3593
+ "learning_rate": 3.3009915618426894e-06,
3594
+ "loss": 1.3059,
3595
+ "step": 11740
3596
+ },
3597
+ {
3598
+ "epoch": 8.36,
3599
+ "learning_rate": 3.2455827411561364e-06,
3600
+ "loss": 1.2993,
3601
+ "step": 11760
3602
+ },
3603
+ {
3604
+ "epoch": 8.38,
3605
+ "learning_rate": 3.1906105998227104e-06,
3606
+ "loss": 1.2902,
3607
+ "step": 11780
3608
+ },
3609
+ {
3610
+ "epoch": 8.39,
3611
+ "learning_rate": 3.136076241305633e-06,
3612
+ "loss": 1.2908,
3613
+ "step": 11800
3614
+ },
3615
+ {
3616
+ "epoch": 8.4,
3617
+ "learning_rate": 3.081980760280437e-06,
3618
+ "loss": 1.2843,
3619
+ "step": 11820
3620
+ },
3621
+ {
3622
+ "epoch": 8.42,
3623
+ "learning_rate": 3.0283252426130034e-06,
3624
+ "loss": 1.3316,
3625
+ "step": 11840
3626
+ },
3627
+ {
3628
+ "epoch": 8.43,
3629
+ "learning_rate": 2.9751107653377934e-06,
3630
+ "loss": 1.2902,
3631
+ "step": 11860
3632
+ },
3633
+ {
3634
+ "epoch": 8.45,
3635
+ "learning_rate": 2.9223383966361818e-06,
3636
+ "loss": 1.3192,
3637
+ "step": 11880
3638
+ },
3639
+ {
3640
+ "epoch": 8.46,
3641
+ "learning_rate": 2.870009195815046e-06,
3642
+ "loss": 1.2958,
3643
+ "step": 11900
3644
+ },
3645
+ {
3646
+ "epoch": 8.48,
3647
+ "learning_rate": 2.8181242132854973e-06,
3648
+ "loss": 1.3151,
3649
+ "step": 11920
3650
+ },
3651
+ {
3652
+ "epoch": 8.49,
3653
+ "learning_rate": 2.766684490541796e-06,
3654
+ "loss": 1.2813,
3655
+ "step": 11940
3656
+ },
3657
+ {
3658
+ "epoch": 8.5,
3659
+ "learning_rate": 2.715691060140424e-06,
3660
+ "loss": 1.2775,
3661
+ "step": 11960
3662
+ },
3663
+ {
3664
+ "epoch": 8.52,
3665
+ "learning_rate": 2.665144945679407e-06,
3666
+ "loss": 1.3077,
3667
+ "step": 11980
3668
+ },
3669
+ {
3670
+ "epoch": 8.53,
3671
+ "learning_rate": 2.6150471617777116e-06,
3672
+ "loss": 1.3131,
3673
+ "step": 12000
3674
+ },
3675
+ {
3676
+ "epoch": 8.55,
3677
+ "learning_rate": 2.565398714054917e-06,
3678
+ "loss": 1.3218,
3679
+ "step": 12020
3680
+ },
3681
+ {
3682
+ "epoch": 8.56,
3683
+ "learning_rate": 2.51620059911101e-06,
3684
+ "loss": 1.2741,
3685
+ "step": 12040
3686
+ },
3687
+ {
3688
+ "epoch": 8.57,
3689
+ "learning_rate": 2.4674538045063976e-06,
3690
+ "loss": 1.3002,
3691
+ "step": 12060
3692
+ },
3693
+ {
3694
+ "epoch": 8.59,
3695
+ "learning_rate": 2.4191593087420613e-06,
3696
+ "loss": 1.31,
3697
+ "step": 12080
3698
+ },
3699
+ {
3700
+ "epoch": 8.6,
3701
+ "learning_rate": 2.3713180812399317e-06,
3702
+ "loss": 1.3049,
3703
+ "step": 12100
3704
+ },
3705
+ {
3706
+ "epoch": 8.62,
3707
+ "learning_rate": 2.3239310823234215e-06,
3708
+ "loss": 1.2873,
3709
+ "step": 12120
3710
+ },
3711
+ {
3712
+ "epoch": 8.63,
3713
+ "learning_rate": 2.2769992631981595e-06,
3714
+ "loss": 1.2842,
3715
+ "step": 12140
3716
+ },
3717
+ {
3718
+ "epoch": 8.65,
3719
+ "learning_rate": 2.230523565932882e-06,
3720
+ "loss": 1.345,
3721
+ "step": 12160
3722
+ },
3723
+ {
3724
+ "epoch": 8.66,
3725
+ "learning_rate": 2.1845049234405306e-06,
3726
+ "loss": 1.2938,
3727
+ "step": 12180
3728
+ },
3729
+ {
3730
+ "epoch": 8.67,
3731
+ "learning_rate": 2.1389442594595214e-06,
3732
+ "loss": 1.3317,
3733
+ "step": 12200
3734
+ },
3735
+ {
3736
+ "epoch": 8.69,
3737
+ "learning_rate": 2.093842488535219e-06,
3738
+ "loss": 1.2947,
3739
+ "step": 12220
3740
+ },
3741
+ {
3742
+ "epoch": 8.7,
3743
+ "learning_rate": 2.049200516001554e-06,
3744
+ "loss": 1.3109,
3745
+ "step": 12240
3746
+ },
3747
+ {
3748
+ "epoch": 8.72,
3749
+ "learning_rate": 2.0050192379628656e-06,
3750
+ "loss": 1.2825,
3751
+ "step": 12260
3752
+ },
3753
+ {
3754
+ "epoch": 8.73,
3755
+ "learning_rate": 1.9612995412759016e-06,
3756
+ "loss": 1.313,
3757
+ "step": 12280
3758
+ },
3759
+ {
3760
+ "epoch": 8.75,
3761
+ "learning_rate": 1.9180423035320416e-06,
3762
+ "loss": 1.2819,
3763
+ "step": 12300
3764
+ },
3765
+ {
3766
+ "epoch": 8.76,
3767
+ "learning_rate": 1.875248393039658e-06,
3768
+ "loss": 1.2917,
3769
+ "step": 12320
3770
+ },
3771
+ {
3772
+ "epoch": 8.77,
3773
+ "learning_rate": 1.8329186688066797e-06,
3774
+ "loss": 1.2814,
3775
+ "step": 12340
3776
+ },
3777
+ {
3778
+ "epoch": 8.79,
3779
+ "learning_rate": 1.7910539805233827e-06,
3780
+ "loss": 1.3132,
3781
+ "step": 12360
3782
+ },
3783
+ {
3784
+ "epoch": 8.8,
3785
+ "learning_rate": 1.7496551685453028e-06,
3786
+ "loss": 1.2575,
3787
+ "step": 12380
3788
+ },
3789
+ {
3790
+ "epoch": 8.82,
3791
+ "learning_rate": 1.7087230638763745e-06,
3792
+ "loss": 1.2945,
3793
+ "step": 12400
3794
+ },
3795
+ {
3796
+ "epoch": 8.83,
3797
+ "learning_rate": 1.6682584881522634e-06,
3798
+ "loss": 1.3074,
3799
+ "step": 12420
3800
+ },
3801
+ {
3802
+ "epoch": 8.84,
3803
+ "learning_rate": 1.6282622536238551e-06,
3804
+ "loss": 1.2669,
3805
+ "step": 12440
3806
+ },
3807
+ {
3808
+ "epoch": 8.86,
3809
+ "learning_rate": 1.5887351631409614e-06,
3810
+ "loss": 1.2963,
3811
+ "step": 12460
3812
+ },
3813
+ {
3814
+ "epoch": 8.87,
3815
+ "learning_rate": 1.5496780101362074e-06,
3816
+ "loss": 1.2734,
3817
+ "step": 12480
3818
+ },
3819
+ {
3820
+ "epoch": 8.89,
3821
+ "learning_rate": 1.5110915786090918e-06,
3822
+ "loss": 1.2926,
3823
+ "step": 12500
3824
+ },
3825
+ {
3826
+ "epoch": 8.9,
3827
+ "learning_rate": 1.4729766431102604e-06,
3828
+ "loss": 1.2836,
3829
+ "step": 12520
3830
+ },
3831
+ {
3832
+ "epoch": 8.92,
3833
+ "learning_rate": 1.4353339687259632e-06,
3834
+ "loss": 1.2964,
3835
+ "step": 12540
3836
+ },
3837
+ {
3838
+ "epoch": 8.93,
3839
+ "learning_rate": 1.3981643110626775e-06,
3840
+ "loss": 1.3007,
3841
+ "step": 12560
3842
+ },
3843
+ {
3844
+ "epoch": 8.94,
3845
+ "learning_rate": 1.3614684162319564e-06,
3846
+ "loss": 1.2974,
3847
+ "step": 12580
3848
+ },
3849
+ {
3850
+ "epoch": 8.96,
3851
+ "learning_rate": 1.3252470208354518e-06,
3852
+ "loss": 1.2984,
3853
+ "step": 12600
3854
+ },
3855
+ {
3856
+ "epoch": 8.97,
3857
+ "learning_rate": 1.2895008519501206e-06,
3858
+ "loss": 1.3015,
3859
+ "step": 12620
3860
+ },
3861
+ {
3862
+ "epoch": 8.99,
3863
+ "learning_rate": 1.2542306271136284e-06,
3864
+ "loss": 1.3104,
3865
+ "step": 12640
3866
+ },
3867
+ {
3868
+ "epoch": 9.0,
3869
+ "eval_loss": 1.214709758758545,
3870
+ "eval_runtime": 293.5548,
3871
+ "eval_samples_per_second": 18.95,
3872
+ "eval_steps_per_second": 18.95,
3873
+ "step": 12658
3874
+ },
3875
+ {
3876
+ "epoch": 9.0,
3877
+ "learning_rate": 1.2194370543099659e-06,
3878
+ "loss": 1.3173,
3879
+ "step": 12660
3880
+ },
3881
+ {
3882
+ "epoch": 9.02,
3883
+ "learning_rate": 1.1851208319552109e-06,
3884
+ "loss": 1.3155,
3885
+ "step": 12680
3886
+ },
3887
+ {
3888
+ "epoch": 9.03,
3889
+ "learning_rate": 1.1512826488835227e-06,
3890
+ "loss": 1.2712,
3891
+ "step": 12700
3892
+ },
3893
+ {
3894
+ "epoch": 9.04,
3895
+ "learning_rate": 1.1179231843333248e-06,
3896
+ "loss": 1.2837,
3897
+ "step": 12720
3898
+ },
3899
+ {
3900
+ "epoch": 9.06,
3901
+ "learning_rate": 1.085043107933642e-06,
3902
+ "loss": 1.2944,
3903
+ "step": 12740
3904
+ },
3905
+ {
3906
+ "epoch": 9.07,
3907
+ "learning_rate": 1.0526430796906878e-06,
3908
+ "loss": 1.2903,
3909
+ "step": 12760
3910
+ },
3911
+ {
3912
+ "epoch": 9.09,
3913
+ "learning_rate": 1.0207237499746002e-06,
3914
+ "loss": 1.2911,
3915
+ "step": 12780
3916
+ },
3917
+ {
3918
+ "epoch": 9.1,
3919
+ "learning_rate": 9.892857595063947e-07,
3920
+ "loss": 1.2896,
3921
+ "step": 12800
3922
+ },
3923
+ {
3924
+ "epoch": 9.12,
3925
+ "learning_rate": 9.583297393450929e-07,
3926
+ "loss": 1.3074,
3927
+ "step": 12820
3928
+ },
3929
+ {
3930
+ "epoch": 9.13,
3931
+ "learning_rate": 9.278563108750665e-07,
3932
+ "loss": 1.2877,
3933
+ "step": 12840
3934
+ },
3935
+ {
3936
+ "epoch": 9.14,
3937
+ "learning_rate": 8.978660857935555e-07,
3938
+ "loss": 1.2782,
3939
+ "step": 12860
3940
+ },
3941
+ {
3942
+ "epoch": 9.16,
3943
+ "learning_rate": 8.68359666098395e-07,
3944
+ "loss": 1.3062,
3945
+ "step": 12880
3946
+ },
3947
+ {
3948
+ "epoch": 9.17,
3949
+ "learning_rate": 8.393376440759326e-07,
3950
+ "loss": 1.3011,
3951
+ "step": 12900
3952
+ },
3953
+ {
3954
+ "epoch": 9.19,
3955
+ "learning_rate": 8.108006022891274e-07,
3956
+ "loss": 1.2822,
3957
+ "step": 12920
3958
+ },
3959
+ {
3960
+ "epoch": 9.2,
3961
+ "learning_rate": 7.827491135658726e-07,
3962
+ "loss": 1.3102,
3963
+ "step": 12940
3964
+ },
3965
+ {
3966
+ "epoch": 9.21,
3967
+ "learning_rate": 7.551837409874862e-07,
3968
+ "loss": 1.3167,
3969
+ "step": 12960
3970
+ },
3971
+ {
3972
+ "epoch": 9.23,
3973
+ "learning_rate": 7.281050378774135e-07,
3974
+ "loss": 1.3019,
3975
+ "step": 12980
3976
+ },
3977
+ {
3978
+ "epoch": 9.24,
3979
+ "learning_rate": 7.015135477901086e-07,
3980
+ "loss": 1.316,
3981
+ "step": 13000
3982
+ },
3983
+ {
3984
+ "epoch": 9.26,
3985
+ "learning_rate": 6.754098045001517e-07,
3986
+ "loss": 1.2924,
3987
+ "step": 13020
3988
+ },
3989
+ {
3990
+ "epoch": 9.27,
3991
+ "learning_rate": 6.497943319914962e-07,
3992
+ "loss": 1.2847,
3993
+ "step": 13040
3994
+ },
3995
+ {
3996
+ "epoch": 9.29,
3997
+ "learning_rate": 6.246676444469774e-07,
3998
+ "loss": 1.2744,
3999
+ "step": 13060
4000
+ },
4001
+ {
4002
+ "epoch": 9.3,
4003
+ "learning_rate": 6.000302462379898e-07,
4004
+ "loss": 1.2995,
4005
+ "step": 13080
4006
+ },
4007
+ {
4008
+ "epoch": 9.31,
4009
+ "learning_rate": 5.758826319143512e-07,
4010
+ "loss": 1.3082,
4011
+ "step": 13100
4012
+ },
4013
+ {
4014
+ "epoch": 9.33,
4015
+ "learning_rate": 5.5222528619438e-07,
4016
+ "loss": 1.2896,
4017
+ "step": 13120
4018
+ },
4019
+ {
4020
+ "epoch": 9.34,
4021
+ "learning_rate": 5.29058683955172e-07,
4022
+ "loss": 1.2687,
4023
+ "step": 13140
4024
+ },
4025
+ {
4026
+ "epoch": 9.36,
4027
+ "learning_rate": 5.063832902230586e-07,
4028
+ "loss": 1.3082,
4029
+ "step": 13160
4030
+ },
4031
+ {
4032
+ "epoch": 9.37,
4033
+ "learning_rate": 4.841995601642751e-07,
4034
+ "loss": 1.3213,
4035
+ "step": 13180
4036
+ },
4037
+ {
4038
+ "epoch": 9.39,
4039
+ "learning_rate": 4.625079390758319e-07,
4040
+ "loss": 1.2916,
4041
+ "step": 13200
4042
+ },
4043
+ {
4044
+ "epoch": 9.4,
4045
+ "learning_rate": 4.41308862376566e-07,
4046
+ "loss": 1.2851,
4047
+ "step": 13220
4048
+ },
4049
+ {
4050
+ "epoch": 9.41,
4051
+ "learning_rate": 4.2060275559840377e-07,
4052
+ "loss": 1.3003,
4053
+ "step": 13240
4054
+ },
4055
+ {
4056
+ "epoch": 9.43,
4057
+ "learning_rate": 4.0039003437782055e-07,
4058
+ "loss": 1.28,
4059
+ "step": 13260
4060
+ },
4061
+ {
4062
+ "epoch": 9.44,
4063
+ "learning_rate": 3.80671104447497e-07,
4064
+ "loss": 1.2949,
4065
+ "step": 13280
4066
+ },
4067
+ {
4068
+ "epoch": 9.46,
4069
+ "learning_rate": 3.61446361628176e-07,
4070
+ "loss": 1.278,
4071
+ "step": 13300
4072
+ },
4073
+ {
4074
+ "epoch": 9.47,
4075
+ "learning_rate": 3.427161918207106e-07,
4076
+ "loss": 1.3191,
4077
+ "step": 13320
4078
+ },
4079
+ {
4080
+ "epoch": 9.48,
4081
+ "learning_rate": 3.2448097099833095e-07,
4082
+ "loss": 1.3055,
4083
+ "step": 13340
4084
+ },
4085
+ {
4086
+ "epoch": 9.5,
4087
+ "learning_rate": 3.0674106519908155e-07,
4088
+ "loss": 1.2913,
4089
+ "step": 13360
4090
+ },
4091
+ {
4092
+ "epoch": 9.51,
4093
+ "learning_rate": 2.8949683051848754e-07,
4094
+ "loss": 1.3131,
4095
+ "step": 13380
4096
+ },
4097
+ {
4098
+ "epoch": 9.53,
4099
+ "learning_rate": 2.727486131023971e-07,
4100
+ "loss": 1.2956,
4101
+ "step": 13400
4102
+ },
4103
+ {
4104
+ "epoch": 9.54,
4105
+ "learning_rate": 2.564967491400394e-07,
4106
+ "loss": 1.2721,
4107
+ "step": 13420
4108
+ },
4109
+ {
4110
+ "epoch": 9.56,
4111
+ "learning_rate": 2.4074156485727197e-07,
4112
+ "loss": 1.3024,
4113
+ "step": 13440
4114
+ },
4115
+ {
4116
+ "epoch": 9.57,
4117
+ "learning_rate": 2.2548337651003837e-07,
4118
+ "loss": 1.2989,
4119
+ "step": 13460
4120
+ },
4121
+ {
4122
+ "epoch": 9.58,
4123
+ "learning_rate": 2.1072249037800418e-07,
4124
+ "loss": 1.3151,
4125
+ "step": 13480
4126
+ },
4127
+ {
4128
+ "epoch": 9.6,
4129
+ "learning_rate": 1.9645920275843943e-07,
4130
+ "loss": 1.2864,
4131
+ "step": 13500
4132
+ },
4133
+ {
4134
+ "epoch": 9.61,
4135
+ "learning_rate": 1.8269379996023183e-07,
4136
+ "loss": 1.3041,
4137
+ "step": 13520
4138
+ },
4139
+ {
4140
+ "epoch": 9.63,
4141
+ "learning_rate": 1.6942655829817189e-07,
4142
+ "loss": 1.3049,
4143
+ "step": 13540
4144
+ },
4145
+ {
4146
+ "epoch": 9.64,
4147
+ "learning_rate": 1.566577440873962e-07,
4148
+ "loss": 1.2744,
4149
+ "step": 13560
4150
+ },
4151
+ {
4152
+ "epoch": 9.66,
4153
+ "learning_rate": 1.4438761363803067e-07,
4154
+ "loss": 1.2983,
4155
+ "step": 13580
4156
+ },
4157
+ {
4158
+ "epoch": 9.67,
4159
+ "learning_rate": 1.3261641325006124e-07,
4160
+ "loss": 1.3166,
4161
+ "step": 13600
4162
+ },
4163
+ {
4164
+ "epoch": 9.68,
4165
+ "learning_rate": 1.213443792083796e-07,
4166
+ "loss": 1.2968,
4167
+ "step": 13620
4168
+ },
4169
+ {
4170
+ "epoch": 9.7,
4171
+ "learning_rate": 1.1057173777804797e-07,
4172
+ "loss": 1.301,
4173
+ "step": 13640
4174
+ },
4175
+ {
4176
+ "epoch": 9.71,
4177
+ "learning_rate": 1.0029870519975004e-07,
4178
+ "loss": 1.3018,
4179
+ "step": 13660
4180
+ },
4181
+ {
4182
+ "epoch": 9.73,
4183
+ "learning_rate": 9.052548768545832e-08,
4184
+ "loss": 1.3071,
4185
+ "step": 13680
4186
+ },
4187
+ {
4188
+ "epoch": 9.74,
4189
+ "learning_rate": 8.125228141428465e-08,
4190
+ "loss": 1.3095,
4191
+ "step": 13700
4192
+ },
4193
+ {
4194
+ "epoch": 9.75,
4195
+ "learning_rate": 7.247927252854725e-08,
4196
+ "loss": 1.3138,
4197
+ "step": 13720
4198
+ },
4199
+ {
4200
+ "epoch": 9.77,
4201
+ "learning_rate": 6.420663713004038e-08,
4202
+ "loss": 1.3334,
4203
+ "step": 13740
4204
+ },
4205
+ {
4206
+ "epoch": 9.78,
4207
+ "learning_rate": 5.643454127648995e-08,
4208
+ "loss": 1.3292,
4209
+ "step": 13760
4210
+ },
4211
+ {
4212
+ "epoch": 9.8,
4213
+ "learning_rate": 4.9163140978225605e-08,
4214
+ "loss": 1.2938,
4215
+ "step": 13780
4216
+ },
4217
+ {
4218
+ "epoch": 9.81,
4219
+ "learning_rate": 4.239258219504716e-08,
4220
+ "loss": 1.2944,
4221
+ "step": 13800
4222
+ },
4223
+ {
4224
+ "epoch": 9.83,
4225
+ "learning_rate": 3.612300083329079e-08,
4226
+ "loss": 1.3012,
4227
+ "step": 13820
4228
+ },
4229
+ {
4230
+ "epoch": 9.84,
4231
+ "learning_rate": 3.035452274311457e-08,
4232
+ "loss": 1.2832,
4233
+ "step": 13840
4234
+ },
4235
+ {
4236
+ "epoch": 9.85,
4237
+ "learning_rate": 2.5087263715953268e-08,
4238
+ "loss": 1.2809,
4239
+ "step": 13860
4240
+ },
4241
+ {
4242
+ "epoch": 9.87,
4243
+ "learning_rate": 2.0321329482209107e-08,
4244
+ "loss": 1.2784,
4245
+ "step": 13880
4246
+ },
4247
+ {
4248
+ "epoch": 9.88,
4249
+ "learning_rate": 1.605681570912565e-08,
4250
+ "loss": 1.2963,
4251
+ "step": 13900
4252
+ },
4253
+ {
4254
+ "epoch": 9.9,
4255
+ "learning_rate": 1.2293807998858819e-08,
4256
+ "loss": 1.2885,
4257
+ "step": 13920
4258
+ },
4259
+ {
4260
+ "epoch": 9.91,
4261
+ "learning_rate": 9.03238188677269e-09,
4262
+ "loss": 1.2976,
4263
+ "step": 13940
4264
+ },
4265
+ {
4266
+ "epoch": 9.93,
4267
+ "learning_rate": 6.272602839915709e-09,
4268
+ "loss": 1.2775,
4269
+ "step": 13960
4270
+ },
4271
+ {
4272
+ "epoch": 9.94,
4273
+ "learning_rate": 4.014526255702311e-09,
4274
+ "loss": 1.2798,
4275
+ "step": 13980
4276
+ },
4277
+ {
4278
+ "epoch": 9.95,
4279
+ "learning_rate": 2.2581974608082425e-09,
4280
+ "loss": 1.2866,
4281
+ "step": 14000
4282
+ },
4283
+ {
4284
+ "epoch": 9.97,
4285
+ "learning_rate": 1.0036517102601784e-09,
4286
+ "loss": 1.2933,
4287
+ "step": 14020
4288
+ },
4289
+ {
4290
+ "epoch": 9.98,
4291
+ "learning_rate": 2.509141867224063e-10,
4292
+ "loss": 1.2835,
4293
+ "step": 14040
4294
+ },
4295
+ {
4296
+ "epoch": 10.0,
4297
+ "learning_rate": 0.0,
4298
+ "loss": 1.2806,
4299
+ "step": 14060
4300
+ },
4301
+ {
4302
+ "epoch": 10.0,
4303
+ "eval_loss": 1.2147753238677979,
4304
+ "eval_runtime": 291.9831,
4305
+ "eval_samples_per_second": 19.052,
4306
+ "eval_steps_per_second": 19.052,
4307
+ "step": 14060
4308
+ },
4309
+ {
4310
+ "epoch": 10.0,
4311
+ "step": 14060,
4312
+ "total_flos": 2.1277696239721021e+18,
4313
+ "train_loss": 1.371388260229553,
4314
+ "train_runtime": 75147.4617,
4315
+ "train_samples_per_second": 5.989,
4316
+ "train_steps_per_second": 0.187
4317
+ }
4318
+ ],
4319
+ "logging_steps": 20,
4320
+ "max_steps": 14060,
4321
+ "num_input_tokens_seen": 0,
4322
+ "num_train_epochs": 10,
4323
+ "save_steps": 500,
4324
+ "total_flos": 2.1277696239721021e+18,
4325
+ "train_batch_size": 1,
4326
+ "trial_name": null,
4327
+ "trial_params": null
4328
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b056034e9f969ac0dbc9c732f50e63e08bcf530fa1bdc58ecfc2e1758ca669df
3
+ size 4792