cella110n committed on

Commit 6a2d6a5
1 Parent(s): ed2839d

Upload 9 files
LICENSE ADDED
Tongyi Qianwen LICENSE AGREEMENT

Tongyi Qianwen Release Date: August 23, 2023

By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, you will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.

1. Definitions
a. This Tongyi Qianwen LICENSE AGREEMENT (this "Agreement") shall mean the terms and conditions for use, reproduction, distribution and modification of the Materials as defined by this Agreement.
b. "We" (or "Us") shall mean Alibaba Cloud.
c. "You" (or "Your") shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Materials for any purpose and in any field of use.
d. "Third Parties" shall mean individuals or legal entities that are not under common control with Us or You.
e. "Tongyi Qianwen" shall mean the large language models (including Qwen-VL model and Qwen-VL-Chat model), and software and algorithms, consisting of trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Us.
f. "Materials" shall mean, collectively, Alibaba Cloud's proprietary Tongyi Qianwen and Documentation (and any portion thereof) made available under this Agreement.
g. "Source" form shall mean the preferred form for making modifications, including but not limited to model source code, documentation source, and configuration files.
h. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

2. Grant of Rights
You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Alibaba Cloud's intellectual property or other rights owned by Us embodied in the Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Materials.

3. Redistribution
You may reproduce and distribute copies of the Materials or derivative works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
a. You shall give any other recipients of the Materials or derivative works a copy of this Agreement;
b. You shall cause any modified files to carry prominent notices stating that You changed the files;
c. You shall retain in all copies of the Materials that You distribute the following attribution notices within a "Notice" text file distributed as a part of such copies: "Tongyi Qianwen is licensed under the Tongyi Qianwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved."; and
d. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such derivative works as a whole, provided Your use, reproduction, and distribution of the work otherwise complies with the terms and conditions of this Agreement.

4. Restrictions
If you are commercially using the Materials, and your product or service has more than 100 million monthly active users, You shall request a license from Us. You cannot exercise your rights under this Agreement without our express authorization.

5. Rules of use
a. The Materials may be subject to export controls or restrictions in China, the United States or other countries or regions. You shall comply with applicable laws and regulations in your use of the Materials.
b. You can not use the Materials or any output therefrom to improve any other large language model (excluding Tongyi Qianwen or derivative works thereof).

6. Intellectual Property
a. We retain ownership of all intellectual property rights in and to the Materials and derivatives made by or for Us. Conditioned upon compliance with the terms and conditions of this Agreement, with respect to any derivative works and modifications of the Materials that are made by you, you are and will be the owner of such derivative works and modifications.
b. No trademark license is granted to use the trade names, trademarks, service marks, or product names of Us, except as required to fulfill notice requirements under this Agreement or as required for reasonable and customary use in describing and redistributing the Materials.
c. If you commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any entity alleging that the Materials or any output therefrom, or any part of the foregoing, infringe any intellectual property or other right owned or licensable by you, then all licences granted to you under this Agreement shall terminate as of the date such lawsuit or other proceeding is commenced or brought.

7. Disclaimer of Warranty and Limitation of Liability
a. We are not obligated to support, update, provide training for, or develop any further version of the Tongyi Qianwen Materials or to grant any license thereto.
b. THE MATERIALS ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. WE MAKE NO WARRANTY AND ASSUME NO RESPONSIBILITY FOR THE SAFETY OR STABILITY OF THE MATERIALS AND ANY OUTPUT THEREFROM.
c. IN NO EVENT SHALL WE BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE MATERIALS OR ANY OUTPUT OF IT, NO MATTER HOW IT'S CAUSED.
d. You will defend, indemnify and hold harmless Us from and against any claim by any third party arising out of or related to your use or distribution of the Materials.

8. Survival and Termination.
a. The term of this Agreement shall commence upon your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
b. We may terminate this Agreement if you breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, you must delete and cease use of the Materials. Sections 7 and 9 shall survive the termination of this Agreement.

9. Governing Law and Jurisdiction.
a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
b. The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.
README.md CHANGED
@@ -1,3 +1,204 @@
- license: unknown
---
library_name: peft
base_model: Qwen/Qwen-VL-Chat
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.7.1
adapter_config.json ADDED
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "Qwen/Qwen-VL-Chat",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 64,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "c_attn",
    "w2",
    "attn.c_proj",
    "w1"
  ],
  "task_type": "CAUSAL_LM"
}
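The adapter config above fully determines the LoRA geometry: rank `r` = 64 and `lora_alpha` = 16, so PEFT scales the low-rank update by `alpha / r` = 0.25, applied to the `c_attn`, `w2`, `attn.c_proj`, and `w1` projections. As a minimal stdlib sketch (the relevant subset of the config is inlined here rather than read from `adapter_config.json`):

```python
import json

# Relevant subset of adapter_config.json, inlined for a self-contained example
config = json.loads("""
{
    "peft_type": "LORA",
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["c_attn", "w2", "attn.c_proj", "w1"]
}
""")

# LoRA applies the update as W + (alpha / r) * B @ A, so this is the
# effective scale of the learned low-rank delta
scaling = config["lora_alpha"] / config["r"]

print(f"LoRA rank {config['r']}, scaling {scaling}")  # scaling 0.25
print("adapted modules:", ", ".join(config["target_modules"]))
```

A small alpha-to-rank ratio like this keeps the adapter's contribution conservative relative to the frozen base weights.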
adapter_model.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:c8166d6394787d2b8bac1cf4eef06c3d67bfe8b7d87208ee21cfe17a2d7af3da
size 224483018
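Note that `adapter_model.bin` is stored as a Git LFS pointer rather than as raw bytes: a tiny text file of `key value` lines per the LFS pointer spec (v1). A minimal sketch of parsing such a pointer, with the pointer text inlined from above:

```python
# A Git LFS pointer file (spec v1) is newline-separated "key value" lines
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:c8166d6394787d2b8bac1cf4eef06c3d67bfe8b7d87208ee21cfe17a2d7af3da
size 224483018
"""

# Split each line on the first space into a key/value pair
fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())

algo, digest = fields["oid"].split(":", 1)  # e.g. "sha256", hex digest
size_bytes = int(fields["size"])

print(algo, digest[:12], f"{size_bytes / 1e6:.0f} MB")  # sha256 c8166d639478 224 MB
```

The ~224 MB size is the LoRA adapter alone; the frozen Qwen-VL-Chat base weights are fetched separately.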
qwen.tiktoken ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED

{
  "pad_token": "<|endoftext|>"
}
tokenizer_config.json ADDED

{
  "auto_map": {
    "AutoTokenizer": [
      "Qwen/Qwen-VL-Chat--tokenization_qwen.QWenTokenizer",
      null
    ]
  },
  "clean_up_tokenization_spaces": true,
  "model_max_length": 2048,
  "padding_side": "right",
  "tokenizer_class": "QWenTokenizer"
}
trainer_state.json ADDED

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 4.943820224719101,
  "eval_steps": 500,
  "global_step": 385,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.01, "learning_rate": 2.5e-06, "loss": 2.3199, "step": 1},
    {"epoch": 0.03, "learning_rate": 5e-06, "loss": 2.6164, "step": 2},
    {"epoch": 0.04, "learning_rate": 7.500000000000001e-06, "loss": 2.7792, "step": 3},
    {"epoch": 0.05, "learning_rate": 1e-05, "loss": 2.6204, "step": 4},
    {"epoch": 0.06, "learning_rate": 9.999830024102874e-06, "loss": 2.3813, "step": 5},
    {"epoch": 0.08, "learning_rate": 9.99932010796822e-06, "loss": 2.5856, "step": 6},
    {"epoch": 0.09, "learning_rate": 9.998470286265415e-06, "loss": 2.3747, "step": 7},
    {"epoch": 0.1, "learning_rate": 9.997280616774147e-06, "loss": 2.5818, "step": 8},
    {"epoch": 0.12, "learning_rate": 9.995751180380468e-06, "loss": 2.5867, "step": 9},
    {"epoch": 0.13, "learning_rate": 9.993882081071307e-06, "loss": 2.2268, "step": 10},
    {"epoch": 0.14, "learning_rate": 9.991673445927399e-06, "loss": 2.6059, "step": 11},
    {"epoch": 0.15, "learning_rate": 9.989125425114639e-06, "loss": 2.0894, "step": 12},
    {"epoch": 0.17, "learning_rate": 9.986238191873874e-06, "loss": 2.2201, "step": 13},
    {"epoch": 0.18, "learning_rate": 9.983011942509131e-06, "loss": 2.316, "step": 14},
    {"epoch": 0.19, "learning_rate": 9.979446896374264e-06, "loss": 2.6564, "step": 15},
    {"epoch": 0.21, "learning_rate": 9.975543295858035e-06, "loss": 2.465, "step": 16},
    {"epoch": 0.22, "learning_rate": 9.971301406367644e-06, "loss": 2.4582, "step": 17},
    {"epoch": 0.23, "learning_rate": 9.966721516310683e-06, "loss": 2.2573, "step": 18},
    {"epoch": 0.24, "learning_rate": 9.961803937075516e-06, "loss": 2.3717, "step": 19},
    {"epoch": 0.26, "learning_rate": 9.956549003010122e-06, "loss": 2.6225, "step": 20},
    {"epoch": 0.27, "learning_rate": 9.950957071399357e-06, "loss": 2.0773, "step": 21},
    {"epoch": 0.28, "learning_rate": 9.945028522440654e-06, "loss": 2.2186, "step": 22},
    {"epoch": 0.3, "learning_rate": 9.938763759218186e-06, "loss": 2.7806, "step": 23},
    {"epoch": 0.31, "learning_rate": 9.93216320767545e-06, "loss": 2.5508, "step": 24},
    {"epoch": 0.32, "learning_rate": 9.925227316586316e-06, "loss": 2.2016, "step": 25},
    {"epoch": 0.33, "learning_rate": 9.917956557524511e-06, "loss": 2.2435, "step": 26},
    {"epoch": 0.35, "learning_rate": 9.910351424831545e-06, "loss": 2.5108, "step": 27},
    {"epoch": 0.36, "learning_rate": 9.902412435583127e-06, "loss": 2.2933, "step": 28},
    {"epoch": 0.37, "learning_rate": 9.89414012955398e-06, "loss": 2.1846, "step": 29},
    {"epoch": 0.39, "learning_rate": 9.885535069181163e-06, "loss": 2.9822, "step": 30},
    {"epoch": 0.4, "learning_rate": 9.876597839525814e-06, "loss": 2.231, "step": 31},
    {"epoch": 0.41, "learning_rate": 9.867329048233387e-06, "loss": 2.2585, "step": 32},
    {"epoch": 0.42, "learning_rate": 9.857729325492329e-06, "loss": 2.3573, "step": 33},
    {"epoch": 0.44, "learning_rate": 9.847799323991234e-06, "loss": 2.2582, "step": 34},
    {"epoch": 0.45, "learning_rate": 9.837539718874466e-06, "loss": 2.2804, "step": 35},
    {"epoch": 0.46, "learning_rate": 9.826951207696258e-06, "loss": 2.864, "step": 36},
    {"epoch": 0.48, "learning_rate": 9.816034510373287e-06, "loss": 2.1242, "step": 37},
    {"epoch": 0.49, "learning_rate": 9.804790369135719e-06, "loss": 2.2738, "step": 38},
    {"epoch": 0.5, "learning_rate": 9.793219548476754e-06, "loss": 2.6035, "step": 39},
    {"epoch": 0.51, "learning_rate": 9.781322835100639e-06, "loss": 2.1281, "step": 40},
    {"epoch": 0.53, "learning_rate": 9.769101037869187e-06, "loss": 2.392, "step": 41},
    {"epoch": 0.54, "learning_rate": 9.756554987746777e-06, "loss": 2.2579, "step": 42},
    {"epoch": 0.55, "learning_rate": 9.743685537743856e-06, "loss": 2.1741, "step": 43},
    {"epoch": 0.57, "learning_rate": 9.730493562858954e-06, "loss": 2.2772, "step": 44},
    {"epoch": 0.58, "learning_rate": 9.716979960019173e-06, "loss": 2.2809, "step": 45},
    {"epoch": 0.59, "learning_rate": 9.70314564801922e-06, "loss": 2.3821, "step": 46},
    {"epoch": 0.6, "learning_rate": 9.688991567458934e-06, "loss": 2.5027, "step": 47},
    {"epoch": 0.62, "learning_rate": 9.67451868067933e-06, "loss": 2.3994, "step": 48},
    {"epoch": 0.63, "learning_rate": 9.659727971697173e-06, "loss": 2.3403, "step": 49},
    {"epoch": 0.64, "learning_rate": 9.644620446138078e-06, "loss": 2.2843, "step": 50},
    {"epoch": 0.65, "learning_rate": 9.629197131168125e-06, "loss": 1.9102, "step": 51},
    {"epoch": 0.67, "learning_rate": 9.613459075424033e-06, "loss": 2.0563, "step": 52},
    {"epoch": 0.68, "learning_rate": 9.597407348941865e-06, "loss": 2.5812, "step": 53},
    {"epoch": 0.69, "learning_rate": 9.58104304308426e-06, "loss": 2.0869, "step": 54},
    {"epoch": 0.71, "learning_rate": 9.564367270466247e-06, "loss": 1.9037, "step": 55},
    {"epoch": 0.72, "learning_rate": 9.54738116487959e-06, "loss": 2.3678, "step": 56},
    {"epoch": 0.73, "learning_rate": 9.530085881215705e-06, "loss": 2.2848, "step": 57},
    {"epoch": 0.74, "learning_rate": 9.512482595387131e-06, "loss": 2.0607, "step": 58},
    {"epoch": 0.76, "learning_rate": 9.494572504247593e-06, "loss": 1.956, "step": 59},
    {"epoch": 0.77, "learning_rate": 9.476356825510613e-06, "loss": 2.2119, "step": 60},
    {"epoch": 0.78, "learning_rate": 9.457836797666722e-06, "loss": 2.2242, "step": 61},
    {"epoch": 0.8, "learning_rate": 9.439013679899263e-06, "loss": 2.259, "step": 62},
    {"epoch": 0.81, "learning_rate": 9.419888751998768e-06, "loss": 2.2482, "step": 63},
    {"epoch": 0.82, "learning_rate": 9.400463314275942e-06, "loss": 2.0579, "step": 64},
    {"epoch": 0.83, "learning_rate": 9.380738687473274e-06, "loss": 1.9384, "step": 65},
    {"epoch": 0.85, "learning_rate": 9.360716212675213e-06, "loss": 2.1505, "step": 66},
    {"epoch": 0.86, "learning_rate": 9.340397251217009e-06, "loss": 2.0986, "step": 67},
    {"epoch": 0.87, "learning_rate": 9.319783184592142e-06, "loss": 2.3174, "step": 68},
    {"epoch": 0.89, "learning_rate": 9.298875414358399e-06, "loss": 2.1534, "step": 69},
    {"epoch": 0.9, "learning_rate": 9.27767536204258e-06, "loss": 2.1114, "step": 70},
    {"epoch": 0.91, "learning_rate": 9.256184469043852e-06, "loss": 1.9863, "step": 71},
    {"epoch": 0.92, "learning_rate": 9.23440419653574e-06, "loss": 2.137, "step": 72},
    {"epoch": 0.94, "learning_rate": 9.212336025366789e-06, "loss": 2.225, "step": 73},
    {"epoch": 0.95, "learning_rate": 9.189981455959873e-06, "loss": 2.0334, "step": 74},
    {"epoch": 0.96, "learning_rate": 9.167342008210191e-06, "loss": 1.9056, "step": 75},
    {"epoch": 0.98, "learning_rate": 9.144419221381919e-06, "loss": 2.144, "step": 76},
    {"epoch": 0.99, "learning_rate": 9.121214654003561e-06, "loss": 1.8992, "step": 77},
    {"epoch": 1.0, "learning_rate": 9.097729883761977e-06, "loss": 1.8587, "step": 78},
    {"epoch": 1.01, "learning_rate": 9.073966507395123e-06, "loss": 2.5454, "step": 79},
    {"epoch": 1.03, "learning_rate": 9.049926140583487e-06, "loss": 2.0381, "step": 80},
    {"epoch": 1.04, "learning_rate": 9.025610417840238e-06, "loss": 2.0646, "step": 81},
    {"epoch": 1.05, "learning_rate": 9.001020992400086e-06, "loss": 1.6412, "step": 82},
    {"epoch": 1.07, "learning_rate": 8.976159536106895e-06, "loss": 1.9907, "step": 83},
    {"epoch": 1.08, "learning_rate": 8.951027739299996e-06, "loss": 1.8317, "step": 84},
    {"epoch": 1.09, "learning_rate": 8.925627310699275e-06, "loss": 2.1631, "step": 85},
    {"epoch": 1.1, "learning_rate": 8.899959977288988e-06, "loss": 2.2149, "step": 86},
    {"epoch": 1.12, "learning_rate": 8.874027484200342e-06, "loss": 2.2374, "step": 87},
    {"epoch": 1.13, "learning_rate": 8.847831594592851e-06, "loss": 2.124, "step": 88},
    {"epoch": 1.14, "learning_rate": 8.821374089534446e-06, "loss": 2.0088, "step": 89},
    {"epoch": 1.16, "learning_rate": 8.794656767880394e-06, "loss": 2.1843, "step": 90},
    {"epoch": 1.17, "learning_rate": 8.767681446150977e-06, "loss": 2.552, "step": 91},
    {"epoch": 1.18, "learning_rate": 8.740449958408006e-06, "loss": 1.7471, "step": 92},
    {"epoch": 1.19, "learning_rate": 8.7129641561301e-06, "loss": 1.9054, "step": 93},
    {"epoch": 1.21, "learning_rate": 8.68522590808682e-06, "loss": 2.1348, "step": 94},
    {"epoch": 1.22, "learning_rate": 8.657237100211604e-06, "loss": 2.0139, "step": 95},
    {"epoch": 1.23, "learning_rate": 8.628999635473547e-06, "loss": 1.9181, "step": 96},
    {"epoch": 1.25, "learning_rate": 8.600515433748003e-06, "loss": 2.2683, "step": 97},
    {"epoch": 1.26, "learning_rate": 8.571786431686074e-06, "loss": 1.7892, "step": 98},
    {"epoch": 1.27, "learning_rate": 8.542814582582917e-06, "loss": 1.9259, "step": 99},
    {"epoch": 1.28, "learning_rate": 8.513601856244951e-06, "loss": 1.9594, "step": 100},
    {"epoch": 1.3, "learning_rate": 8.484150238855921e-06, "loss": 2.1425, "step": 101},
    {"epoch": 1.31, "learning_rate": 8.454461732841864e-06, "loss": 2.1687, "step": 102},
    {"epoch": 1.32, "learning_rate": 8.424538356734957e-06, "loss": 2.128, "step": 103},
    {"epoch": 1.34, "learning_rate": 8.394382145036277e-06, "loss": 2.3052, "step": 104},
    {"epoch": 1.35, "learning_rate": 8.363995148077481e-06, "loss": 2.1029, "step": 105},
    {"epoch": 1.36, "learning_rate": 8.333379431881398e-06, "loss": 1.9932, "step": 106},
    {"epoch": 1.37, "learning_rate": 8.302537078021555e-06, "loss": 1.9616, "step": 107},
    {"epoch": 1.39, "learning_rate": 8.271470183480664e-06, "loss": 2.0529, "step": 108},
    {"epoch": 1.4, "learning_rate": 8.240180860508027e-06, "loss": 1.8921, "step": 109},
    {"epoch": 1.41, "learning_rate": 8.208671236475945e-06, "loss": 1.7787, "step": 110},
    {"epoch": 1.43, "learning_rate": 8.176943453735062e-06, "loss": 2.0703, "step": 111},
    {"epoch": 1.44, "learning_rate": 8.144999669468714e-06, "loss": 1.9988, "step": 112},
    {"epoch": 1.45, "learning_rate": 8.112842055546254e-06,
686
+ "loss": 1.8816,
687
+ "step": 113
688
+ },
689
+ {
690
+ "epoch": 1.46,
691
+ "learning_rate": 8.080472798375392e-06,
692
+ "loss": 1.953,
693
+ "step": 114
694
+ },
695
+ {
696
+ "epoch": 1.48,
697
+ "learning_rate": 8.04789409875354e-06,
698
+ "loss": 2.0868,
699
+ "step": 115
700
+ },
701
+ {
702
+ "epoch": 1.49,
703
+ "learning_rate": 8.015108171718177e-06,
704
+ "loss": 2.2089,
705
+ "step": 116
706
+ },
707
+ {
708
+ "epoch": 1.5,
709
+ "learning_rate": 7.982117246396246e-06,
710
+ "loss": 1.9521,
711
+ "step": 117
712
+ },
713
+ {
714
+ "epoch": 1.52,
715
+ "learning_rate": 7.948923565852597e-06,
716
+ "loss": 2.2335,
717
+ "step": 118
718
+ },
719
+ {
720
+ "epoch": 1.53,
721
+ "learning_rate": 7.915529386937486e-06,
722
+ "loss": 2.2536,
723
+ "step": 119
724
+ },
725
+ {
726
+ "epoch": 1.54,
727
+ "learning_rate": 7.881936980133118e-06,
728
+ "loss": 1.6541,
729
+ "step": 120
730
+ },
731
+ {
732
+ "epoch": 1.55,
733
+ "learning_rate": 7.848148629399287e-06,
734
+ "loss": 1.9498,
735
+ "step": 121
736
+ },
737
+ {
738
+ "epoch": 1.57,
739
+ "learning_rate": 7.814166632018083e-06,
740
+ "loss": 2.3166,
741
+ "step": 122
742
+ },
743
+ {
744
+ "epoch": 1.58,
745
+ "learning_rate": 7.779993298437704e-06,
746
+ "loss": 2.1169,
747
+ "step": 123
748
+ },
749
+ {
750
+ "epoch": 1.59,
751
+ "learning_rate": 7.745630952115365e-06,
752
+ "loss": 2.1505,
753
+ "step": 124
754
+ },
755
+ {
756
+ "epoch": 1.61,
757
+ "learning_rate": 7.711081929359316e-06,
758
+ "loss": 1.9007,
759
+ "step": 125
760
+ },
761
+ {
762
+ "epoch": 1.62,
763
+ "learning_rate": 7.67634857917002e-06,
764
+ "loss": 2.1772,
765
+ "step": 126
766
+ },
767
+ {
768
+ "epoch": 1.63,
769
+ "learning_rate": 7.641433263080418e-06,
770
+ "loss": 2.4341,
771
+ "step": 127
772
+ },
773
+ {
774
+ "epoch": 1.64,
775
+ "learning_rate": 7.606338354995381e-06,
776
+ "loss": 1.7081,
777
+ "step": 128
778
+ },
779
+ {
780
+ "epoch": 1.66,
781
+ "learning_rate": 7.571066241030302e-06,
782
+ "loss": 2.0402,
783
+ "step": 129
784
+ },
785
+ {
786
+ "epoch": 1.67,
787
+ "learning_rate": 7.5356193193488655e-06,
788
+ "loss": 2.0681,
789
+ "step": 130
790
+ },
791
+ {
792
+ "epoch": 1.68,
793
+ "learning_rate": 7.500000000000001e-06,
794
+ "loss": 1.5562,
795
+ "step": 131
796
+ },
797
+ {
798
+ "epoch": 1.7,
799
+ "learning_rate": 7.464210704754009e-06,
800
+ "loss": 1.7387,
801
+ "step": 132
802
+ },
803
+ {
804
+ "epoch": 1.71,
805
+ "learning_rate": 7.4282538669379186e-06,
806
+ "loss": 2.1956,
807
+ "step": 133
808
+ },
809
+ {
810
+ "epoch": 1.72,
811
+ "learning_rate": 7.3921319312700365e-06,
812
+ "loss": 2.0449,
813
+ "step": 134
814
+ },
815
+ {
816
+ "epoch": 1.73,
817
+ "learning_rate": 7.355847353693729e-06,
818
+ "loss": 1.8579,
819
+ "step": 135
820
+ },
821
+ {
822
+ "epoch": 1.75,
823
+ "learning_rate": 7.319402601210448e-06,
824
+ "loss": 2.0661,
825
+ "step": 136
826
+ },
827
+ {
828
+ "epoch": 1.76,
829
+ "learning_rate": 7.282800151711991e-06,
830
+ "loss": 2.1408,
831
+ "step": 137
832
+ },
833
+ {
834
+ "epoch": 1.77,
835
+ "learning_rate": 7.246042493812036e-06,
836
+ "loss": 1.729,
837
+ "step": 138
838
+ },
839
+ {
840
+ "epoch": 1.78,
841
+ "learning_rate": 7.209132126676934e-06,
842
+ "loss": 2.3547,
843
+ "step": 139
844
+ },
845
+ {
846
+ "epoch": 1.8,
847
+ "learning_rate": 7.172071559855792e-06,
848
+ "loss": 2.2959,
849
+ "step": 140
850
+ },
851
+ {
852
+ "epoch": 1.81,
853
+ "learning_rate": 7.134863313109847e-06,
854
+ "loss": 2.3131,
855
+ "step": 141
856
+ },
857
+ {
858
+ "epoch": 1.82,
859
+ "learning_rate": 7.097509916241145e-06,
860
+ "loss": 1.7705,
861
+ "step": 142
862
+ },
863
+ {
864
+ "epoch": 1.84,
865
+ "learning_rate": 7.060013908920549e-06,
866
+ "loss": 1.6909,
867
+ "step": 143
868
+ },
869
+ {
870
+ "epoch": 1.85,
871
+ "learning_rate": 7.022377840515047e-06,
872
+ "loss": 1.84,
873
+ "step": 144
874
+ },
875
+ {
876
+ "epoch": 1.86,
877
+ "learning_rate": 6.984604269914437e-06,
878
+ "loss": 1.977,
879
+ "step": 145
880
+ },
881
+ {
882
+ "epoch": 1.87,
883
+ "learning_rate": 6.94669576535734e-06,
884
+ "loss": 2.177,
885
+ "step": 146
886
+ },
887
+ {
888
+ "epoch": 1.89,
889
+ "learning_rate": 6.908654904256584e-06,
890
+ "loss": 1.9606,
891
+ "step": 147
892
+ },
893
+ {
894
+ "epoch": 1.9,
895
+ "learning_rate": 6.870484273023967e-06,
896
+ "loss": 2.0072,
897
+ "step": 148
898
+ },
899
+ {
900
+ "epoch": 1.91,
901
+ "learning_rate": 6.832186466894402e-06,
902
+ "loss": 1.7583,
903
+ "step": 149
904
+ },
905
+ {
906
+ "epoch": 1.93,
907
+ "learning_rate": 6.793764089749473e-06,
908
+ "loss": 1.8085,
909
+ "step": 150
910
+ },
911
+ {
912
+ "epoch": 1.94,
913
+ "learning_rate": 6.755219753940389e-06,
914
+ "loss": 1.7839,
915
+ "step": 151
916
+ },
917
+ {
918
+ "epoch": 1.95,
919
+ "learning_rate": 6.716556080110374e-06,
920
+ "loss": 1.9219,
921
+ "step": 152
922
+ },
923
+ {
924
+ "epoch": 1.96,
925
+ "learning_rate": 6.677775697016484e-06,
926
+ "loss": 2.1836,
927
+ "step": 153
928
+ },
929
+ {
930
+ "epoch": 1.98,
931
+ "learning_rate": 6.638881241350884e-06,
932
+ "loss": 1.964,
933
+ "step": 154
934
+ },
935
+ {
936
+ "epoch": 1.99,
937
+ "learning_rate": 6.599875357561572e-06,
938
+ "loss": 1.8509,
939
+ "step": 155
940
+ },
941
+ {
942
+ "epoch": 2.0,
943
+ "learning_rate": 6.560760697672583e-06,
944
+ "loss": 2.2632,
945
+ "step": 156
946
+ },
947
+ {
948
+ "epoch": 2.02,
949
+ "learning_rate": 6.5215399211036815e-06,
950
+ "loss": 1.9777,
951
+ "step": 157
952
+ },
953
+ {
954
+ "epoch": 2.03,
955
+ "learning_rate": 6.4822156944895375e-06,
956
+ "loss": 1.9851,
957
+ "step": 158
958
+ },
959
+ {
960
+ "epoch": 2.04,
961
+ "learning_rate": 6.442790691498433e-06,
962
+ "loss": 2.204,
963
+ "step": 159
964
+ },
965
+ {
966
+ "epoch": 2.05,
967
+ "learning_rate": 6.403267592650466e-06,
968
+ "loss": 1.7941,
969
+ "step": 160
970
+ },
971
+ {
972
+ "epoch": 2.07,
973
+ "learning_rate": 6.363649085135311e-06,
974
+ "loss": 1.8559,
975
+ "step": 161
976
+ },
977
+ {
978
+ "epoch": 2.08,
979
+ "learning_rate": 6.323937862629513e-06,
980
+ "loss": 1.9673,
981
+ "step": 162
982
+ },
983
+ {
984
+ "epoch": 2.09,
985
+ "learning_rate": 6.2841366251133405e-06,
986
+ "loss": 1.883,
987
+ "step": 163
988
+ },
989
+ {
990
+ "epoch": 2.11,
991
+ "learning_rate": 6.244248078687213e-06,
992
+ "loss": 1.9343,
993
+ "step": 164
994
+ },
995
+ {
996
+ "epoch": 2.12,
997
+ "learning_rate": 6.204274935387716e-06,
998
+ "loss": 1.8082,
999
+ "step": 165
1000
+ },
1001
+ {
1002
+ "epoch": 2.13,
1003
+ "learning_rate": 6.164219913003208e-06,
1004
+ "loss": 1.9898,
1005
+ "step": 166
1006
+ },
1007
+ {
1008
+ "epoch": 2.14,
1009
+ "learning_rate": 6.124085734889034e-06,
1010
+ "loss": 1.6458,
1011
+ "step": 167
1012
+ },
1013
+ {
1014
+ "epoch": 2.16,
1015
+ "learning_rate": 6.083875129782366e-06,
1016
+ "loss": 2.0815,
1017
+ "step": 168
1018
+ },
1019
+ {
1020
+ "epoch": 2.17,
1021
+ "learning_rate": 6.043590831616677e-06,
1022
+ "loss": 2.1997,
1023
+ "step": 169
1024
+ },
1025
+ {
1026
+ "epoch": 2.18,
1027
+ "learning_rate": 6.003235579335851e-06,
1028
+ "loss": 2.1733,
1029
+ "step": 170
1030
+ },
1031
+ {
1032
+ "epoch": 2.2,
1033
+ "learning_rate": 5.962812116707977e-06,
1034
+ "loss": 1.8058,
1035
+ "step": 171
1036
+ },
1037
+ {
1038
+ "epoch": 2.21,
1039
+ "learning_rate": 5.92232319213878e-06,
1040
+ "loss": 2.1662,
1041
+ "step": 172
1042
+ },
1043
+ {
1044
+ "epoch": 2.22,
1045
+ "learning_rate": 5.8817715584847744e-06,
1046
+ "loss": 1.8951,
1047
+ "step": 173
1048
+ },
1049
+ {
1050
+ "epoch": 2.23,
1051
+ "learning_rate": 5.841159972866085e-06,
1052
+ "loss": 1.9074,
1053
+ "step": 174
1054
+ },
1055
+ {
1056
+ "epoch": 2.25,
1057
+ "learning_rate": 5.800491196478989e-06,
1058
+ "loss": 1.9271,
1059
+ "step": 175
1060
+ },
1061
+ {
1062
+ "epoch": 2.26,
1063
+ "learning_rate": 5.759767994408188e-06,
1064
+ "loss": 1.9044,
1065
+ "step": 176
1066
+ },
1067
+ {
1068
+ "epoch": 2.27,
1069
+ "learning_rate": 5.718993135438803e-06,
1070
+ "loss": 1.9798,
1071
+ "step": 177
1072
+ },
1073
+ {
1074
+ "epoch": 2.29,
1075
+ "learning_rate": 5.678169391868128e-06,
1076
+ "loss": 1.9495,
1077
+ "step": 178
1078
+ },
1079
+ {
1080
+ "epoch": 2.3,
1081
+ "learning_rate": 5.637299539317141e-06,
1082
+ "loss": 2.1539,
1083
+ "step": 179
1084
+ },
1085
+ {
1086
+ "epoch": 2.31,
1087
+ "learning_rate": 5.596386356541779e-06,
1088
+ "loss": 2.0293,
1089
+ "step": 180
1090
+ },
1091
+ {
1092
+ "epoch": 2.32,
1093
+ "learning_rate": 5.555432625244024e-06,
1094
+ "loss": 1.8224,
1095
+ "step": 181
1096
+ },
1097
+ {
1098
+ "epoch": 2.34,
1099
+ "learning_rate": 5.51444112988276e-06,
1100
+ "loss": 1.6832,
1101
+ "step": 182
1102
+ },
1103
+ {
1104
+ "epoch": 2.35,
1105
+ "learning_rate": 5.473414657484468e-06,
1106
+ "loss": 2.0847,
1107
+ "step": 183
1108
+ },
1109
+ {
1110
+ "epoch": 2.36,
1111
+ "learning_rate": 5.432355997453729e-06,
1112
+ "loss": 2.0309,
1113
+ "step": 184
1114
+ },
1115
+ {
1116
+ "epoch": 2.38,
1117
+ "learning_rate": 5.391267941383572e-06,
1118
+ "loss": 2.0406,
1119
+ "step": 185
1120
+ },
1121
+ {
1122
+ "epoch": 2.39,
1123
+ "learning_rate": 5.350153282865674e-06,
1124
+ "loss": 1.7257,
1125
+ "step": 186
1126
+ },
1127
+ {
1128
+ "epoch": 2.4,
1129
+ "learning_rate": 5.309014817300422e-06,
1130
+ "loss": 1.8888,
1131
+ "step": 187
1132
+ },
1133
+ {
1134
+ "epoch": 2.41,
1135
+ "learning_rate": 5.26785534170685e-06,
1136
+ "loss": 1.8149,
1137
+ "step": 188
1138
+ },
1139
+ {
1140
+ "epoch": 2.43,
1141
+ "learning_rate": 5.226677654532476e-06,
1142
+ "loss": 2.0397,
1143
+ "step": 189
1144
+ },
1145
+ {
1146
+ "epoch": 2.44,
1147
+ "learning_rate": 5.185484555463026e-06,
1148
+ "loss": 2.1468,
1149
+ "step": 190
1150
+ },
1151
+ {
1152
+ "epoch": 2.45,
1153
+ "learning_rate": 5.1442788452320915e-06,
1154
+ "loss": 2.1732,
1155
+ "step": 191
1156
+ },
1157
+ {
1158
+ "epoch": 2.47,
1159
+ "learning_rate": 5.1030633254306935e-06,
1160
+ "loss": 2.2631,
1161
+ "step": 192
1162
+ },
1163
+ {
1164
+ "epoch": 2.48,
1165
+ "learning_rate": 5.061840798316815e-06,
1166
+ "loss": 1.9009,
1167
+ "step": 193
1168
+ },
1169
+ {
1170
+ "epoch": 2.49,
1171
+ "learning_rate": 5.020614066624868e-06,
1172
+ "loss": 2.2187,
1173
+ "step": 194
1174
+ },
1175
+ {
1176
+ "epoch": 2.5,
1177
+ "learning_rate": 4.979385933375133e-06,
1178
+ "loss": 1.6122,
1179
+ "step": 195
1180
+ },
1181
+ {
1182
+ "epoch": 2.52,
1183
+ "learning_rate": 4.9381592016831856e-06,
1184
+ "loss": 1.5505,
1185
+ "step": 196
1186
+ },
1187
+ {
1188
+ "epoch": 2.53,
1189
+ "learning_rate": 4.896936674569309e-06,
1190
+ "loss": 1.843,
1191
+ "step": 197
1192
+ },
1193
+ {
1194
+ "epoch": 2.54,
1195
+ "learning_rate": 4.85572115476791e-06,
1196
+ "loss": 1.7565,
1197
+ "step": 198
1198
+ },
1199
+ {
1200
+ "epoch": 2.56,
1201
+ "learning_rate": 4.814515444536975e-06,
1202
+ "loss": 1.8699,
1203
+ "step": 199
1204
+ },
1205
+ {
1206
+ "epoch": 2.57,
1207
+ "learning_rate": 4.773322345467525e-06,
1208
+ "loss": 1.9872,
1209
+ "step": 200
1210
+ },
1211
+ {
1212
+ "epoch": 2.58,
1213
+ "learning_rate": 4.732144658293151e-06,
1214
+ "loss": 1.9223,
1215
+ "step": 201
1216
+ },
1217
+ {
1218
+ "epoch": 2.59,
1219
+ "learning_rate": 4.690985182699581e-06,
1220
+ "loss": 1.9225,
1221
+ "step": 202
1222
+ },
1223
+ {
1224
+ "epoch": 2.61,
1225
+ "learning_rate": 4.649846717134327e-06,
1226
+ "loss": 1.8284,
1227
+ "step": 203
1228
+ },
1229
+ {
1230
+ "epoch": 2.62,
1231
+ "learning_rate": 4.6087320586164296e-06,
1232
+ "loss": 1.8652,
1233
+ "step": 204
1234
+ },
1235
+ {
1236
+ "epoch": 2.63,
1237
+ "learning_rate": 4.567644002546273e-06,
1238
+ "loss": 1.735,
1239
+ "step": 205
1240
+ },
1241
+ {
1242
+ "epoch": 2.65,
1243
+ "learning_rate": 4.526585342515533e-06,
1244
+ "loss": 2.0562,
1245
+ "step": 206
1246
+ },
1247
+ {
1248
+ "epoch": 2.66,
1249
+ "learning_rate": 4.485558870117241e-06,
1250
+ "loss": 2.0004,
1251
+ "step": 207
1252
+ },
1253
+ {
1254
+ "epoch": 2.67,
1255
+ "learning_rate": 4.444567374755978e-06,
1256
+ "loss": 2.2094,
1257
+ "step": 208
1258
+ },
1259
+ {
1260
+ "epoch": 2.68,
1261
+ "learning_rate": 4.403613643458222e-06,
1262
+ "loss": 1.9606,
1263
+ "step": 209
1264
+ },
1265
+ {
1266
+ "epoch": 2.7,
1267
+ "learning_rate": 4.362700460682861e-06,
1268
+ "loss": 1.984,
1269
+ "step": 210
1270
+ },
1271
+ {
1272
+ "epoch": 2.71,
1273
+ "learning_rate": 4.321830608131872e-06,
1274
+ "loss": 1.7363,
1275
+ "step": 211
1276
+ },
1277
+ {
1278
+ "epoch": 2.72,
1279
+ "learning_rate": 4.281006864561199e-06,
1280
+ "loss": 1.9783,
1281
+ "step": 212
1282
+ },
1283
+ {
1284
+ "epoch": 2.74,
1285
+ "learning_rate": 4.240232005591816e-06,
1286
+ "loss": 1.6583,
1287
+ "step": 213
1288
+ },
1289
+ {
1290
+ "epoch": 2.75,
1291
+ "learning_rate": 4.1995088035210126e-06,
1292
+ "loss": 2.3748,
1293
+ "step": 214
1294
+ },
1295
+ {
1296
+ "epoch": 2.76,
1297
+ "learning_rate": 4.158840027133917e-06,
1298
+ "loss": 2.0012,
1299
+ "step": 215
1300
+ },
1301
+ {
1302
+ "epoch": 2.77,
1303
+ "learning_rate": 4.1182284415152255e-06,
1304
+ "loss": 1.6849,
1305
+ "step": 216
1306
+ },
1307
+ {
1308
+ "epoch": 2.79,
1309
+ "learning_rate": 4.077676807861221e-06,
1310
+ "loss": 1.729,
1311
+ "step": 217
1312
+ },
1313
+ {
1314
+ "epoch": 2.8,
1315
+ "learning_rate": 4.037187883292027e-06,
1316
+ "loss": 2.2028,
1317
+ "step": 218
1318
+ },
1319
+ {
1320
+ "epoch": 2.81,
1321
+ "learning_rate": 3.996764420664149e-06,
1322
+ "loss": 1.9073,
1323
+ "step": 219
1324
+ },
1325
+ {
1326
+ "epoch": 2.83,
1327
+ "learning_rate": 3.956409168383325e-06,
1328
+ "loss": 1.6662,
1329
+ "step": 220
1330
+ },
1331
+ {
1332
+ "epoch": 2.84,
1333
+ "learning_rate": 3.916124870217635e-06,
1334
+ "loss": 1.9193,
1335
+ "step": 221
1336
+ },
1337
+ {
1338
+ "epoch": 2.85,
1339
+ "learning_rate": 3.875914265110967e-06,
1340
+ "loss": 1.8654,
1341
+ "step": 222
1342
+ },
1343
+ {
1344
+ "epoch": 2.86,
1345
+ "learning_rate": 3.835780086996794e-06,
1346
+ "loss": 1.9015,
1347
+ "step": 223
1348
+ },
1349
+ {
1350
+ "epoch": 2.88,
1351
+ "learning_rate": 3.7957250646122843e-06,
1352
+ "loss": 1.9406,
1353
+ "step": 224
1354
+ },
1355
+ {
1356
+ "epoch": 2.89,
1357
+ "learning_rate": 3.755751921312788e-06,
1358
+ "loss": 1.8809,
1359
+ "step": 225
1360
+ },
1361
+ {
1362
+ "epoch": 2.9,
1363
+ "learning_rate": 3.715863374886661e-06,
1364
+ "loss": 1.8371,
1365
+ "step": 226
1366
+ },
1367
+ {
1368
+ "epoch": 2.91,
1369
+ "learning_rate": 3.6760621373704867e-06,
1370
+ "loss": 2.0746,
1371
+ "step": 227
1372
+ },
1373
+ {
1374
+ "epoch": 2.93,
1375
+ "learning_rate": 3.636350914864689e-06,
1376
+ "loss": 1.822,
1377
+ "step": 228
1378
+ },
1379
+ {
1380
+ "epoch": 2.94,
1381
+ "learning_rate": 3.5967324073495363e-06,
1382
+ "loss": 1.7246,
1383
+ "step": 229
1384
+ },
1385
+ {
1386
+ "epoch": 2.95,
1387
+ "learning_rate": 3.5572093085015683e-06,
1388
+ "loss": 1.6679,
1389
+ "step": 230
1390
+ },
1391
+ {
1392
+ "epoch": 2.97,
1393
+ "learning_rate": 3.5177843055104633e-06,
1394
+ "loss": 1.8069,
1395
+ "step": 231
1396
+ },
1397
+ {
1398
+ "epoch": 2.98,
1399
+ "learning_rate": 3.4784600788963197e-06,
1400
+ "loss": 1.9182,
1401
+ "step": 232
1402
+ },
1403
+ {
1404
+ "epoch": 2.99,
1405
+ "learning_rate": 3.4392393023274173e-06,
1406
+ "loss": 1.6175,
1407
+ "step": 233
1408
+ },
1409
+ {
1410
+ "epoch": 3.0,
1411
+ "learning_rate": 3.4001246424384294e-06,
1412
+ "loss": 1.8909,
1413
+ "step": 234
1414
+ },
1415
+ {
1416
+ "epoch": 3.02,
1417
+ "learning_rate": 3.361118758649116e-06,
1418
+ "loss": 2.0512,
1419
+ "step": 235
1420
+ },
1421
+ {
1422
+ "epoch": 3.03,
1423
+ "learning_rate": 3.322224302983517e-06,
1424
+ "loss": 1.8031,
1425
+ "step": 236
1426
+ },
1427
+ {
1428
+ "epoch": 3.04,
1429
+ "learning_rate": 3.2834439198896285e-06,
1430
+ "loss": 1.8325,
1431
+ "step": 237
1432
+ },
1433
+ {
1434
+ "epoch": 3.06,
1435
+ "learning_rate": 3.2447802460596124e-06,
1436
+ "loss": 1.9987,
1437
+ "step": 238
1438
+ },
1439
+ {
1440
+ "epoch": 3.07,
1441
+ "learning_rate": 3.206235910250529e-06,
1442
+ "loss": 1.694,
1443
+ "step": 239
1444
+ },
1445
+ {
1446
+ "epoch": 3.08,
1447
+ "learning_rate": 3.167813533105598e-06,
1448
+ "loss": 2.1121,
1449
+ "step": 240
1450
+ },
1451
+ {
1452
+ "epoch": 3.09,
1453
+ "learning_rate": 3.1295157269760347e-06,
1454
+ "loss": 2.1676,
1455
+ "step": 241
1456
+ },
1457
+ {
1458
+ "epoch": 3.11,
1459
+ "learning_rate": 3.0913450957434177e-06,
1460
+ "loss": 1.9139,
1461
+ "step": 242
1462
+ },
1463
+ {
1464
+ "epoch": 3.12,
1465
+ "learning_rate": 3.0533042346426612e-06,
1466
+ "loss": 1.8614,
1467
+ "step": 243
1468
+ },
1469
+ {
1470
+ "epoch": 3.13,
1471
+ "learning_rate": 3.015395730085565e-06,
1472
+ "loss": 1.9445,
1473
+ "step": 244
1474
+ },
1475
+ {
1476
+ "epoch": 3.15,
1477
+ "learning_rate": 2.9776221594849565e-06,
1478
+ "loss": 2.1084,
1479
+ "step": 245
1480
+ },
1481
+ {
1482
+ "epoch": 3.16,
1483
+ "learning_rate": 2.9399860910794532e-06,
1484
+ "loss": 1.9299,
1485
+ "step": 246
1486
+ },
1487
+ {
1488
+ "epoch": 3.17,
1489
+ "learning_rate": 2.902490083758856e-06,
1490
+ "loss": 1.7569,
1491
+ "step": 247
1492
+ },
1493
+ {
1494
+ "epoch": 3.18,
1495
+ "learning_rate": 2.8651366868901543e-06,
1496
+ "loss": 1.5915,
1497
+ "step": 248
1498
+ },
1499
+ {
1500
+ "epoch": 3.2,
1501
+ "learning_rate": 2.8279284401442085e-06,
1502
+ "loss": 1.4711,
1503
+ "step": 249
1504
+ },
1505
+ {
1506
+ "epoch": 3.21,
1507
+ "learning_rate": 2.790867873323067e-06,
1508
+ "loss": 1.7658,
1509
+ "step": 250
1510
+ },
1511
+ {
1512
+ "epoch": 3.22,
1513
+ "learning_rate": 2.753957506187964e-06,
1514
+ "loss": 2.2383,
1515
+ "step": 251
1516
+ },
1517
+ {
1518
+ "epoch": 3.24,
1519
+ "learning_rate": 2.7171998482880093e-06,
1520
+ "loss": 2.1103,
1521
+ "step": 252
1522
+ },
1523
+ {
1524
+ "epoch": 3.25,
1525
+ "learning_rate": 2.680597398789554e-06,
1526
+ "loss": 1.9208,
1527
+ "step": 253
1528
+ },
1529
+ {
1530
+ "epoch": 3.26,
1531
+ "learning_rate": 2.6441526463062727e-06,
1532
+ "loss": 1.9773,
1533
+ "step": 254
1534
+ },
1535
+ {
1536
+ "epoch": 3.27,
1537
+ "learning_rate": 2.607868068729966e-06,
1538
+ "loss": 2.0155,
1539
+ "step": 255
1540
+ },
1541
+ {
1542
+ "epoch": 3.29,
1543
+ "learning_rate": 2.571746133062082e-06,
1544
+ "loss": 1.8626,
1545
+ "step": 256
1546
+ },
1547
+ {
1548
+ "epoch": 3.3,
1549
+ "learning_rate": 2.5357892952459917e-06,
1550
+ "loss": 1.764,
1551
+ "step": 257
1552
+ },
1553
+ {
1554
+ "epoch": 3.31,
1555
+ "learning_rate": 2.5000000000000015e-06,
1556
+ "loss": 2.0152,
1557
+ "step": 258
1558
+ },
1559
+ {
1560
+ "epoch": 3.33,
1561
+ "learning_rate": 2.4643806806511344e-06,
1562
+ "loss": 1.8742,
1563
+ "step": 259
1564
+ },
1565
+ {
1566
+ "epoch": 3.34,
1567
+ "learning_rate": 2.4289337589697e-06,
1568
+ "loss": 2.0988,
1569
+ "step": 260
1570
+ },
1571
+ {
1572
+ "epoch": 3.35,
1573
+ "learning_rate": 2.3936616450046207e-06,
1574
+ "loss": 1.9274,
1575
+ "step": 261
1576
+ },
1577
+ {
1578
+ "epoch": 3.36,
1579
+ "learning_rate": 2.3585667369195815e-06,
1580
+ "loss": 1.5094,
1581
+ "step": 262
1582
+ },
1583
+ {
1584
+ "epoch": 3.38,
1585
+ "learning_rate": 2.32365142082998e-06,
1586
+ "loss": 2.0823,
1587
+ "step": 263
1588
+ },
1589
+ {
1590
+ "epoch": 3.39,
1591
+ "learning_rate": 2.288918070640684e-06,
1592
+ "loss": 1.6446,
1593
+ "step": 264
1594
+ },
1595
+ {
1596
+ "epoch": 3.4,
1597
+ "learning_rate": 2.254369047884639e-06,
1598
+ "loss": 1.9535,
1599
+ "step": 265
1600
+ },
1601
+ {
1602
+ "epoch": 3.42,
1603
+ "learning_rate": 2.2200067015622986e-06,
1604
+ "loss": 1.5839,
1605
+ "step": 266
1606
+ },
1607
+ {
1608
+ "epoch": 3.43,
1609
+ "learning_rate": 2.185833367981918e-06,
1610
+ "loss": 1.7815,
1611
+ "step": 267
1612
+ },
1613
+ {
1614
+ "epoch": 3.44,
1615
+ "learning_rate": 2.1518513706007154e-06,
1616
+ "loss": 1.4904,
1617
+ "step": 268
1618
+ },
1619
+ {
1620
+ "epoch": 3.45,
1621
+ "learning_rate": 2.118063019866884e-06,
1622
+ "loss": 2.2307,
1623
+ "step": 269
1624
+ },
1625
+ {
1626
+ "epoch": 3.47,
1627
+ "learning_rate": 2.0844706130625146e-06,
1628
+ "loss": 1.947,
1629
+ "step": 270
1630
+ },
1631
+ {
1632
+ "epoch": 3.48,
1633
+ "learning_rate": 2.0510764341474032e-06,
1634
+ "loss": 2.2949,
1635
+ "step": 271
1636
+ },
1637
+ {
1638
+ "epoch": 3.49,
1639
+ "learning_rate": 2.0178827536037547e-06,
1640
+ "loss": 1.7868,
1641
+ "step": 272
1642
+ },
1643
+ {
1644
+ "epoch": 3.51,
1645
+ "learning_rate": 1.9848918282818242e-06,
1646
+ "loss": 1.663,
1647
+ "step": 273
1648
+ },
1649
+ {
1650
+ "epoch": 3.52,
1651
+ "learning_rate": 1.952105901246461e-06,
1652
+ "loss": 1.7499,
1653
+ "step": 274
1654
+ },
1655
+ {
1656
+ "epoch": 3.53,
1657
+ "learning_rate": 1.9195272016246105e-06,
1658
+ "loss": 1.938,
1659
+ "step": 275
1660
+ },
1661
+ {
1662
+ "epoch": 3.54,
1663
+ "learning_rate": 1.887157944453749e-06,
1664
+ "loss": 2.0385,
1665
+ "step": 276
1666
+ },
1667
+ {
1668
+ "epoch": 3.56,
1669
+ "learning_rate": 1.855000330531289e-06,
1670
+ "loss": 2.0374,
1671
+ "step": 277
1672
+ },
1673
+ {
1674
+ "epoch": 3.57,
1675
+ "learning_rate": 1.823056546264939e-06,
1676
+ "loss": 1.7599,
1677
+ "step": 278
1678
+ },
1679
+ {
1680
+ "epoch": 3.58,
1681
+ "learning_rate": 1.7913287635240573e-06,
1682
+ "loss": 2.1512,
1683
+ "step": 279
1684
+ },
1685
+ {
1686
+ "epoch": 3.6,
1687
+ "learning_rate": 1.7598191394919738e-06,
1688
+ "loss": 1.8319,
1689
+ "step": 280
1690
+ },
1691
+ {
1692
+ "epoch": 3.61,
1693
+ "learning_rate": 1.7285298165193388e-06,
1694
+ "loss": 1.7238,
1695
+ "step": 281
1696
+ },
1697
+ {
1698
+ "epoch": 3.62,
1699
+ "learning_rate": 1.697462921978446e-06,
1700
+ "loss": 1.8568,
1701
+ "step": 282
1702
+ },
1703
+ {
1704
+ "epoch": 3.63,
1705
+ "learning_rate": 1.6666205681186032e-06,
1706
+ "loss": 2.1115,
1707
+ "step": 283
1708
+ },
1709
+ {
1710
+ "epoch": 3.65,
1711
+ "learning_rate": 1.6360048519225197e-06,
1712
+ "loss": 1.8881,
1713
+ "step": 284
1714
+ },
1715
+ {
1716
+ "epoch": 3.66,
1717
+ "learning_rate": 1.6056178549637248e-06,
1718
+ "loss": 1.8789,
1719
+ "step": 285
1720
+ },
1721
+ {
1722
+ "epoch": 3.67,
1723
+ "learning_rate": 1.5754616432650443e-06,
1724
+ "loss": 2.168,
1725
+ "step": 286
1726
+ },
1727
+ {
1728
+ "epoch": 3.69,
1729
+ "learning_rate": 1.5455382671581365e-06,
1730
+ "loss": 1.9499,
1731
+ "step": 287
1732
+ },
1733
+ {
1734
+ "epoch": 3.7,
1735
+ "learning_rate": 1.5158497611440792e-06,
1736
+ "loss": 2.0706,
1737
+ "step": 288
1738
+ },
1739
+ {
1740
+ "epoch": 3.71,
1741
+ "learning_rate": 1.48639814375505e-06,
1742
+ "loss": 1.9817,
1743
+ "step": 289
1744
+ },
1745
+ {
1746
+ "epoch": 3.72,
1747
+ "learning_rate": 1.4571854174170847e-06,
1748
+ "loss": 1.8163,
1749
+ "step": 290
1750
+ },
1751
+ {
1752
+ "epoch": 3.74,
1753
+ "learning_rate": 1.428213568313927e-06,
1754
+ "loss": 1.785,
1755
+ "step": 291
1756
+ },
1757
+ {
1758
+ "epoch": 3.75,
1759
+ "learning_rate": 1.3994845662519985e-06,
1760
+ "loss": 2.0604,
1761
+ "step": 292
1762
+ },
1763
+ {
1764
+ "epoch": 3.76,
1765
+ "learning_rate": 1.3710003645264559e-06,
1766
+ "loss": 1.7002,
1767
+ "step": 293
1768
+ },
1769
+ {
1770
+ "epoch": 3.78,
1771
+ "learning_rate": 1.3427628997883957e-06,
1772
+ "loss": 1.788,
1773
+ "step": 294
1774
+ },
1775
+ {
1776
+ "epoch": 3.79,
1777
+ "learning_rate": 1.3147740919131814e-06,
1778
+ "loss": 1.5869,
1779
+ "step": 295
1780
+ },
1781
+ {
1782
+ "epoch": 3.8,
1783
+ "learning_rate": 1.2870358438699005e-06,
1784
+ "loss": 1.6432,
1785
+ "step": 296
1786
+ },
1787
+ {
1788
+ "epoch": 3.81,
1789
+ "learning_rate": 1.2595500415919948e-06,
1790
+ "loss": 1.9052,
1791
+ "step": 297
1792
+ },
1793
+ {
1794
+ "epoch": 3.83,
1795
+ "learning_rate": 1.232318553849023e-06,
1796
+ "loss": 1.8955,
1797
+ "step": 298
1798
+ },
1799
+ {
1800
+ "epoch": 3.84,
1801
+ "learning_rate": 1.2053432321196085e-06,
1802
+ "loss": 1.904,
1803
+ "step": 299
1804
+ },
1805
+ {
1806
+ "epoch": 3.85,
1807
+ "learning_rate": 1.1786259104655562e-06,
1808
+ "loss": 2.0135,
1809
+ "step": 300
1810
+ },
1811
+ {
1812
+ "epoch": 3.87,
1813
+ "learning_rate": 1.1521684054071524e-06,
1814
+ "loss": 1.8603,
1815
+ "step": 301
1816
+ },
1817
+ {
1818
+ "epoch": 3.88,
1819
+ "learning_rate": 1.1259725157996593e-06,
1820
+ "loss": 1.7627,
1821
+ "step": 302
1822
+ },
1823
+ {
1824
+ "epoch": 3.89,
1825
+ "learning_rate": 1.1000400227110142e-06,
1826
+ "loss": 1.6693,
1827
+ "step": 303
1828
+ },
1829
+ {
1830
+ "epoch": 3.9,
1831
+ "learning_rate": 1.0743726893007257e-06,
1832
+ "loss": 1.779,
1833
+ "step": 304
1834
+ },
1835
+ {
1836
+ "epoch": 3.92,
1837
+ "learning_rate": 1.0489722607000052e-06,
1838
+ "loss": 2.0575,
1839
+ "step": 305
1840
+ },
1841
+ {
1842
+ "epoch": 3.93,
1843
+ "learning_rate": 1.0238404638931077e-06,
1844
+ "loss": 1.9037,
1845
+ "step": 306
1846
+ },
1847
+ {
1848
+ "epoch": 3.94,
1849
+ "learning_rate": 9.989790075999145e-07,
1850
+ "loss": 1.9991,
1851
+ "step": 307
1852
+ },
1853
+ {
1854
+ "epoch": 3.96,
1855
+ "learning_rate": 9.743895821597638e-07,
1856
+ "loss": 2.0821,
1857
+ "step": 308
1858
+ },
1859
+ {
1860
+ "epoch": 3.97,
1861
+ "learning_rate": 9.500738594165132e-07,
1862
+ "loss": 1.9613,
1863
+ "step": 309
1864
+ },
1865
+ {
1866
+ "epoch": 3.98,
1867
+ "learning_rate": 9.260334926048787e-07,
1868
+ "loss": 1.939,
1869
+ "step": 310
1870
+ },
1871
+ {
1872
+ "epoch": 3.99,
1873
+ "learning_rate": 9.022701162380259e-07,
1874
+ "loss": 1.7692,
1875
+ "step": 311
1876
+ },
1877
+ {
1878
+ "epoch": 4.01,
1879
+ "learning_rate": 8.787853459964407e-07,
1880
+ "loss": 2.0053,
1881
+ "step": 312
1882
+ },
1883
+ {
1884
+ "epoch": 4.02,
1885
+ "learning_rate": 8.555807786180814e-07,
1886
+ "loss": 1.8909,
1887
+ "step": 313
1888
+ },
1889
+ {
1890
+ "epoch": 4.03,
1891
+ "learning_rate": 8.326579917898098e-07,
1892
+ "loss": 1.8692,
1893
+ "step": 314
1894
+ },
1895
+ {
1896
+ "epoch": 4.04,
1897
+ "learning_rate": 8.100185440401276e-07,
1898
+ "loss": 2.1948,
1899
+ "step": 315
1900
+ },
1901
+ {
1902
+ "epoch": 4.06,
1903
+ "learning_rate": 7.876639746332132e-07,
1904
+ "loss": 2.2319,
1905
+ "step": 316
1906
+ },
1907
+ {
1908
+ "epoch": 4.07,
1909
+ "learning_rate": 7.655958034642619e-07,
1910
+ "loss": 1.8725,
1911
+ "step": 317
1912
+ },
1913
+ {
1914
+ "epoch": 4.08,
1915
+ "learning_rate": 7.43815530956149e-07,
1916
+ "loss": 1.5717,
1917
+ "step": 318
1918
+ },
1919
+ {
1920
+ "epoch": 4.1,
1921
+ "learning_rate": 7.223246379574206e-07,
1922
+ "loss": 1.7549,
1923
+ "step": 319
1924
+ },
1925
+ {
1926
+ "epoch": 4.11,
1927
+ "learning_rate": 7.011245856416016e-07,
1928
+ "loss": 2.0717,
1929
+ "step": 320
1930
+ },
1931
+ {
1932
+ "epoch": 4.12,
1933
+ "learning_rate": 6.802168154078586e-07,
1934
+ "loss": 1.9039,
1935
+ "step": 321
1936
+ },
1937
+ {
1938
+ "epoch": 4.13,
1939
+ "learning_rate": 6.596027487829915e-07,
1940
+ "loss": 2.0704,
1941
+ "step": 322
1942
+ },
1943
+ {
1944
+ "epoch": 4.15,
1945
+ "learning_rate": 6.392837873247876e-07,
1946
+ "loss": 1.8727,
1947
+ "step": 323
1948
+ },
1949
+ {
1950
+ "epoch": 4.16,
1951
+ "learning_rate": 6.192613125267283e-07,
1952
+ "loss": 1.8494,
1953
+ "step": 324
1954
+ },
1955
+ {
1956
+ "epoch": 4.17,
1957
+ "learning_rate": 5.995366857240592e-07,
1958
+ "loss": 1.9835,
1959
+ "step": 325
1960
+ },
1961
+ {
1962
+ "epoch": 4.19,
1963
+ "learning_rate": 5.801112480012344e-07,
1964
+ "loss": 2.0905,
1965
+ "step": 326
1966
+ },
1967
+ {
1968
+ "epoch": 4.2,
1969
+ "learning_rate": 5.609863201007382e-07,
1970
+ "loss": 1.8491,
1971
+ "step": 327
1972
+ },
1973
+ {
1974
+ "epoch": 4.21,
1975
+ "learning_rate": 5.421632023332779e-07,
1976
+ "loss": 1.6643,
1977
+ "step": 328
1978
+ },
1979
+ {
1980
+ "epoch": 4.22,
1981
+ "learning_rate": 5.236431744893883e-07,
1982
+ "loss": 1.5409,
1983
+ "step": 329
1984
+ },
1985
+ {
1986
+ "epoch": 4.24,
1987
+ "learning_rate": 5.054274957524075e-07,
1988
+ "loss": 1.6956,
1989
+ "step": 330
1990
+ },
1991
+ {
1992
+ "epoch": 4.25,
+ "learning_rate": 4.875174046128684e-07,
+ "loss": 1.9333,
+ "step": 331
+ },
+ {
+ "epoch": 4.26,
+ "learning_rate": 4.6991411878429593e-07,
+ "loss": 1.7577,
+ "step": 332
+ },
+ {
+ "epoch": 4.28,
+ "learning_rate": 4.526188351204103e-07,
+ "loss": 2.0727,
+ "step": 333
+ },
+ {
+ "epoch": 4.29,
+ "learning_rate": 4.3563272953375426e-07,
+ "loss": 1.8069,
+ "step": 334
+ },
+ {
+ "epoch": 4.3,
+ "learning_rate": 4.1895695691574146e-07,
+ "loss": 2.0435,
+ "step": 335
+ },
+ {
+ "epoch": 4.31,
+ "learning_rate": 4.025926510581357e-07,
+ "loss": 2.0467,
+ "step": 336
+ },
+ {
+ "epoch": 4.33,
+ "learning_rate": 3.8654092457596714e-07,
+ "loss": 1.6247,
+ "step": 337
+ },
+ {
+ "epoch": 4.34,
+ "learning_rate": 3.7080286883187713e-07,
+ "loss": 1.9954,
+ "step": 338
+ },
+ {
+ "epoch": 4.35,
+ "learning_rate": 3.553795538619237e-07,
+ "loss": 2.255,
+ "step": 339
+ },
+ {
+ "epoch": 4.37,
+ "learning_rate": 3.402720283028277e-07,
+ "loss": 1.9724,
+ "step": 340
+ },
+ {
+ "epoch": 4.38,
+ "learning_rate": 3.2548131932067184e-07,
+ "loss": 1.9318,
+ "step": 341
+ },
+ {
+ "epoch": 4.39,
+ "learning_rate": 3.110084325410667e-07,
+ "loss": 1.7369,
+ "step": 342
+ },
+ {
+ "epoch": 4.4,
+ "learning_rate": 2.9685435198078095e-07,
+ "loss": 1.7993,
+ "step": 343
+ },
+ {
+ "epoch": 4.42,
+ "learning_rate": 2.830200399808286e-07,
+ "loss": 1.9586,
+ "step": 344
+ },
+ {
+ "epoch": 4.43,
+ "learning_rate": 2.6950643714104774e-07,
+ "loss": 1.6359,
+ "step": 345
+ },
+ {
+ "epoch": 4.44,
+ "learning_rate": 2.563144622561453e-07,
+ "loss": 1.8904,
+ "step": 346
+ },
+ {
+ "epoch": 4.46,
+ "learning_rate": 2.4344501225322557e-07,
+ "loss": 1.7661,
+ "step": 347
+ },
+ {
+ "epoch": 4.47,
+ "learning_rate": 2.3089896213081553e-07,
+ "loss": 1.8997,
+ "step": 348
+ },
+ {
+ "epoch": 4.48,
+ "learning_rate": 2.1867716489936297e-07,
+ "loss": 1.759,
+ "step": 349
+ },
+ {
+ "epoch": 4.49,
+ "learning_rate": 2.0678045152324798e-07,
+ "loss": 1.7521,
+ "step": 350
+ },
+ {
+ "epoch": 4.51,
+ "learning_rate": 1.9520963086428258e-07,
+ "loss": 1.5358,
+ "step": 351
+ },
+ {
+ "epoch": 4.52,
+ "learning_rate": 1.8396548962671456e-07,
+ "loss": 2.0727,
+ "step": 352
+ },
+ {
+ "epoch": 4.53,
+ "learning_rate": 1.7304879230374328e-07,
+ "loss": 2.0473,
+ "step": 353
+ },
+ {
+ "epoch": 4.55,
+ "learning_rate": 1.6246028112553603e-07,
+ "loss": 1.9068,
+ "step": 354
+ },
+ {
+ "epoch": 4.56,
+ "learning_rate": 1.5220067600876686e-07,
+ "loss": 2.1523,
+ "step": 355
+ },
+ {
+ "epoch": 4.57,
+ "learning_rate": 1.422706745076713e-07,
+ "loss": 1.9869,
+ "step": 356
+ },
+ {
+ "epoch": 4.58,
+ "learning_rate": 1.3267095176661304e-07,
+ "loss": 2.0737,
+ "step": 357
+ },
+ {
+ "epoch": 4.6,
+ "learning_rate": 1.2340216047418697e-07,
+ "loss": 1.9659,
+ "step": 358
+ },
+ {
+ "epoch": 4.61,
+ "learning_rate": 1.1446493081883891e-07,
+ "loss": 1.4132,
+ "step": 359
+ },
+ {
+ "epoch": 4.62,
+ "learning_rate": 1.0585987044602009e-07,
+ "loss": 1.7613,
+ "step": 360
+ },
+ {
+ "epoch": 4.64,
+ "learning_rate": 9.758756441687333e-08,
+ "loss": 2.0321,
+ "step": 361
+ },
+ {
+ "epoch": 4.65,
+ "learning_rate": 8.964857516845449e-08,
+ "loss": 2.1659,
+ "step": 362
+ },
+ {
+ "epoch": 4.66,
+ "learning_rate": 8.204344247549067e-08,
+ "loss": 1.7783,
+ "step": 363
+ },
+ {
+ "epoch": 4.67,
+ "learning_rate": 7.47726834136836e-08,
+ "loss": 1.8563,
+ "step": 364
+ },
+ {
+ "epoch": 4.69,
+ "learning_rate": 6.783679232455043e-08,
+ "loss": 1.986,
+ "step": 365
+ },
+ {
+ "epoch": 4.7,
+ "learning_rate": 6.123624078181512e-08,
+ "loss": 1.8735,
+ "step": 366
+ },
+ {
+ "epoch": 4.71,
+ "learning_rate": 5.4971477559346286e-08,
+ "loss": 1.5537,
+ "step": 367
+ },
+ {
+ "epoch": 4.73,
+ "learning_rate": 4.90429286006433e-08,
+ "loss": 1.857,
+ "step": 368
+ },
+ {
+ "epoch": 4.74,
+ "learning_rate": 4.34509969898772e-08,
+ "loss": 1.673,
+ "step": 369
+ },
+ {
+ "epoch": 4.75,
+ "learning_rate": 3.819606292448541e-08,
+ "loss": 1.9837,
+ "step": 370
+ },
+ {
+ "epoch": 4.76,
+ "learning_rate": 3.327848368931907e-08,
+ "loss": 1.6531,
+ "step": 371
+ },
+ {
+ "epoch": 4.78,
+ "learning_rate": 2.8698593632357496e-08,
+ "loss": 1.9188,
+ "step": 372
+ },
+ {
+ "epoch": 4.79,
+ "learning_rate": 2.4456704141967437e-08,
+ "loss": 1.9565,
+ "step": 373
+ },
+ {
+ "epoch": 4.8,
+ "learning_rate": 2.0553103625737813e-08,
+ "loss": 1.6245,
+ "step": 374
+ },
+ {
+ "epoch": 4.82,
+ "learning_rate": 1.6988057490868736e-08,
+ "loss": 1.7511,
+ "step": 375
+ },
+ {
+ "epoch": 4.83,
+ "learning_rate": 1.3761808126126486e-08,
+ "loss": 1.8122,
+ "step": 376
+ },
+ {
+ "epoch": 4.84,
+ "learning_rate": 1.0874574885362809e-08,
+ "loss": 1.6269,
+ "step": 377
+ },
+ {
+ "epoch": 4.85,
+ "learning_rate": 8.32655407260241e-09,
+ "loss": 2.0738,
+ "step": 378
+ },
+ {
+ "epoch": 4.87,
+ "learning_rate": 6.117918928693623e-09,
+ "loss": 2.0557,
+ "step": 379
+ },
+ {
+ "epoch": 4.88,
+ "learning_rate": 4.248819619533384e-09,
+ "loss": 1.8619,
+ "step": 380
+ },
+ {
+ "epoch": 4.89,
+ "learning_rate": 2.7193832258537447e-09,
+ "loss": 2.2485,
+ "step": 381
+ },
+ {
+ "epoch": 4.91,
+ "learning_rate": 1.5297137345843261e-09,
+ "loss": 1.9175,
+ "step": 382
+ },
+ {
+ "epoch": 4.92,
+ "learning_rate": 6.798920317807601e-10,
+ "loss": 2.099,
+ "step": 383
+ },
+ {
+ "epoch": 4.93,
+ "learning_rate": 1.6997589712575145e-10,
+ "loss": 2.1961,
+ "step": 384
+ },
+ {
+ "epoch": 4.94,
+ "learning_rate": 0.0,
+ "loss": 2.2319,
+ "step": 385
+ },
+ {
+ "epoch": 4.94,
+ "step": 385,
+ "total_flos": 3.4617941689368576e+17,
+ "train_loss": 2.0085288332654283,
+ "train_runtime": 6983.5696,
+ "train_samples_per_second": 0.446,
+ "train_steps_per_second": 0.055
+ }
+ ],
+ "logging_steps": 1.0,
+ "max_steps": 385,
+ "num_train_epochs": 5,
+ "save_steps": 1000,
+ "total_flos": 3.4617941689368576e+17,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb47645631bf22babe12672e4f1d9372bee531bf4ede1c212d08b9c34cd82830
+ size 4536