jonathanagustin committed on
Commit
eeefe59
1 Parent(s): 0423839

Model save

Files changed (5)
  1. README.md +58 -261
  2. config.json +1 -1
  3. metrics.json +6 -6
  4. tokenizer.json +3 -5
  5. trainer_state.json +156 -16
README.md CHANGED
@@ -1,281 +1,78 @@
1
  ---
2
- language: en
3
- license: mit
4
- model_details: "\n ## Abstract\n This model, 'distilbert-finetuned-uncased',\
5
- \ is a question-answering chatbot trained on the SQuAD dataset, demonstrating competency\
6
- \ in building conversational AI using recent advances in natural language processing.\
7
- \ It utilizes a BERT model fine-tuned for extractive question answering.\n\n \
8
- \ ## Data Collection and Preprocessing\n The model was trained on the\
9
- \ Stanford Question Answering Dataset (SQuAD), which contains over 100,000 question-answer\
10
- \ pairs based on Wikipedia articles. The data preprocessing involved tokenizing\
11
- \ context paragraphs and questions, truncating sequences to fit BERT's max length,\
12
- \ and adding special tokens to mark question and paragraph segments.\n\n \
13
- \ ## Model Architecture and Training\n The architecture is based on the BERT\
14
- \ transformer model, which was pretrained on large unlabeled text corpora. For this\
15
- \ project, the BERT base model was fine-tuned on SQuAD for extractive question answering,\
16
- \ with additional output layers for predicting the start and end indices of the\
17
- \ answer span.\n\n ## SQuAD 2.0 Dataset\n SQuAD 2.0 combines the existing\
18
- \ SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers\
19
- \ to look similar to answerable ones. This version of the dataset challenges models\
20
- \ to not only produce answers when possible but also determine when no answer is\
21
- \ supported by the paragraph and abstain from answering.\n "
22
- intended_use: "\n - Answering questions from the squad_v2 dataset.\n \
23
- \ - Developing question-answering systems within the scope of the aai520-project.\n\
24
- \ - Research and experimentation in the NLP question-answering domain.\n\
25
- \ "
26
- limitations_and_bias: "\n The model inherits limitations and biases from the\
27
- \ 'distilbert-base-uncased' model, as it was trained on the same foundational data.\
28
- \ \n It may underperform on questions that are ambiguous or too far outside\
29
- \ the scope of the topics covered in the squad_v2 dataset. \n Additionally,\
30
- \ the model may reflect societal biases present in its training data.\n "
31
- ethical_considerations: "\n This model should not be used for making critical\
32
- \ decisions without human oversight, \n as it can generate incorrect or biased\
33
- \ answers, especially for topics not covered in the training data. \n Users\
34
- \ should also consider the ethical implications of using AI in decision-making processes\
35
- \ and the potential for perpetuating biases.\n "
36
- evaluation: "\n The model was evaluated on the squad_v2 dataset using various\
37
- \ metrics. These metrics, along with their corresponding scores, \n are detailed\
38
- \ in the 'eval_results' section. The evaluation process ensured a comprehensive\
39
- \ assessment of the model's performance \n in question-answering scenarios.\n\
40
- \ "
41
- training: "\n The model was trained over 4 epochs with a learning rate of 2e-05,\
42
- \ using a batch size of 64. \n The training utilized a cross-entropy loss\
43
- \ function and the AdamW optimizer, with gradient accumulation over 4 steps.\n \
44
- \ "
45
- tips_and_tricks: "\n For optimal performance, questions should be clear, concise,\
46
- \ and grammatically correct. \n The model performs best on questions related\
47
- \ to topics covered in the squad_v2 dataset. \n It is advisable to pre-process\
48
- \ text for consistency in encoding and punctuation, and to manage expectations for\
49
- \ questions on topics outside the training data.\n "
50
  model-index:
51
- - name: distilbert-finetuned-uncased
52
- results:
53
- - task:
54
- type: question-answering
55
- dataset:
56
- name: SQuAD v2
57
- type: squad_v2
58
- metrics:
59
- - type: Exact
60
- value: 24.74522024762065
61
- - type: F1
62
- value: 28.46868820308392
63
- - type: Total
64
- value: 11873
65
- - type: Hasans Exact
66
- value: 42.39203778677463
67
- - type: Hasans F1
68
- value: 49.8496516591119
69
- - type: Hasans Total
70
- value: 5928
71
- - type: Noans Exact
72
- value: 7.1488645920941964
73
- - type: Noans F1
74
- value: 7.1488645920941964
75
- - type: Noans Total
76
- value: 5945
77
- - type: Best Exact
78
- value: 50.11370336056599
79
- - type: Best Exact Thresh
80
- value: 0.0
81
- - type: Best F1
82
- value: 50.11370336056599
83
- - type: Best F1 Thresh
84
- value: 0.0
85
  ---
86
 
87
- # Model Card for Model ID
 
88
 
89
- <!-- Provide a quick summary of what the model is/does. -->
90
 
 
 
 
91
 
 
92
 
93
- ## Model Details
94
 
95
- ### Model Description
96
 
97
- <!-- Provide a longer summary of what this model is. -->
98
 
 
99
 
 
100
 
101
- - **Developed by:** [More Information Needed]
102
- - **Shared by [optional]:** [More Information Needed]
103
- - **Model type:** [More Information Needed]
104
- - **Language(s) (NLP):** en
105
- - **License:** mit
106
- - **Finetuned from model [optional]:** [More Information Needed]
107
 
108
- ### Model Sources [optional]
109
 
110
- <!-- Provide the basic links for the model. -->
111
 
112
- - **Repository:** [More Information Needed]
113
- - **Paper [optional]:** [More Information Needed]
114
- - **Demo [optional]:** [More Information Needed]
115
 
116
- ## Uses
117
 
118
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
119
-
120
- ### Direct Use
121
-
122
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
123
-
124
- [More Information Needed]
125
-
126
- ### Downstream Use [optional]
127
-
128
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
129
-
130
- [More Information Needed]
131
-
132
- ### Out-of-Scope Use
133
-
134
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
135
-
136
- [More Information Needed]
137
-
138
- ## Bias, Risks, and Limitations
139
-
140
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
141
-
142
- [More Information Needed]
143
-
144
- ### Recommendations
145
-
146
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
147
-
148
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
149
-
150
- ## How to Get Started with the Model
151
-
152
- Use the code below to get started with the model.
153
-
154
- [More Information Needed]
155
-
156
- ## Training Details
157
-
158
- ### Training Data
159
-
160
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
161
-
162
- [More Information Needed]
163
-
164
- ### Training Procedure
165
-
166
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
167
-
168
- #### Preprocessing [optional]
169
-
170
- [More Information Needed]
171
-
172
-
173
- #### Training Hyperparameters
174
-
175
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
176
-
177
- #### Speeds, Sizes, Times [optional]
178
-
179
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
180
-
181
- [More Information Needed]
182
-
183
- ## Evaluation
184
-
185
- <!-- This section describes the evaluation protocols and provides the results. -->
186
-
187
- ### Testing Data, Factors & Metrics
188
-
189
- #### Testing Data
190
-
191
- <!-- This should link to a Data Card if possible. -->
192
-
193
- [More Information Needed]
194
-
195
- #### Factors
196
-
197
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
198
-
199
- [More Information Needed]
200
-
201
- #### Metrics
202
-
203
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
204
-
205
- [More Information Needed]
206
-
207
- ### Results
208
-
209
- [More Information Needed]
210
-
211
- #### Summary
212
-
213
-
214
-
215
- ## Model Examination [optional]
216
-
217
- <!-- Relevant interpretability work for the model goes here -->
218
-
219
- [More Information Needed]
220
-
221
- ## Environmental Impact
222
-
223
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
224
-
225
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
226
-
227
- - **Hardware Type:** [More Information Needed]
228
- - **Hours used:** [More Information Needed]
229
- - **Cloud Provider:** [More Information Needed]
230
- - **Compute Region:** [More Information Needed]
231
- - **Carbon Emitted:** [More Information Needed]
232
-
233
- ## Technical Specifications [optional]
234
-
235
- ### Model Architecture and Objective
236
-
237
- [More Information Needed]
238
-
239
- ### Compute Infrastructure
240
-
241
- [More Information Needed]
242
-
243
- #### Hardware
244
-
245
- [More Information Needed]
246
-
247
- #### Software
248
-
249
- [More Information Needed]
250
-
251
- ## Citation [optional]
252
-
253
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
254
-
255
- **BibTeX:**
256
-
257
- [More Information Needed]
258
-
259
- **APA:**
260
-
261
- [More Information Needed]
262
-
263
- ## Glossary [optional]
264
-
265
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
266
-
267
- [More Information Needed]
268
-
269
- ## More Information [optional]
270
-
271
- [More Information Needed]
272
-
273
- ## Model Card Authors [optional]
274
-
275
- [More Information Needed]
276
-
277
- ## Model Card Contact
278
-
279
- [More Information Needed]
280
 
 
281
 
 
 
 
 
 
1
  ---
2
+ tags:
3
+ - generated_from_trainer
4
+ datasets:
5
+ - squad_v2
6
  model-index:
7
+ - name: distilbert-finetuned-uncased-squad_v2
8
+ results: []
9
  ---
10
 
11
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
12
+ should probably proofread and complete it, then remove this comment. -->
13
 
14
+ # distilbert-finetuned-uncased-squad_v2
15
 
16
+ This model was trained from scratch on the squad_v2 dataset.
17
+ It achieves the following results on the evaluation set:
18
+ - Loss: 1.3332
19
 
20
+ ## Model description
21
 
22
+ More information needed
23
 
24
+ ## Intended uses & limitations
25
 
26
+ More information needed
27
 
28
+ ## Training and evaluation data
29
 
30
+ More information needed
31
 
32
+ ## Training procedure
 
 
 
 
 
33
 
34
+ ### Training hyperparameters
35
 
36
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 64
+ - eval_batch_size: 64
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 256
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 4
46
 
47
+ ### Training results
 
 
48
 
49
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 3.6437 | 0.39 | 100 | 2.1780 |
+ | 2.1596 | 0.78 | 200 | 1.6557 |
+ | 1.8138 | 1.18 | 300 | 1.5683 |
+ | 1.6987 | 1.57 | 400 | 1.5076 |
+ | 1.6586 | 1.96 | 500 | 1.5350 |
+ | 1.5957 | 1.18 | 600 | 1.4431 |
+ | 1.5825 | 1.37 | 700 | 1.4955 |
+ | 1.5523 | 1.57 | 800 | 1.4444 |
+ | 1.5346 | 1.76 | 900 | 1.3930 |
+ | 1.5098 | 1.96 | 1000 | 1.4285 |
+ | 1.4632 | 2.16 | 1100 | 1.3630 |
+ | 1.4468 | 2.35 | 1200 | 1.3710 |
+ | 1.4343 | 2.55 | 1300 | 1.3422 |
+ | 1.4225 | 2.75 | 1400 | 1.3971 |
+ | 1.408 | 2.94 | 1500 | 1.4355 |
+ | 1.3609 | 3.14 | 1600 | 1.3332 |
+ | 1.3398 | 3.33 | 1700 | 1.3792 |
+ | 1.3224 | 3.53 | 1800 | 1.4172 |
+ | 1.3152 | 3.73 | 1900 | 1.3956 |
+ | 1.3141 | 3.92 | 2000 | 1.3748 |
71
 
72
 
73
+ ### Framework versions
74
 
75
+ - Transformers 4.34.1
76
+ - Pytorch 2.1.0+cu118
77
+ - Datasets 2.14.5
78
+ - Tokenizers 0.14.1
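
The hyperparameters in the new README.md are related arithmetically: `total_train_batch_size` is the per-step batch size multiplied by the number of gradient accumulation steps. The sketch below reproduces that figure and the number of optimizer steps per epoch implied by `global_step` in trainer_state.json. The helper name and the `num_devices=1` default are assumptions made for illustration; the commit does not state the device count, although 64 × 4 = 256 is consistent with a single device.

```python
# Values copied from this commit (README.md hyperparameters, trainer_state.json).
train_batch_size = 64
gradient_accumulation_steps = 4
num_epochs = 4
global_step = 2040


def effective_batch_size(per_step_batch, accumulation_steps, num_devices=1):
    """Samples consumed per optimizer update (num_devices=1 is an assumption)."""
    return per_step_batch * accumulation_steps * num_devices


total_train_batch_size = effective_batch_size(
    train_batch_size, gradient_accumulation_steps
)
steps_per_epoch = global_step // num_epochs

print(total_train_batch_size)  # 256, matching total_train_batch_size in the README
print(steps_per_epoch)         # 510 optimizer steps per epoch
```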
config.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "_name_or_path": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/distilbert-finetuned-uncased/checkpoint-1700",
3
  "activation": "gelu",
4
  "architectures": [
5
  "DistilBertForQuestionAnswering"
 
1
  {
2
+ "_name_or_path": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/distilbert-finetuned-uncased/checkpoint-2000",
3
  "activation": "gelu",
4
  "architectures": [
5
  "DistilBertForQuestionAnswering"
metrics.json CHANGED
@@ -1,12 +1,12 @@
1
  {
2
- "exact": 23.347090036216628,
3
- "f1": 26.869992349988973,
4
  "total": 11873,
5
- "HasAns_exact": 38.630229419703106,
6
- "HasAns_f1": 45.686136837283904,
7
  "HasAns_total": 5928,
8
- "NoAns_exact": 8.107653490328007,
9
- "NoAns_f1": 8.107653490328007,
10
  "NoAns_total": 5945,
11
  "best_exact": 50.11370336056599,
12
  "best_exact_thresh": 0.0,
 
1
  {
2
+ "exact": 24.74522024762065,
3
+ "f1": 28.46868820308392,
4
  "total": 11873,
5
+ "HasAns_exact": 42.39203778677463,
6
+ "HasAns_f1": 49.8496516591119,
7
  "HasAns_total": 5928,
8
+ "NoAns_exact": 7.1488645920941964,
9
+ "NoAns_f1": 7.1488645920941964,
10
  "NoAns_total": 5945,
11
  "best_exact": 50.11370336056599,
12
  "best_exact_thresh": 0.0,
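
The overall `exact` and `f1` values in metrics.json can be cross-checked against the HasAns and NoAns breakdown: each overall score should be the average of the two subset scores, weighted by subset size. The following is a minimal sketch of that check using only the numbers in this diff; the function name `combine` is hypothetical and is not part of any evaluation library.

```python
def combine(has_ans_score, has_ans_total, no_ans_score, no_ans_total):
    """Size-weighted average of the HasAns and NoAns subset scores."""
    total = has_ans_total + no_ans_total
    weighted = has_ans_score * has_ans_total + no_ans_score * no_ans_total
    return weighted / total


# Updated values from metrics.json in this commit.
exact = combine(42.39203778677463, 5928, 7.1488645920941964, 5945)
f1 = combine(49.8496516591119, 5928, 7.1488645920941964, 5945)

print(round(exact, 4))  # 24.7452, agrees with "exact"
print(round(f1, 4))     # 28.4687, agrees with "f1"
```

The weighting makes the low NoAns score (about 7.1) visible as the main reason the overall score sits well below the HasAns score.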
tokenizer.json CHANGED
@@ -3,13 +3,11 @@
3
  "truncation": {
4
  "direction": "Right",
5
  "max_length": 512,
6
- "strategy": "LongestFirst",
7
- "stride": 0
8
  },
9
  "padding": {
10
- "strategy": {
11
- "Fixed": 512
12
- },
13
  "direction": "Right",
14
  "pad_to_multiple_of": null,
15
  "pad_id": 0,
 
3
  "truncation": {
4
  "direction": "Right",
5
  "max_length": 512,
6
+ "strategy": "OnlySecond",
7
+ "stride": 128
8
  },
9
  "padding": {
10
+ "strategy": "BatchLongest",
 
 
11
  "direction": "Right",
12
  "pad_to_multiple_of": null,
13
  "pad_id": 0,
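
The tokenizer.json change switches truncation to `OnlySecond` with a stride of 128, which is a common setup for extractive question answering: the question is left intact and a long context is split into windows that overlap, so an answer near a window boundary still appears whole in at least one window. The pure-Python sketch below illustrates the overlapping-window idea only. It is not the tokenizers library's implementation, and the function name, the 20-token question length, and the example input are all hypothetical.

```python
def overlapping_windows(context_ids, window, overlap):
    """Split context token ids into windows that share `overlap` tokens."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    windows = []
    start = 0
    while True:
        windows.append(context_ids[start:start + window])
        if start + window >= len(context_ids):
            break
        start += step
    return windows


# Hypothetical budget: 512 tokens total, minus a 20-token question
# and 3 special tokens ([CLS], [SEP], [SEP]) for a question/context pair.
context_budget = 512 - 20 - 3
windows = overlapping_windows(list(range(1000)), context_budget, overlap=128)

print(len(windows))                            # 3
print(windows[0][-128:] == windows[1][:128])   # True: neighbours share 128 tokens
```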
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
- "best_metric": 1.393009066581726,
3
- "best_model_checkpoint": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/distilbert-finetuned-uncased/checkpoint-900",
4
  "epoch": 4.0,
5
  "eval_steps": 100,
6
- "global_step": 1020,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
@@ -148,29 +148,169 @@
148
  "eval_steps_per_second": 21.862,
149
  "step": 1000
150
  },
151
  {
152
  "epoch": 4.0,
153
- "step": 1020,
154
- "total_flos": 5.148633647651021e+16,
155
- "train_loss": 0.028946983113008386,
156
- "train_runtime": 26.7355,
157
- "train_samples_per_second": 19525.044,
158
- "train_steps_per_second": 38.152
159
  },
160
  {
161
  "epoch": 4.0,
162
- "eval_loss": 1.3930128812789917,
163
- "eval_runtime": 8.3016,
164
- "eval_samples_per_second": 1441.77,
165
- "eval_steps_per_second": 11.323,
166
- "step": 1020
167
  }
168
  ],
169
  "logging_steps": 100,
170
- "max_steps": 1020,
171
  "num_train_epochs": 4,
172
  "save_steps": 100,
173
- "total_flos": 5.148633647651021e+16,
174
  "trial_name": null,
175
  "trial_params": null
176
  }
 
1
  {
2
+ "best_metric": 1.3331981897354126,
3
+ "best_model_checkpoint": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/distilbert-finetuned-uncased/checkpoint-1600",
4
  "epoch": 4.0,
5
  "eval_steps": 100,
6
+ "global_step": 2040,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
 
148
  "eval_steps_per_second": 21.862,
149
  "step": 1000
150
  },
151
+ {
152
+ "epoch": 2.16,
153
+ "learning_rate": 9.215686274509804e-06,
154
+ "loss": 1.4632,
155
+ "step": 1100
156
+ },
157
+ {
158
+ "epoch": 2.16,
159
+ "eval_loss": 1.3630493879318237,
160
+ "eval_runtime": 8.4807,
161
+ "eval_samples_per_second": 1411.328,
162
+ "eval_steps_per_second": 22.168,
163
+ "step": 1100
164
+ },
165
+ {
166
+ "epoch": 2.35,
167
+ "learning_rate": 8.23529411764706e-06,
168
+ "loss": 1.4468,
169
+ "step": 1200
170
+ },
171
+ {
172
+ "epoch": 2.35,
173
+ "eval_loss": 1.370953917503357,
174
+ "eval_runtime": 8.5147,
175
+ "eval_samples_per_second": 1405.685,
176
+ "eval_steps_per_second": 22.079,
177
+ "step": 1200
178
+ },
179
+ {
180
+ "epoch": 2.55,
181
+ "learning_rate": 7.2549019607843145e-06,
182
+ "loss": 1.4343,
183
+ "step": 1300
184
+ },
185
+ {
186
+ "epoch": 2.55,
187
+ "eval_loss": 1.3422259092330933,
188
+ "eval_runtime": 8.4859,
189
+ "eval_samples_per_second": 1410.461,
190
+ "eval_steps_per_second": 22.154,
191
+ "step": 1300
192
+ },
193
+ {
194
+ "epoch": 2.75,
195
+ "learning_rate": 6.274509803921569e-06,
196
+ "loss": 1.4225,
197
+ "step": 1400
198
+ },
199
+ {
200
+ "epoch": 2.75,
201
+ "eval_loss": 1.397080659866333,
202
+ "eval_runtime": 8.4725,
203
+ "eval_samples_per_second": 1412.689,
204
+ "eval_steps_per_second": 22.189,
205
+ "step": 1400
206
+ },
207
+ {
208
+ "epoch": 2.94,
209
+ "learning_rate": 5.294117647058824e-06,
210
+ "loss": 1.408,
211
+ "step": 1500
212
+ },
213
+ {
214
+ "epoch": 2.94,
215
+ "eval_loss": 1.435463547706604,
216
+ "eval_runtime": 8.4775,
217
+ "eval_samples_per_second": 1411.85,
218
+ "eval_steps_per_second": 22.176,
219
+ "step": 1500
220
+ },
221
+ {
222
+ "epoch": 3.14,
223
+ "learning_rate": 4.313725490196079e-06,
224
+ "loss": 1.3609,
225
+ "step": 1600
226
+ },
227
+ {
228
+ "epoch": 3.14,
229
+ "eval_loss": 1.3331981897354126,
230
+ "eval_runtime": 8.4786,
231
+ "eval_samples_per_second": 1411.679,
232
+ "eval_steps_per_second": 22.174,
233
+ "step": 1600
234
+ },
235
+ {
236
+ "epoch": 3.33,
237
+ "learning_rate": 3.3333333333333333e-06,
238
+ "loss": 1.3398,
239
+ "step": 1700
240
+ },
241
+ {
242
+ "epoch": 3.33,
243
+ "eval_loss": 1.3791619539260864,
244
+ "eval_runtime": 8.4678,
245
+ "eval_samples_per_second": 1413.466,
246
+ "eval_steps_per_second": 22.202,
247
+ "step": 1700
248
+ },
249
+ {
250
+ "epoch": 3.53,
251
+ "learning_rate": 2.3529411764705885e-06,
252
+ "loss": 1.3224,
253
+ "step": 1800
254
+ },
255
+ {
256
+ "epoch": 3.53,
257
+ "eval_loss": 1.41716730594635,
258
+ "eval_runtime": 8.4259,
259
+ "eval_samples_per_second": 1420.506,
260
+ "eval_steps_per_second": 22.312,
261
+ "step": 1800
262
+ },
263
+ {
264
+ "epoch": 3.73,
265
+ "learning_rate": 1.3725490196078434e-06,
266
+ "loss": 1.3152,
267
+ "step": 1900
268
+ },
269
+ {
270
+ "epoch": 3.73,
271
+ "eval_loss": 1.3955893516540527,
272
+ "eval_runtime": 8.444,
273
+ "eval_samples_per_second": 1417.453,
274
+ "eval_steps_per_second": 22.264,
275
+ "step": 1900
276
+ },
277
+ {
278
+ "epoch": 3.92,
279
+ "learning_rate": 3.921568627450981e-07,
280
+ "loss": 1.3141,
281
+ "step": 2000
282
+ },
283
+ {
284
+ "epoch": 3.92,
285
+ "eval_loss": 1.3748189210891724,
286
+ "eval_runtime": 8.4509,
287
+ "eval_samples_per_second": 1416.303,
288
+ "eval_steps_per_second": 22.246,
289
+ "step": 2000
290
+ },
291
  {
292
  "epoch": 4.0,
293
+ "step": 2040,
294
+ "total_flos": 8.491863563129856e+16,
295
+ "train_loss": 0.2191746057248583,
296
+ "train_runtime": 267.1285,
297
+ "train_samples_per_second": 1954.161,
298
+ "train_steps_per_second": 7.637
299
  },
300
  {
301
  "epoch": 4.0,
302
+ "eval_loss": 1.3331981897354126,
303
+ "eval_runtime": 8.4359,
304
+ "eval_samples_per_second": 1418.822,
305
+ "eval_steps_per_second": 22.286,
306
+ "step": 2040
307
  }
308
  ],
309
  "logging_steps": 100,
310
+ "max_steps": 2040,
311
  "num_train_epochs": 4,
312
  "save_steps": 100,
313
+ "total_flos": 8.491863563129856e+16,
314
  "trial_name": null,
315
  "trial_params": null
316
  }
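
The `learning_rate` values logged in trainer_state.json are consistent with a linear decay from the base rate of 2e-05 to zero over `max_steps` = 2040. The sketch below recomputes a few logged values under that assumption. The zero-warmup default is inferred from the logged numbers rather than stated in the commit, and `linear_lr` is a hypothetical helper, not a Transformers API.

```python
def linear_lr(step, base_lr=2e-05, max_steps=2040, warmup_steps=0):
    """Linear warmup (assumed to be zero steps here) followed by linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


# Compare against entries logged in trainer_state.json.
for step, logged in [
    (1100, 9.215686274509804e-06),
    (1600, 4.313725490196079e-06),
    (2000, 3.921568627450981e-07),
]:
    print(step, linear_lr(step), logged)
```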