HorcruxNo13 committed on
Commit
e851381
1 Parent(s): c1d03b6

beit-mass-secondstep

Files changed (7)
  1. README.md +46 -53
  2. all_results.json +6 -6
  3. config.json +4 -4
  4. model.safetensors +1 -1
  5. train_results.json +6 -6
  6. trainer_state.json +442 -568
  7. training_args.bin +1 -1
README.md CHANGED
@@ -19,11 +19,11 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [microsoft/beit-base-patch16-224](https://huggingface.co/microsoft/beit-base-patch16-224) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.3752
- - Accuracy: 0.9388
- - Precision: 0.9451
- - Recall: 0.9388
- - F1 Score: 0.9412
 
  ## Model description
 
@@ -43,11 +43,11 @@ More information needed
 
  The following hyperparameters were used during training:
  - learning_rate: 5e-05
- - train_batch_size: 32
- - eval_batch_size: 32
  - seed: 42
  - gradient_accumulation_steps: 4
- - total_train_batch_size: 128
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_ratio: 0.1
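The `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1` settings can be checked against the learning rates logged in `trainer_state.json`. The sketch below is a plain-Python approximation of a linear warmup followed by linear decay to zero; the function name `linear_schedule_lr` is hypothetical, and rounding the warmup length up with `ceil` is an assumption rather than something this commit states.

```python
import math

def linear_schedule_lr(step, base_lr, max_steps, warmup_ratio):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to 0."""
    # Assumption: the warmup length is the ratio times the step budget, rounded up.
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, max_steps - step) / max(1, max_steps - warmup_steps)

# 180-step run (the removed side of this diff).
print(linear_schedule_lr(15, 5e-05, 180, 0.1))   # logged: 4.166666666666667e-05
print(linear_schedule_lr(30, 5e-05, 180, 0.1))   # logged: 4.62962962962963e-05
print(linear_schedule_lr(180, 5e-05, 180, 0.1))  # logged: 0.0

# 90-step run (the added side of this diff).
print(linear_schedule_lr(15, 5e-05, 90, 0.1))    # logged: 4.62962962962963e-05
```

Under these assumptions, step 15 of the 180-step run is still inside the 18-step warmup, while step 15 of the 90-step run is already past its 9-step warmup and decaying.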
@@ -55,51 +55,44 @@ The following hyperparameters were used during training:
 
  ### Training results
 
- | Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 Score |
- |:-------------:|:-------:|:----:|:---------------:|:--------:|:---------:|:------:|:--------:|
- | No log | 0.9412 | 4 | 0.3599 | 0.8644 | 0.8831 | 0.8644 | 0.8152 |
- | No log | 1.8824 | 8 | 0.2752 | 0.8983 | 0.8983 | 0.8983 | 0.8983 |
- | No log | 2.8235 | 12 | 0.1735 | 0.9322 | 0.9293 | 0.9322 | 0.9286 |
- | 0.2978 | 4.0 | 17 | 0.1745 | 0.9153 | 0.9311 | 0.9153 | 0.9200 |
- | 0.2978 | 4.9412 | 21 | 0.1888 | 0.9153 | 0.9196 | 0.9153 | 0.9171 |
- | 0.2978 | 5.8824 | 25 | 0.2819 | 0.8983 | 0.9092 | 0.8983 | 0.9024 |
- | 0.2978 | 6.8235 | 29 | 0.5332 | 0.9153 | 0.9230 | 0.9153 | 0.9010 |
- | 0.0283 | 8.0 | 34 | 0.5418 | 0.9153 | 0.9311 | 0.9153 | 0.9200 |
- | 0.0283 | 8.9412 | 38 | 0.6494 | 0.8983 | 0.9092 | 0.8983 | 0.8758 |
- | 0.0283 | 9.8824 | 42 | 0.5615 | 0.9153 | 0.9455 | 0.9153 | 0.9222 |
- | 0.0061 | 10.8235 | 46 | 0.8767 | 0.8983 | 0.8910 | 0.8983 | 0.8857 |
- | 0.0061 | 12.0 | 51 | 0.3859 | 0.9492 | 0.9619 | 0.9492 | 0.9520 |
- | 0.0061 | 12.9412 | 55 | 0.4550 | 0.9322 | 0.9322 | 0.9322 | 0.9322 |
- | 0.0061 | 13.8824 | 59 | 0.4314 | 0.9492 | 0.9477 | 0.9492 | 0.9479 |
- | 0.01 | 14.8235 | 63 | 0.4127 | 0.9492 | 0.9619 | 0.9492 | 0.9520 |
- | 0.01 | 16.0 | 68 | 0.3285 | 0.9492 | 0.9477 | 0.9492 | 0.9479 |
- | 0.01 | 16.9412 | 72 | 0.3180 | 0.9492 | 0.9477 | 0.9492 | 0.9479 |
- | 0.0076 | 17.8824 | 76 | 0.4482 | 0.9322 | 0.9293 | 0.9322 | 0.9286 |
- | 0.0076 | 18.8235 | 80 | 0.4437 | 0.9322 | 0.9322 | 0.9322 | 0.9322 |
- | 0.0076 | 20.0 | 85 | 0.4819 | 0.9322 | 0.9322 | 0.9322 | 0.9322 |
- | 0.0076 | 20.9412 | 89 | 0.5133 | 0.9322 | 0.9293 | 0.9322 | 0.9286 |
- | 0.0003 | 21.8824 | 93 | 0.4540 | 0.9492 | 0.9477 | 0.9492 | 0.9479 |
- | 0.0003 | 22.8235 | 97 | 0.3857 | 0.9153 | 0.9196 | 0.9153 | 0.9171 |
- | 0.0003 | 24.0 | 102 | 0.4077 | 0.8983 | 0.9092 | 0.8983 | 0.9024 |
- | 0.0028 | 24.9412 | 106 | 0.3956 | 0.9492 | 0.9477 | 0.9492 | 0.9479 |
- | 0.0028 | 25.8824 | 110 | 0.4671 | 0.9322 | 0.9293 | 0.9322 | 0.9286 |
- | 0.0028 | 26.8235 | 114 | 0.3811 | 0.9322 | 0.9322 | 0.9322 | 0.9322 |
- | 0.0028 | 28.0 | 119 | 0.3700 | 0.9322 | 0.9322 | 0.9322 | 0.9322 |
- | 0.0006 | 28.9412 | 123 | 0.4028 | 0.9322 | 0.9322 | 0.9322 | 0.9322 |
- | 0.0006 | 29.8824 | 127 | 0.6924 | 0.9153 | 0.9106 | 0.9153 | 0.9080 |
- | 0.0006 | 30.8235 | 131 | 0.6949 | 0.9153 | 0.9106 | 0.9153 | 0.9080 |
- | 0.0033 | 32.0 | 136 | 0.5889 | 0.9153 | 0.9120 | 0.9153 | 0.9132 |
- | 0.0033 | 32.9412 | 140 | 0.5128 | 0.9322 | 0.9322 | 0.9322 | 0.9322 |
- | 0.0033 | 33.8824 | 144 | 0.4411 | 0.9492 | 0.9522 | 0.9492 | 0.9502 |
- | 0.0033 | 34.8235 | 148 | 0.4420 | 0.9492 | 0.9522 | 0.9492 | 0.9502 |
- | 0.0013 | 36.0 | 153 | 0.5616 | 0.9322 | 0.9322 | 0.9322 | 0.9322 |
- | 0.0013 | 36.9412 | 157 | 0.6365 | 0.9153 | 0.9120 | 0.9153 | 0.9132 |
- | 0.0013 | 37.8824 | 161 | 0.6695 | 0.9153 | 0.9120 | 0.9153 | 0.9132 |
- | 0.0001 | 38.8235 | 165 | 0.6846 | 0.9153 | 0.9120 | 0.9153 | 0.9132 |
- | 0.0001 | 40.0 | 170 | 0.6930 | 0.9153 | 0.9120 | 0.9153 | 0.9132 |
- | 0.0001 | 40.9412 | 174 | 0.6958 | 0.9153 | 0.9120 | 0.9153 | 0.9132 |
- | 0.0001 | 41.8824 | 178 | 0.6967 | 0.9153 | 0.9120 | 0.9153 | 0.9132 |
- | 0.0044 | 42.3529 | 180 | 0.6952 | 0.9153 | 0.9120 | 0.9153 | 0.9132 |
 
 
  ### Framework versions
 
  This model is a fine-tuned version of [microsoft/beit-base-patch16-224](https://huggingface.co/microsoft/beit-base-patch16-224) on an unknown dataset.
  It achieves the following results on the evaluation set:
+ - Loss: 0.8528
+ - Accuracy: 0.8268
+ - Precision: 0.8303
+ - Recall: 0.8268
+ - F1 Score: 0.8283
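Recall equals Accuracy in every reported row, which is what support-weighted averaging over classes produces for single-label classification. The model card does not say which averaging mode was used, so the sketch below assumes weighted averaging; `weighted_metrics` and the toy labels are hypothetical and not taken from the dataset.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only (0 = Benign, 1 = Malignant).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With weighted averaging, the per-class recalls are weighted by class frequency, so their sum reduces to the fraction of correctly classified samples, which is the accuracy.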
 
  ## Model description
 
 
  The following hyperparameters were used during training:
  - learning_rate: 5e-05
+ - train_batch_size: 48
+ - eval_batch_size: 48
  - seed: 42
  - gradient_accumulation_steps: 4
+ - total_train_batch_size: 192
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_ratio: 0.1
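The `total_train_batch_size` entry follows from the per-device batch size and the gradient accumulation steps. A minimal sketch, assuming a single training device (the `num_devices` parameter and the helper name are hypothetical; a device count of 1 is the value consistent with the numbers in this diff):

```python
def total_train_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices

new_total = total_train_batch_size(48, 4)  # added side of the diff
old_total = total_train_batch_size(32, 4)  # removed side of the diff
print(old_total, new_total)
```

Raising the per-device batch size from 32 to 48 while keeping 4 accumulation steps is what moves the effective batch size from 128 to 192.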
 
  ### Training results
 
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 Score |
+ |:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:--------:|
+ | No log | 0.8 | 2 | 0.6993 | 0.5882 | 0.5390 | 0.5882 | 0.5541 |
+ | No log | 2.0 | 5 | 0.5971 | 0.6863 | 0.6806 | 0.6863 | 0.6033 |
+ | No log | 2.8 | 7 | 0.5306 | 0.8039 | 0.8000 | 0.8039 | 0.8006 |
+ | No log | 4.0 | 10 | 0.4828 | 0.7255 | 0.7229 | 0.7255 | 0.6859 |
+ | No log | 4.8 | 12 | 0.3812 | 0.7843 | 0.7786 | 0.7843 | 0.7784 |
+ | 0.5413 | 6.0 | 15 | 0.5268 | 0.7451 | 0.7461 | 0.7451 | 0.7141 |
+ | 0.5413 | 6.8 | 17 | 0.5349 | 0.7451 | 0.8556 | 0.7451 | 0.7502 |
+ | 0.5413 | 8.0 | 20 | 0.4120 | 0.8039 | 0.8485 | 0.8039 | 0.7756 |
+ | 0.5413 | 8.8 | 22 | 0.3156 | 0.8039 | 0.8003 | 0.8039 | 0.7963 |
+ | 0.5413 | 10.0 | 25 | 0.3217 | 0.8039 | 0.8061 | 0.8039 | 0.7909 |
+ | 0.5413 | 10.8 | 27 | 0.5161 | 0.7843 | 0.7870 | 0.7843 | 0.7664 |
+ | 0.0919 | 12.0 | 30 | 0.3677 | 0.8431 | 0.8498 | 0.8431 | 0.8451 |
+ | 0.0919 | 12.8 | 32 | 0.4631 | 0.8431 | 0.8407 | 0.8431 | 0.8405 |
+ | 0.0919 | 14.0 | 35 | 0.5001 | 0.8235 | 0.8214 | 0.8235 | 0.8221 |
+ | 0.0919 | 14.8 | 37 | 0.4489 | 0.8431 | 0.8431 | 0.8431 | 0.8431 |
+ | 0.0919 | 16.0 | 40 | 0.5892 | 0.7843 | 0.7799 | 0.7843 | 0.7731 |
+ | 0.0919 | 16.8 | 42 | 0.6579 | 0.7843 | 0.7799 | 0.7843 | 0.7731 |
+ | 0.006 | 18.0 | 45 | 0.7038 | 0.7843 | 0.7799 | 0.7843 | 0.7731 |
+ | 0.006 | 18.8 | 47 | 0.5864 | 0.8627 | 0.8737 | 0.8627 | 0.8651 |
+ | 0.006 | 20.0 | 50 | 0.5488 | 0.8627 | 0.8737 | 0.8627 | 0.8651 |
+ | 0.006 | 20.8 | 52 | 0.6651 | 0.8039 | 0.8003 | 0.8039 | 0.7963 |
+ | 0.006 | 22.0 | 55 | 0.6265 | 0.8039 | 0.8000 | 0.8039 | 0.8006 |
+ | 0.006 | 22.8 | 57 | 0.5229 | 0.8627 | 0.8653 | 0.8627 | 0.8637 |
+ | 0.0048 | 24.0 | 60 | 0.5421 | 0.8627 | 0.8653 | 0.8627 | 0.8637 |
+ | 0.0048 | 24.8 | 62 | 0.6335 | 0.8235 | 0.8205 | 0.8235 | 0.8187 |
+ | 0.0048 | 26.0 | 65 | 1.0379 | 0.8039 | 0.8201 | 0.8039 | 0.7841 |
+ | 0.0048 | 26.8 | 67 | 0.9758 | 0.8235 | 0.8366 | 0.8235 | 0.8089 |
+ | 0.0048 | 28.0 | 70 | 0.6117 | 0.8235 | 0.8205 | 0.8235 | 0.8187 |
+ | 0.0048 | 28.8 | 72 | 0.5403 | 0.8627 | 0.8613 | 0.8627 | 0.8617 |
+ | 0.0063 | 30.0 | 75 | 0.6469 | 0.8431 | 0.8407 | 0.8431 | 0.8405 |
+ | 0.0063 | 30.8 | 77 | 0.7014 | 0.8235 | 0.8205 | 0.8235 | 0.8187 |
+ | 0.0063 | 32.0 | 80 | 0.7514 | 0.8235 | 0.8205 | 0.8235 | 0.8187 |
+ | 0.0063 | 32.8 | 82 | 0.7771 | 0.8235 | 0.8248 | 0.8235 | 0.8144 |
+ | 0.0063 | 34.0 | 85 | 0.7599 | 0.8039 | 0.8003 | 0.8039 | 0.7963 |
+ | 0.0063 | 34.8 | 87 | 0.7554 | 0.8039 | 0.8003 | 0.8039 | 0.7963 |
+ | 0.0045 | 36.0 | 90 | 0.7308 | 0.8039 | 0.8003 | 0.8039 | 0.7963 |
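The table can be cross-checked against `best_model_checkpoint` in `trainer_state.json`, which names `checkpoint-47`. The sketch below assumes the best-model metric is validation accuracy with higher being better and that ties keep the earliest step; neither setting is visible in this diff, and `rows` is a hand-copied subset of the table above.

```python
# (step, accuracy) pairs copied from a subset of the rows above.
rows = [
    (45, 0.7843),
    (47, 0.8627),
    (50, 0.8627),
    (52, 0.8039),
    (57, 0.8627),
    (72, 0.8627),
    (90, 0.8039),
]

# max() returns the first maximal element, so ties resolve to the earliest step.
best_step, best_accuracy = max(rows, key=lambda row: row[1])
print(f"checkpoint-{best_step}", best_accuracy)
```

Several later steps tie at 0.8627, so under the tie-breaking assumption the selected checkpoint is the first one to reach that accuracy.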
 
 
  ### Framework versions
all_results.json CHANGED
@@ -1,8 +1,8 @@
  {
- "epoch": 42.35294117647059,
- "total_flos": 1.7260934287224177e+18,
- "train_loss": 0.030212831471969064,
- "train_runtime": 1290.1323,
- "train_samples_per_second": 18.347,
- "train_steps_per_second": 0.14
  }
 
  {
+ "epoch": 36.0,
+ "total_flos": 1.2659877490145034e+18,
+ "train_loss": 0.10912525819407569,
+ "train_runtime": 949.2365,
+ "train_samples_per_second": 21.523,
+ "train_steps_per_second": 0.095
  }
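The throughput fields are consistent with the step counts recorded in `trainer_state.json` (`global_step` 180 before, 90 after). The sketch below recomputes them; the helper names are hypothetical, and the implied per-epoch sample count assumes the reported samples per second is based on dataset size times the configured `num_train_epochs` (45), which this diff does not confirm.

```python
def steps_per_second(global_step, train_runtime):
    """Optimizer steps per second, rounded to three decimals like the JSON field."""
    return round(global_step / train_runtime, 3)

def implied_samples_per_epoch(samples_per_second, train_runtime, num_train_epochs):
    """Back out an approximate dataset size from the reported throughput."""
    return samples_per_second * train_runtime / num_train_epochs

old_rate = steps_per_second(180, 1290.1323)
new_rate = steps_per_second(90, 949.2365)
old_size = implied_samples_per_epoch(18.347, 1290.1323, 45)
new_size = implied_samples_per_epoch(21.523, 949.2365, 45)
print(old_rate, new_rate, round(old_size), round(new_size))
```

If the assumption holds, the second-step run trained on a somewhat smaller set than the first, which would also fit the lower steps-per-epoch visible in the Epoch column.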
config.json CHANGED
@@ -14,15 +14,15 @@
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "id2label": {
- "0": "Absent",
- "1": "Present"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
- "Absent": 0,
- "Present": 1
  },
  "layer_norm_eps": 1e-12,
  "layer_scale_init_value": 0.1,
 
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "id2label": {
+ "0": "Benign",
+ "1": "Malignant"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
+ "Benign": 0,
+ "Malignant": 1
  },
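The two label maps must stay inverse to each other, so a relabeling like this one has to touch both. A minimal sketch of deriving `label2id` from `id2label`, using the values on the added side of this diff; note that the JSON keys of `id2label` are strings while the values of `label2id` are integers.

```python
# Values copied from the added side of the config.json hunk.
id2label = {"0": "Benign", "1": "Malignant"}

# Invert the mapping and convert the string keys back to integer class ids.
label2id = {label: int(idx) for idx, label in id2label.items()}
print(label2id)
```

Building one map from the other avoids the two drifting apart when class names change between training steps.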
  "layer_norm_eps": 1e-12,
  "layer_scale_init_value": 0.1,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f1863f0a5e1f1f4b18eb39a004179d276e7f5248526fcdb80acc0894ce28ef4c
  size 343080328
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:070c5b244ea67c14e33049ce84f0a492470a5f8f7b7bcefcb07dc7846bf3c7d3
  size 343080328
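`model.safetensors` is stored through Git LFS, so the diff shows only the pointer file: the object hash changes while the size stays the same. A small sketch of reading such a pointer (the `parse_lfs_pointer` helper is hypothetical; it relies only on the `key value` line layout visible above):

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

old_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f1863f0a5e1f1f4b18eb39a004179d276e7f5248526fcdb80acc0894ce28ef4c\n"
    "size 343080328\n"
)
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:070c5b244ea67c14e33049ce84f0a492470a5f8f7b7bcefcb07dc7846bf3c7d3\n"
    "size 343080328\n"
)
print(old_pointer["size"] == new_pointer["size"], old_pointer["digest"] != new_pointer["digest"])
```

An identical size with a different digest is what one would expect when the weights are retrained without changing the architecture.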
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
- "epoch": 42.35294117647059,
- "total_flos": 1.7260934287224177e+18,
- "train_loss": 0.030212831471969064,
- "train_runtime": 1290.1323,
- "train_samples_per_second": 18.347,
- "train_steps_per_second": 0.14
  }
 
  {
+ "epoch": 36.0,
+ "total_flos": 1.2659877490145034e+18,
+ "train_loss": 0.10912525819407569,
+ "train_runtime": 949.2365,
+ "train_samples_per_second": 21.523,
+ "train_steps_per_second": 0.095
  }
trainer_state.json CHANGED
@@ -1,642 +1,516 @@
1
  {
2
- "best_metric": 0.9491525423728814,
3
- "best_model_checkpoint": "beit-base-patch16-224/checkpoint-51",
4
- "epoch": 42.35294117647059,
5
  "eval_steps": 500,
6
- "global_step": 180,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.9411764705882353,
13
- "eval_accuracy": 0.864406779661017,
14
- "eval_f1_score": 0.8151914626490897,
15
- "eval_loss": 0.35985592007637024,
16
- "eval_precision": 0.8831092928112214,
17
- "eval_recall": 0.864406779661017,
18
- "eval_runtime": 0.994,
19
- "eval_samples_per_second": 59.356,
20
- "eval_steps_per_second": 2.012,
21
- "step": 4
22
- },
23
- {
24
- "epoch": 1.8823529411764706,
25
- "eval_accuracy": 0.8983050847457628,
26
- "eval_f1_score": 0.8983050847457628,
27
- "eval_loss": 0.2752338945865631,
28
- "eval_precision": 0.8983050847457628,
29
- "eval_recall": 0.8983050847457628,
30
- "eval_runtime": 1.1891,
31
- "eval_samples_per_second": 49.617,
32
- "eval_steps_per_second": 1.682,
33
- "step": 8
34
- },
35
- {
36
- "epoch": 2.8235294117647056,
37
- "eval_accuracy": 0.9322033898305084,
38
- "eval_f1_score": 0.9286307743436357,
39
- "eval_loss": 0.17347723245620728,
40
- "eval_precision": 0.9293164462655988,
41
- "eval_recall": 0.9322033898305084,
42
- "eval_runtime": 1.0218,
43
- "eval_samples_per_second": 57.739,
44
- "eval_steps_per_second": 1.957,
45
  "step": 12
46
  },
47
  {
48
- "epoch": 3.5294117647058822,
49
- "grad_norm": 5.715649604797363,
50
- "learning_rate": 4.166666666666667e-05,
51
- "loss": 0.2978,
52
  "step": 15
53
  },
54
  {
55
- "epoch": 4.0,
56
- "eval_accuracy": 0.9152542372881356,
57
- "eval_f1_score": 0.9199970045680336,
58
- "eval_loss": 0.17451411485671997,
59
- "eval_precision": 0.9311215290299315,
60
- "eval_recall": 0.9152542372881356,
61
- "eval_runtime": 1.228,
62
- "eval_samples_per_second": 48.047,
63
- "eval_steps_per_second": 1.629,
64
  "step": 17
65
  },
66
  {
67
- "epoch": 4.9411764705882355,
68
- "eval_accuracy": 0.9152542372881356,
69
- "eval_f1_score": 0.9170563800358625,
70
- "eval_loss": 0.1887725591659546,
71
- "eval_precision": 0.9196471809062606,
72
- "eval_recall": 0.9152542372881356,
73
- "eval_runtime": 1.0748,
74
- "eval_samples_per_second": 54.895,
75
- "eval_steps_per_second": 1.861,
76
- "step": 21
77
- },
78
- {
79
- "epoch": 5.882352941176471,
80
- "eval_accuracy": 0.8983050847457628,
81
- "eval_f1_score": 0.9023521272915945,
82
- "eval_loss": 0.2818872034549713,
83
- "eval_precision": 0.9092193117616847,
84
- "eval_recall": 0.8983050847457628,
85
- "eval_runtime": 1.2817,
86
- "eval_samples_per_second": 46.032,
87
- "eval_steps_per_second": 1.56,
88
  "step": 25
89
  },
90
  {
91
- "epoch": 6.823529411764706,
92
- "eval_accuracy": 0.9152542372881356,
93
- "eval_f1_score": 0.900974731483206,
94
- "eval_loss": 0.5331762433052063,
95
- "eval_precision": 0.9229583975346687,
96
- "eval_recall": 0.9152542372881356,
97
- "eval_runtime": 1.1367,
98
- "eval_samples_per_second": 51.907,
99
- "eval_steps_per_second": 1.76,
100
- "step": 29
101
  },
102
  {
103
- "epoch": 7.0588235294117645,
104
- "grad_norm": 3.518982410430908,
105
- "learning_rate": 4.62962962962963e-05,
106
- "loss": 0.0283,
107
  "step": 30
108
  },
109
  {
110
- "epoch": 8.0,
111
- "eval_accuracy": 0.9152542372881356,
112
- "eval_f1_score": 0.9199970045680336,
113
- "eval_loss": 0.5418176054954529,
114
- "eval_precision": 0.9311215290299315,
115
- "eval_recall": 0.9152542372881356,
116
- "eval_runtime": 1.0994,
117
- "eval_samples_per_second": 53.664,
118
- "eval_steps_per_second": 1.819,
119
- "step": 34
120
- },
121
- {
122
- "epoch": 8.941176470588236,
123
- "eval_accuracy": 0.8983050847457628,
124
- "eval_f1_score": 0.8757595139110971,
125
- "eval_loss": 0.6493940353393555,
126
- "eval_precision": 0.9092009685230025,
127
- "eval_recall": 0.8983050847457628,
128
- "eval_runtime": 1.1076,
129
- "eval_samples_per_second": 53.266,
130
- "eval_steps_per_second": 1.806,
131
- "step": 38
132
- },
133
- {
134
- "epoch": 9.882352941176471,
135
- "eval_accuracy": 0.9152542372881356,
136
- "eval_f1_score": 0.9222355815847652,
137
- "eval_loss": 0.5614629983901978,
138
- "eval_precision": 0.9455205811138014,
139
- "eval_recall": 0.9152542372881356,
140
- "eval_runtime": 1.107,
141
- "eval_samples_per_second": 53.298,
142
- "eval_steps_per_second": 1.807,
143
  "step": 42
144
  },
145
  {
146
- "epoch": 10.588235294117647,
147
- "grad_norm": 0.022936690598726273,
148
- "learning_rate": 4.166666666666667e-05,
149
- "loss": 0.0061,
150
  "step": 45
151
  },
152
  {
153
- "epoch": 10.823529411764707,
154
- "eval_accuracy": 0.8983050847457628,
155
- "eval_f1_score": 0.8857329111566401,
156
- "eval_loss": 0.8766900897026062,
157
- "eval_precision": 0.8910232266164471,
158
- "eval_recall": 0.8983050847457628,
159
- "eval_runtime": 1.0968,
160
- "eval_samples_per_second": 53.791,
161
- "eval_steps_per_second": 1.823,
162
- "step": 46
163
  },
164
  {
165
- "epoch": 12.0,
166
- "eval_accuracy": 0.9491525423728814,
167
- "eval_f1_score": 0.9519982027408203,
168
- "eval_loss": 0.3859255313873291,
169
- "eval_precision": 0.961864406779661,
170
- "eval_recall": 0.9491525423728814,
171
- "eval_runtime": 1.1019,
172
- "eval_samples_per_second": 53.546,
173
- "eval_steps_per_second": 1.815,
174
- "step": 51
175
- },
176
- {
177
- "epoch": 12.941176470588236,
178
- "eval_accuracy": 0.9322033898305084,
179
- "eval_f1_score": 0.9322033898305084,
180
- "eval_loss": 0.4550356864929199,
181
- "eval_precision": 0.9322033898305084,
182
- "eval_recall": 0.9322033898305084,
183
- "eval_runtime": 1.1103,
184
- "eval_samples_per_second": 53.137,
185
- "eval_steps_per_second": 1.801,
186
  "step": 55
187
  },
188
  {
189
- "epoch": 13.882352941176471,
190
- "eval_accuracy": 0.9491525423728814,
191
- "eval_f1_score": 0.947908749000523,
192
- "eval_loss": 0.4313892722129822,
193
- "eval_precision": 0.9476985709538053,
194
- "eval_recall": 0.9491525423728814,
195
- "eval_runtime": 1.1142,
196
- "eval_samples_per_second": 52.955,
197
- "eval_steps_per_second": 1.795,
198
- "step": 59
199
  },
200
  {
201
- "epoch": 14.117647058823529,
202
- "grad_norm": 5.196343898773193,
203
- "learning_rate": 3.7037037037037037e-05,
204
- "loss": 0.01,
205
  "step": 60
206
  },
207
  {
208
- "epoch": 14.823529411764707,
209
- "eval_accuracy": 0.9491525423728814,
210
- "eval_f1_score": 0.9519982027408203,
211
- "eval_loss": 0.41266247630119324,
212
- "eval_precision": 0.961864406779661,
213
- "eval_recall": 0.9491525423728814,
214
- "eval_runtime": 1.1128,
215
- "eval_samples_per_second": 53.019,
216
- "eval_steps_per_second": 1.797,
217
- "step": 63
218
  },
219
  {
220
- "epoch": 16.0,
221
- "eval_accuracy": 0.9491525423728814,
222
- "eval_f1_score": 0.947908749000523,
223
- "eval_loss": 0.3284989297389984,
224
- "eval_precision": 0.9476985709538053,
225
- "eval_recall": 0.9491525423728814,
226
- "eval_runtime": 1.1075,
227
- "eval_samples_per_second": 53.271,
228
- "eval_steps_per_second": 1.806,
229
- "step": 68
230
- },
231
- {
232
- "epoch": 16.941176470588236,
233
- "eval_accuracy": 0.9491525423728814,
234
- "eval_f1_score": 0.947908749000523,
235
- "eval_loss": 0.3179616332054138,
236
- "eval_precision": 0.9476985709538053,
237
- "eval_recall": 0.9491525423728814,
238
- "eval_runtime": 1.0963,
239
- "eval_samples_per_second": 53.819,
240
- "eval_steps_per_second": 1.824,
241
  "step": 72
242
  },
243
  {
244
- "epoch": 17.647058823529413,
245
- "grad_norm": 5.957318305969238,
246
- "learning_rate": 3.240740740740741e-05,
247
- "loss": 0.0076,
248
  "step": 75
249
  },
250
  {
251
- "epoch": 17.88235294117647,
252
- "eval_accuracy": 0.9322033898305084,
253
- "eval_f1_score": 0.9286307743436357,
254
- "eval_loss": 0.44822579622268677,
255
- "eval_precision": 0.9293164462655988,
256
- "eval_recall": 0.9322033898305084,
257
- "eval_runtime": 1.1817,
258
- "eval_samples_per_second": 49.929,
259
- "eval_steps_per_second": 1.693,
260
- "step": 76
261
- },
262
- {
263
- "epoch": 18.823529411764707,
264
- "eval_accuracy": 0.9322033898305084,
265
- "eval_f1_score": 0.9322033898305084,
266
- "eval_loss": 0.44370484352111816,
267
- "eval_precision": 0.9322033898305084,
268
- "eval_recall": 0.9322033898305084,
269
- "eval_runtime": 1.1079,
270
- "eval_samples_per_second": 53.253,
271
- "eval_steps_per_second": 1.805,
272
- "step": 80
273
- },
274
- {
275
- "epoch": 20.0,
276
- "eval_accuracy": 0.9322033898305084,
277
- "eval_f1_score": 0.9322033898305084,
278
- "eval_loss": 0.4818989932537079,
279
- "eval_precision": 0.9322033898305084,
280
- "eval_recall": 0.9322033898305084,
281
- "eval_runtime": 1.3186,
282
- "eval_samples_per_second": 44.744,
283
- "eval_steps_per_second": 1.517,
284
- "step": 85
285
  },
286
  {
287
- "epoch": 20.941176470588236,
288
- "eval_accuracy": 0.9322033898305084,
289
- "eval_f1_score": 0.9286307743436357,
290
- "eval_loss": 0.5132895112037659,
291
- "eval_precision": 0.9293164462655988,
292
- "eval_recall": 0.9322033898305084,
293
- "eval_runtime": 1.1055,
294
- "eval_samples_per_second": 53.367,
295
- "eval_steps_per_second": 1.809,
296
- "step": 89
297
  },
298
  {
299
- "epoch": 21.176470588235293,
300
- "grad_norm": 0.27098149061203003,
301
- "learning_rate": 2.777777777777778e-05,
302
- "loss": 0.0003,
303
- "step": 90
304
  },
305
  {
306
- "epoch": 21.88235294117647,
307
- "eval_accuracy": 0.9491525423728814,
308
- "eval_f1_score": 0.947908749000523,
309
- "eval_loss": 0.45395800471305847,
310
- "eval_precision": 0.9476985709538053,
311
- "eval_recall": 0.9491525423728814,
312
- "eval_runtime": 1.1075,
313
- "eval_samples_per_second": 53.275,
314
- "eval_steps_per_second": 1.806,
315
- "step": 93
316
  },
317
  {
318
- "epoch": 22.823529411764707,
319
- "eval_accuracy": 0.9152542372881356,
320
- "eval_f1_score": 0.9170563800358625,
321
- "eval_loss": 0.38566043972969055,
322
- "eval_precision": 0.9196471809062606,
323
- "eval_recall": 0.9152542372881356,
324
- "eval_runtime": 1.0947,
325
- "eval_samples_per_second": 53.897,
326
- "eval_steps_per_second": 1.827,
327
- "step": 97
328
  },
329
  {
330
- "epoch": 24.0,
331
- "eval_accuracy": 0.8983050847457628,
332
- "eval_f1_score": 0.9023521272915945,
333
- "eval_loss": 0.4077180027961731,
334
- "eval_precision": 0.9092193117616847,
335
- "eval_recall": 0.8983050847457628,
336
- "eval_runtime": 1.1092,
337
- "eval_samples_per_second": 53.192,
338
- "eval_steps_per_second": 1.803,
339
- "step": 102
340
- },
341
- {
342
- "epoch": 24.705882352941178,
343
- "grad_norm": 0.018473587930202484,
344
- "learning_rate": 2.314814814814815e-05,
345
- "loss": 0.0028,
346
- "step": 105
347
- },
348
- {
349
- "epoch": 24.941176470588236,
350
- "eval_accuracy": 0.9491525423728814,
351
- "eval_f1_score": 0.947908749000523,
352
- "eval_loss": 0.3955690562725067,
353
- "eval_precision": 0.9476985709538053,
354
- "eval_recall": 0.9491525423728814,
355
- "eval_runtime": 1.2914,
356
- "eval_samples_per_second": 45.688,
357
- "eval_steps_per_second": 1.549,
358
- "step": 106
359
- },
360
- {
361
- "epoch": 25.88235294117647,
362
- "eval_accuracy": 0.9322033898305084,
363
- "eval_f1_score": 0.9286307743436357,
364
- "eval_loss": 0.4670986831188202,
365
- "eval_precision": 0.9293164462655988,
366
- "eval_recall": 0.9322033898305084,
367
- "eval_runtime": 1.1219,
368
- "eval_samples_per_second": 52.592,
369
- "eval_steps_per_second": 1.783,
370
- "step": 110
371
- },
372
- {
373
- "epoch": 26.823529411764707,
374
- "eval_accuracy": 0.9322033898305084,
375
- "eval_f1_score": 0.9322033898305084,
376
- "eval_loss": 0.3811493515968323,
377
- "eval_precision": 0.9322033898305084,
378
- "eval_recall": 0.9322033898305084,
379
- "eval_runtime": 1.2582,
380
- "eval_samples_per_second": 46.893,
381
- "eval_steps_per_second": 1.59,
382
- "step": 114
383
  },
384
  {
385
- "epoch": 28.0,
386
- "eval_accuracy": 0.9322033898305084,
387
- "eval_f1_score": 0.9322033898305084,
388
- "eval_loss": 0.3700270354747772,
389
- "eval_precision": 0.9322033898305084,
390
- "eval_recall": 0.9322033898305084,
391
- "eval_runtime": 1.1041,
392
- "eval_samples_per_second": 53.436,
393
- "eval_steps_per_second": 1.811,
394
- "step": 119
395
- },
396
- {
397
- "epoch": 28.235294117647058,
398
- "grad_norm": 0.08375111222267151,
399
- "learning_rate": 1.8518518518518518e-05,
400
- "loss": 0.0006,
401
- "step": 120
402
- },
403
- {
404
- "epoch": 28.941176470588236,
405
- "eval_accuracy": 0.9322033898305084,
406
- "eval_f1_score": 0.9322033898305084,
407
- "eval_loss": 0.40281012654304504,
408
- "eval_precision": 0.9322033898305084,
409
- "eval_recall": 0.9322033898305084,
410
- "eval_runtime": 1.1715,
411
- "eval_samples_per_second": 50.362,
412
- "eval_steps_per_second": 1.707,
413
- "step": 123
414
- },
415
- {
416
- "epoch": 29.88235294117647,
417
- "eval_accuracy": 0.9152542372881356,
418
- "eval_f1_score": 0.9080138226098403,
419
- "eval_loss": 0.6924118995666504,
420
- "eval_precision": 0.9106172049888072,
421
- "eval_recall": 0.9152542372881356,
422
- "eval_runtime": 1.1072,
423
- "eval_samples_per_second": 53.287,
424
- "eval_steps_per_second": 1.806,
425
- "step": 127
426
- },
427
- {
428
- "epoch": 30.823529411764707,
429
- "eval_accuracy": 0.9152542372881356,
430
- "eval_f1_score": 0.9080138226098403,
431
- "eval_loss": 0.6948609948158264,
432
- "eval_precision": 0.9106172049888072,
433
- "eval_recall": 0.9152542372881356,
434
- "eval_runtime": 1.1092,
435
- "eval_samples_per_second": 53.191,
436
- "eval_steps_per_second": 1.803,
437
- "step": 131
438
- },
439
- {
440
- "epoch": 31.764705882352942,
441
- "grad_norm": 0.0031740041449666023,
442
- "learning_rate": 1.388888888888889e-05,
443
- "loss": 0.0033,
444
- "step": 135
445
  },
446
  {
447
- "epoch": 32.0,
448
- "eval_accuracy": 0.9152542372881356,
449
- "eval_f1_score": 0.9131812483342053,
450
- "eval_loss": 0.5888532996177673,
451
- "eval_precision": 0.912013958125623,
452
- "eval_recall": 0.9152542372881356,
453
- "eval_runtime": 1.1154,
454
- "eval_samples_per_second": 52.896,
455
- "eval_steps_per_second": 1.793,
456
- "step": 136
457
- },
458
- {
459
- "epoch": 32.94117647058823,
460
- "eval_accuracy": 0.9322033898305084,
461
- "eval_f1_score": 0.9322033898305084,
462
- "eval_loss": 0.5128433108329773,
463
- "eval_precision": 0.9322033898305084,
464
- "eval_recall": 0.9322033898305084,
465
- "eval_runtime": 1.0996,
466
- "eval_samples_per_second": 53.657,
467
- "eval_steps_per_second": 1.819,
468
- "step": 140
469
- },
470
- {
471
- "epoch": 33.88235294117647,
472
- "eval_accuracy": 0.9491525423728814,
473
- "eval_f1_score": 0.9502338280215176,
474
- "eval_loss": 0.44105064868927,
475
- "eval_precision": 0.9521964718090626,
476
- "eval_recall": 0.9491525423728814,
477
- "eval_runtime": 1.3012,
478
- "eval_samples_per_second": 45.342,
479
- "eval_steps_per_second": 1.537,
480
- "step": 144
481
- },
482
- {
483
- "epoch": 34.8235294117647,
484
- "eval_accuracy": 0.9491525423728814,
485
- "eval_f1_score": 0.9502338280215176,
486
- "eval_loss": 0.4420201778411865,
487
- "eval_precision": 0.9521964718090626,
488
- "eval_recall": 0.9491525423728814,
489
- "eval_runtime": 1.1093,
490
- "eval_samples_per_second": 53.188,
491
- "eval_steps_per_second": 1.803,
492
- "step": 148
493
- },
494
- {
495
- "epoch": 35.294117647058826,
496
- "grad_norm": 0.0013447869569063187,
497
- "learning_rate": 9.259259259259259e-06,
498
- "loss": 0.0013,
499
- "step": 150
500
  },
501
  {
502
  "epoch": 36.0,
503
- "eval_accuracy": 0.9322033898305084,
504
- "eval_f1_score": 0.9322033898305084,
505
- "eval_loss": 0.5615989565849304,
506
- "eval_precision": 0.9322033898305084,
507
- "eval_recall": 0.9322033898305084,
508
- "eval_runtime": 1.1347,
509
- "eval_samples_per_second": 51.997,
510
- "eval_steps_per_second": 1.763,
511
- "step": 153
512
- },
513
- {
514
- "epoch": 36.94117647058823,
515
- "eval_accuracy": 0.9152542372881356,
516
- "eval_f1_score": 0.9131812483342053,
517
- "eval_loss": 0.6365456581115723,
518
- "eval_precision": 0.912013958125623,
519
- "eval_recall": 0.9152542372881356,
520
- "eval_runtime": 1.0934,
521
- "eval_samples_per_second": 53.961,
522
- "eval_steps_per_second": 1.829,
523
- "step": 157
524
- },
525
- {
526
- "epoch": 37.88235294117647,
527
- "eval_accuracy": 0.9152542372881356,
528
- "eval_f1_score": 0.9131812483342053,
529
- "eval_loss": 0.6694910526275635,
530
- "eval_precision": 0.912013958125623,
531
- "eval_recall": 0.9152542372881356,
532
- "eval_runtime": 1.0997,
533
- "eval_samples_per_second": 53.65,
534
- "eval_steps_per_second": 1.819,
535
- "step": 161
536
- },
537
- {
538
- "epoch": 38.8235294117647,
539
- "grad_norm": 0.0024713820312172174,
540
- "learning_rate": 4.6296296296296296e-06,
541
- "loss": 0.0001,
542
- "step": 165
543
- },
544
- {
545
- "epoch": 38.8235294117647,
546
- "eval_accuracy": 0.9152542372881356,
547
- "eval_f1_score": 0.9131812483342053,
548
- "eval_loss": 0.6845612525939941,
549
- "eval_precision": 0.912013958125623,
550
- "eval_recall": 0.9152542372881356,
551
- "eval_runtime": 1.1919,
552
- "eval_samples_per_second": 49.501,
553
- "eval_steps_per_second": 1.678,
554
- "step": 165
555
- },
556
- {
557
- "epoch": 40.0,
558
- "eval_accuracy": 0.9152542372881356,
559
- "eval_f1_score": 0.9131812483342053,
560
- "eval_loss": 0.6930243968963623,
561
- "eval_precision": 0.912013958125623,
562
- "eval_recall": 0.9152542372881356,
563
- "eval_runtime": 1.1022,
564
- "eval_samples_per_second": 53.53,
565
- "eval_steps_per_second": 1.815,
566
- "step": 170
567
- },
568
- {
569
- "epoch": 40.94117647058823,
570
- "eval_accuracy": 0.9152542372881356,
571
- "eval_f1_score": 0.9131812483342053,
572
- "eval_loss": 0.6957547068595886,
573
- "eval_precision": 0.912013958125623,
574
- "eval_recall": 0.9152542372881356,
575
- "eval_runtime": 1.1025,
576
- "eval_samples_per_second": 53.515,
577
- "eval_steps_per_second": 1.814,
578
- "step": 174
579
- },
580
- {
581
- "epoch": 41.88235294117647,
582
- "eval_accuracy": 0.9152542372881356,
583
- "eval_f1_score": 0.9131812483342053,
584
- "eval_loss": 0.6966932415962219,
585
- "eval_precision": 0.912013958125623,
586
- "eval_recall": 0.9152542372881356,
587
- "eval_runtime": 1.0997,
588
- "eval_samples_per_second": 53.649,
589
- "eval_steps_per_second": 1.819,
590
- "step": 178
591
- },
592
- {
593
- "epoch": 42.35294117647059,
594
- "grad_norm": 0.0012529775267466903,
595
- "learning_rate": 0.0,
596
- "loss": 0.0044,
597
- "step": 180
598
- },
599
- {
600
- "epoch": 42.35294117647059,
601
- "eval_accuracy": 0.9152542372881356,
602
- "eval_f1_score": 0.9131812483342053,
603
- "eval_loss": 0.6952070593833923,
604
- "eval_precision": 0.912013958125623,
605
- "eval_recall": 0.9152542372881356,
606
- "eval_runtime": 1.142,
607
- "eval_samples_per_second": 51.664,
608
- "eval_steps_per_second": 1.751,
609
- "step": 180
610
- },
611
- {
612
- "epoch": 42.35294117647059,
613
- "step": 180,
614
- "total_flos": 1.7260934287224177e+18,
615
- "train_loss": 0.030212831471969064,
616
- "train_runtime": 1290.1323,
617
- "train_samples_per_second": 18.347,
618
- "train_steps_per_second": 0.14
619
- },
620
- {
621
- "epoch": 42.35294117647059,
622
- "eval_accuracy": 0.9387755102040817,
623
- "eval_f1_score": 0.9412065766745571,
624
- "eval_loss": 0.3751787841320038,
625
- "eval_precision": 0.9451036228444866,
626
- "eval_recall": 0.9387755102040817,
627
- "eval_runtime": 3.0643,
628
- "eval_samples_per_second": 47.972,
629
- "eval_steps_per_second": 1.632,
630
- "step": 180
631
  }
632
  ],
633
  "logging_steps": 15,
634
- "max_steps": 180,
635
  "num_input_tokens_seen": 0,
636
  "num_train_epochs": 45,
637
  "save_steps": 500,
638
- "total_flos": 1.7260934287224177e+18,
639
- "train_batch_size": 32,
640
  "trial_name": null,
641
  "trial_params": null
642
  }
 
1
  {
2
+ "best_metric": 0.8627450980392157,
3
+ "best_model_checkpoint": "beit-base-patch16-224/checkpoint-47",
4
+ "epoch": 36.0,
5
  "eval_steps": 500,
6
+ "global_step": 90,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.8,
13
+ "eval_accuracy": 0.5882352941176471,
14
+ "eval_f1_score": 0.554074074074074,
15
+ "eval_loss": 0.6992508172988892,
16
+ "eval_precision": 0.5390243902439025,
17
+ "eval_recall": 0.5882352941176471,
18
+ "eval_runtime": 0.8655,
19
+ "eval_samples_per_second": 58.922,
20
+ "eval_steps_per_second": 2.311,
21
+ "step": 2
22
+ },
23
+ {
24
+ "epoch": 2.0,
25
+ "eval_accuracy": 0.6862745098039216,
26
+ "eval_f1_score": 0.6032520325203252,
27
+ "eval_loss": 0.5970537662506104,
28
+ "eval_precision": 0.6805555555555555,
29
+ "eval_recall": 0.6862745098039216,
30
+ "eval_runtime": 0.8959,
31
+ "eval_samples_per_second": 56.925,
32
+ "eval_steps_per_second": 2.232,
33
+ "step": 5
34
+ },
35
+ {
36
+ "epoch": 2.8,
37
+ "eval_accuracy": 0.803921568627451,
38
+ "eval_f1_score": 0.800595238095238,
39
+ "eval_loss": 0.5305531024932861,
40
+ "eval_precision": 0.7999999999999999,
41
+ "eval_recall": 0.803921568627451,
42
+ "eval_runtime": 0.9046,
43
+ "eval_samples_per_second": 56.379,
44
+ "eval_steps_per_second": 2.211,
45
+ "step": 7
46
+ },
47
+ {
48
+ "epoch": 4.0,
49
+ "eval_accuracy": 0.7254901960784313,
50
+ "eval_f1_score": 0.6858974358974359,
51
+ "eval_loss": 0.48283636569976807,
52
+ "eval_precision": 0.722943722943723,
53
+ "eval_recall": 0.7254901960784313,
54
+ "eval_runtime": 0.9029,
55
+ "eval_samples_per_second": 56.482,
56
+ "eval_steps_per_second": 2.215,
57
+ "step": 10
58
+ },
59
+ {
60
+ "epoch": 4.8,
61
+ "eval_accuracy": 0.7843137254901961,
62
+ "eval_f1_score": 0.7784340451310011,
63
+ "eval_loss": 0.3811856508255005,
64
+ "eval_precision": 0.7786357786357786,
65
+ "eval_recall": 0.7843137254901961,
66
+ "eval_runtime": 0.92,
67
+ "eval_samples_per_second": 55.436,
68
+ "eval_steps_per_second": 2.174,
69
  "step": 12
70
  },
71
  {
72
+ "epoch": 6.0,
73
+ "grad_norm": 4.578022480010986,
74
+ "learning_rate": 4.62962962962963e-05,
75
+ "loss": 0.5413,
76
+ "step": 15
77
+ },
78
+ {
79
+ "epoch": 6.0,
80
+ "eval_accuracy": 0.7450980392156863,
81
+ "eval_f1_score": 0.7141125541125543,
82
+ "eval_loss": 0.5268120765686035,
83
+ "eval_precision": 0.7461240310077519,
84
+ "eval_recall": 0.7450980392156863,
85
+ "eval_runtime": 0.9096,
86
+ "eval_samples_per_second": 56.066,
87
+ "eval_steps_per_second": 2.199,
88
  "step": 15
89
  },
90
  {
91
+ "epoch": 6.8,
92
+ "eval_accuracy": 0.7450980392156863,
93
+ "eval_f1_score": 0.7502256608639587,
94
+ "eval_loss": 0.5349109768867493,
95
+ "eval_precision": 0.8555555555555555,
96
+ "eval_recall": 0.7450980392156863,
97
+ "eval_runtime": 0.9137,
98
+ "eval_samples_per_second": 55.818,
99
+ "eval_steps_per_second": 2.189,
100
  "step": 17
101
  },
102
  {
103
+ "epoch": 8.0,
104
+ "eval_accuracy": 0.803921568627451,
105
+ "eval_f1_score": 0.7756410256410257,
106
+ "eval_loss": 0.4119790494441986,
107
+ "eval_precision": 0.8484848484848485,
108
+ "eval_recall": 0.803921568627451,
109
+ "eval_runtime": 0.9237,
110
+ "eval_samples_per_second": 55.215,
111
+ "eval_steps_per_second": 2.165,
112
+ "step": 20
113
+ },
114
+ {
115
+ "epoch": 8.8,
116
+ "eval_accuracy": 0.803921568627451,
117
+ "eval_f1_score": 0.7962962962962962,
118
+ "eval_loss": 0.3156317472457886,
119
+ "eval_precision": 0.8002699055330634,
120
+ "eval_recall": 0.803921568627451,
121
+ "eval_runtime": 0.9335,
122
+ "eval_samples_per_second": 54.63,
123
+ "eval_steps_per_second": 2.142,
124
+ "step": 22
125
+ },
126
+ {
127
+ "epoch": 10.0,
128
+ "eval_accuracy": 0.803921568627451,
129
+ "eval_f1_score": 0.7908622908622909,
130
+ "eval_loss": 0.3216821253299713,
131
+ "eval_precision": 0.806060606060606,
132
+ "eval_recall": 0.803921568627451,
133
+ "eval_runtime": 0.9256,
134
+ "eval_samples_per_second": 55.1,
135
+ "eval_steps_per_second": 2.161,
136
  "step": 25
137
  },
138
  {
139
+ "epoch": 10.8,
140
+ "eval_accuracy": 0.7843137254901961,
141
+ "eval_f1_score": 0.7664197530864199,
142
+ "eval_loss": 0.5160595774650574,
143
+ "eval_precision": 0.7869918699186993,
144
+ "eval_recall": 0.7843137254901961,
145
+ "eval_runtime": 0.9267,
146
+ "eval_samples_per_second": 55.031,
147
+ "eval_steps_per_second": 2.158,
148
+ "step": 27
149
  },
150
  {
151
+ "epoch": 12.0,
152
+ "grad_norm": 3.5482540130615234,
153
+ "learning_rate": 3.7037037037037037e-05,
154
+ "loss": 0.0919,
155
  "step": 30
156
  },
157
  {
158
+ "epoch": 12.0,
159
+ "eval_accuracy": 0.8431372549019608,
160
+ "eval_f1_score": 0.845117845117845,
161
+ "eval_loss": 0.36771491169929504,
162
+ "eval_precision": 0.849780701754386,
163
+ "eval_recall": 0.8431372549019608,
164
+ "eval_runtime": 0.942,
165
+ "eval_samples_per_second": 54.142,
166
+ "eval_steps_per_second": 2.123,
167
+ "step": 30
168
+ },
169
+ {
170
+ "epoch": 12.8,
171
+ "eval_accuracy": 0.8431372549019608,
172
+ "eval_f1_score": 0.8404761904761906,
173
+ "eval_loss": 0.46310773491859436,
174
+ "eval_precision": 0.8407407407407408,
175
+ "eval_recall": 0.8431372549019608,
176
+ "eval_runtime": 0.9403,
177
+ "eval_samples_per_second": 54.24,
178
+ "eval_steps_per_second": 2.127,
179
+ "step": 32
180
+ },
181
+ {
182
+ "epoch": 14.0,
183
+ "eval_accuracy": 0.8235294117647058,
184
+ "eval_f1_score": 0.8221343873517787,
185
+ "eval_loss": 0.5000560879707336,
186
+ "eval_precision": 0.8214285714285714,
187
+ "eval_recall": 0.8235294117647058,
188
+ "eval_runtime": 0.9615,
189
+ "eval_samples_per_second": 53.039,
190
+ "eval_steps_per_second": 2.08,
191
+ "step": 35
192
+ },
193
+ {
194
+ "epoch": 14.8,
195
+ "eval_accuracy": 0.8431372549019608,
196
+ "eval_f1_score": 0.8431372549019608,
197
+ "eval_loss": 0.4489041268825531,
198
+ "eval_precision": 0.8431372549019608,
199
+ "eval_recall": 0.8431372549019608,
200
+ "eval_runtime": 0.9337,
201
+ "eval_samples_per_second": 54.621,
202
+ "eval_steps_per_second": 2.142,
203
+ "step": 37
204
+ },
205
+ {
206
+ "epoch": 16.0,
207
+ "eval_accuracy": 0.7843137254901961,
208
+ "eval_f1_score": 0.7731065973862385,
209
+ "eval_loss": 0.5892294049263,
210
+ "eval_precision": 0.7799145299145298,
211
+ "eval_recall": 0.7843137254901961,
212
+ "eval_runtime": 1.0872,
213
+ "eval_samples_per_second": 46.909,
214
+ "eval_steps_per_second": 1.84,
215
+ "step": 40
216
+ },
217
+ {
218
+ "epoch": 16.8,
219
+ "eval_accuracy": 0.7843137254901961,
220
+ "eval_f1_score": 0.7731065973862385,
221
+ "eval_loss": 0.6578794717788696,
222
+ "eval_precision": 0.7799145299145298,
223
+ "eval_recall": 0.7843137254901961,
224
+ "eval_runtime": 0.9215,
225
+ "eval_samples_per_second": 55.345,
226
+ "eval_steps_per_second": 2.17,
227
  "step": 42
228
  },
229
  {
230
+ "epoch": 18.0,
231
+ "grad_norm": 3.25277042388916,
232
+ "learning_rate": 2.777777777777778e-05,
233
+ "loss": 0.006,
234
  "step": 45
235
  },
236
  {
237
+ "epoch": 18.0,
238
+ "eval_accuracy": 0.7843137254901961,
239
+ "eval_f1_score": 0.7731065973862385,
240
+ "eval_loss": 0.703818678855896,
241
+ "eval_precision": 0.7799145299145298,
242
+ "eval_recall": 0.7843137254901961,
243
+ "eval_runtime": 1.0077,
244
+ "eval_samples_per_second": 50.61,
245
+ "eval_steps_per_second": 1.985,
246
+ "step": 45
247
  },
248
  {
249
+ "epoch": 18.8,
250
+ "eval_accuracy": 0.8627450980392157,
251
+ "eval_f1_score": 0.865142065142065,
252
+ "eval_loss": 0.5864243507385254,
253
+ "eval_precision": 0.8736559139784946,
254
+ "eval_recall": 0.8627450980392157,
255
+ "eval_runtime": 0.9259,
256
+ "eval_samples_per_second": 55.08,
257
+ "eval_steps_per_second": 2.16,
258
+ "step": 47
259
+ },
260
+ {
261
+ "epoch": 20.0,
262
+ "eval_accuracy": 0.8627450980392157,
263
+ "eval_f1_score": 0.865142065142065,
264
+ "eval_loss": 0.5488199591636658,
265
+ "eval_precision": 0.8736559139784946,
266
+ "eval_recall": 0.8627450980392157,
267
+ "eval_runtime": 0.9318,
268
+ "eval_samples_per_second": 54.735,
269
+ "eval_steps_per_second": 2.146,
270
+ "step": 50
271
+ },
272
+ {
273
+ "epoch": 20.8,
274
+ "eval_accuracy": 0.803921568627451,
275
+ "eval_f1_score": 0.7962962962962962,
276
+ "eval_loss": 0.6650967597961426,
277
+ "eval_precision": 0.8002699055330634,
278
+ "eval_recall": 0.803921568627451,
279
+ "eval_runtime": 0.9328,
280
+ "eval_samples_per_second": 54.677,
281
+ "eval_steps_per_second": 2.144,
282
+ "step": 52
283
+ },
284
+ {
285
+ "epoch": 22.0,
286
+ "eval_accuracy": 0.803921568627451,
287
+ "eval_f1_score": 0.800595238095238,
288
+ "eval_loss": 0.6264931559562683,
289
+ "eval_precision": 0.7999999999999999,
290
+ "eval_recall": 0.803921568627451,
291
+ "eval_runtime": 0.9317,
292
+ "eval_samples_per_second": 54.741,
293
+ "eval_steps_per_second": 2.147,
294
  "step": 55
295
  },
296
  {
297
+ "epoch": 22.8,
298
+ "eval_accuracy": 0.8627450980392157,
299
+ "eval_f1_score": 0.8636815920398009,
300
+ "eval_loss": 0.5228903889656067,
301
+ "eval_precision": 0.8653198653198653,
302
+ "eval_recall": 0.8627450980392157,
303
+ "eval_runtime": 0.9295,
304
+ "eval_samples_per_second": 54.868,
305
+ "eval_steps_per_second": 2.152,
306
+ "step": 57
307
  },
308
  {
309
+ "epoch": 24.0,
310
+ "grad_norm": 0.0452270582318306,
311
+ "learning_rate": 1.8518518518518518e-05,
312
+ "loss": 0.0048,
313
+ "step": 60
314
+ },
315
+ {
316
+ "epoch": 24.0,
317
+ "eval_accuracy": 0.8627450980392157,
318
+ "eval_f1_score": 0.8636815920398009,
319
+ "eval_loss": 0.542142927646637,
320
+ "eval_precision": 0.8653198653198653,
321
+ "eval_recall": 0.8627450980392157,
322
+ "eval_runtime": 0.9409,
323
+ "eval_samples_per_second": 54.206,
324
+ "eval_steps_per_second": 2.126,
325
  "step": 60
326
  },
327
  {
328
+ "epoch": 24.8,
329
+ "eval_accuracy": 0.8235294117647058,
330
+ "eval_f1_score": 0.8187187641980918,
331
+ "eval_loss": 0.6334545016288757,
332
+ "eval_precision": 0.8204633204633205,
333
+ "eval_recall": 0.8235294117647058,
334
+ "eval_runtime": 0.9368,
335
+ "eval_samples_per_second": 54.438,
336
+ "eval_steps_per_second": 2.135,
337
+ "step": 62
338
+ },
339
+ {
340
+ "epoch": 26.0,
341
+ "eval_accuracy": 0.803921568627451,
342
+ "eval_f1_score": 0.7840755735492576,
343
+ "eval_loss": 1.0379055738449097,
344
+ "eval_precision": 0.82010582010582,
345
+ "eval_recall": 0.803921568627451,
346
+ "eval_runtime": 0.927,
347
+ "eval_samples_per_second": 55.015,
348
+ "eval_steps_per_second": 2.157,
349
+ "step": 65
350
+ },
351
+ {
352
+ "epoch": 26.8,
353
+ "eval_accuracy": 0.8235294117647058,
354
+ "eval_f1_score": 0.808888888888889,
355
+ "eval_loss": 0.9758451581001282,
356
+ "eval_precision": 0.8365853658536586,
357
+ "eval_recall": 0.8235294117647058,
358
+ "eval_runtime": 0.927,
359
+ "eval_samples_per_second": 55.017,
360
+ "eval_steps_per_second": 2.158,
361
+ "step": 67
362
  },
363
  {
364
+ "epoch": 28.0,
365
+ "eval_accuracy": 0.8235294117647058,
366
+ "eval_f1_score": 0.8187187641980918,
367
+ "eval_loss": 0.6116669774055481,
368
+ "eval_precision": 0.8204633204633205,
369
+ "eval_recall": 0.8235294117647058,
370
+ "eval_runtime": 0.9261,
371
+ "eval_samples_per_second": 55.07,
372
+ "eval_steps_per_second": 2.16,
373
+ "step": 70
374
+ },
375
+ {
376
+ "epoch": 28.8,
377
+ "eval_accuracy": 0.8627450980392157,
378
+ "eval_f1_score": 0.8616600790513834,
379
+ "eval_loss": 0.540273904800415,
380
+ "eval_precision": 0.8613095238095237,
381
+ "eval_recall": 0.8627450980392157,
382
+ "eval_runtime": 0.9247,
383
+ "eval_samples_per_second": 55.15,
384
+ "eval_steps_per_second": 2.163,
385
  "step": 72
386
  },
387
  {
388
+ "epoch": 30.0,
389
+ "grad_norm": 0.026938632130622864,
390
+ "learning_rate": 9.259259259259259e-06,
391
+ "loss": 0.0063,
392
  "step": 75
393
  },
394
  {
395
+ "epoch": 30.0,
396
+ "eval_accuracy": 0.8431372549019608,
397
+ "eval_f1_score": 0.8404761904761906,
398
+ "eval_loss": 0.6468568444252014,
399
+ "eval_precision": 0.8407407407407408,
400
+ "eval_recall": 0.8431372549019608,
401
+ "eval_runtime": 0.9235,
402
+ "eval_samples_per_second": 55.223,
403
+ "eval_steps_per_second": 2.166,
404
+ "step": 75
405
  },
406
  {
407
+ "epoch": 30.8,
408
+ "eval_accuracy": 0.8235294117647058,
409
+ "eval_f1_score": 0.8187187641980918,
410
+ "eval_loss": 0.7013790607452393,
411
+ "eval_precision": 0.8204633204633205,
412
+ "eval_recall": 0.8235294117647058,
413
+ "eval_runtime": 1.1388,
414
+ "eval_samples_per_second": 44.785,
415
+ "eval_steps_per_second": 1.756,
416
+ "step": 77
417
  },
418
  {
419
+ "epoch": 32.0,
420
+ "eval_accuracy": 0.8235294117647058,
421
+ "eval_f1_score": 0.8187187641980918,
422
+ "eval_loss": 0.7514360547065735,
423
+ "eval_precision": 0.8204633204633205,
424
+ "eval_recall": 0.8235294117647058,
425
+ "eval_runtime": 0.9424,
426
+ "eval_samples_per_second": 54.118,
427
+ "eval_steps_per_second": 2.122,
428
+ "step": 80
429
  },
430
  {
431
+ "epoch": 32.8,
432
+ "eval_accuracy": 0.8235294117647058,
433
+ "eval_f1_score": 0.8143599433160132,
434
+ "eval_loss": 0.7771488428115845,
435
+ "eval_precision": 0.8247863247863249,
436
+ "eval_recall": 0.8235294117647058,
437
+ "eval_runtime": 0.9338,
438
+ "eval_samples_per_second": 54.616,
439
+ "eval_steps_per_second": 2.142,
440
+ "step": 82
441
+ },
442
+ {
443
+ "epoch": 34.0,
444
+ "eval_accuracy": 0.803921568627451,
445
+ "eval_f1_score": 0.7962962962962962,
446
+ "eval_loss": 0.7598747611045837,
447
+ "eval_precision": 0.8002699055330634,
448
+ "eval_recall": 0.803921568627451,
449
+ "eval_runtime": 0.9331,
450
+ "eval_samples_per_second": 54.655,
451
+ "eval_steps_per_second": 2.143,
452
+ "step": 85
453
  },
454
  {
455
+ "epoch": 34.8,
456
+ "eval_accuracy": 0.803921568627451,
457
+ "eval_f1_score": 0.7962962962962962,
458
+ "eval_loss": 0.7554459571838379,
459
+ "eval_precision": 0.8002699055330634,
460
+ "eval_recall": 0.803921568627451,
461
+ "eval_runtime": 0.9307,
462
+ "eval_samples_per_second": 54.796,
463
+ "eval_steps_per_second": 2.149,
464
+ "step": 87
465
  },
466
  {
467
+ "epoch": 36.0,
468
+ "grad_norm": 0.014645076356828213,
469
+ "learning_rate": 0.0,
470
+ "loss": 0.0045,
471
+ "step": 90
472
  },
473
  {
474
+ "epoch": 36.0,
475
+ "eval_accuracy": 0.803921568627451,
476
+ "eval_f1_score": 0.7962962962962962,
477
+ "eval_loss": 0.7308478951454163,
478
+ "eval_precision": 0.8002699055330634,
479
+ "eval_recall": 0.803921568627451,
480
+ "eval_runtime": 0.9231,
481
+ "eval_samples_per_second": 55.246,
482
+ "eval_steps_per_second": 2.167,
483
+ "step": 90
484
  },
485
  {
486
+ "epoch": 36.0,
487
+ "step": 90,
488
+ "total_flos": 1.2659877490145034e+18,
489
+ "train_loss": 0.10912525819407569,
490
+ "train_runtime": 949.2365,
491
+ "train_samples_per_second": 21.523,
492
+ "train_steps_per_second": 0.095
493
  },
494
  {
495
  "epoch": 36.0,
496
+ "eval_accuracy": 0.8267716535433071,
497
+ "eval_f1_score": 0.8283048858023182,
498
+ "eval_loss": 0.8527529239654541,
499
+ "eval_precision": 0.8302904444636728,
500
+ "eval_recall": 0.8267716535433071,
501
+ "eval_runtime": 2.5545,
502
+ "eval_samples_per_second": 49.716,
503
+ "eval_steps_per_second": 1.174,
504
+ "step": 90
505
  }
506
  ],
507
  "logging_steps": 15,
508
+ "max_steps": 90,
509
  "num_input_tokens_seen": 0,
510
  "num_train_epochs": 45,
511
  "save_steps": 500,
512
+ "total_flos": 1.2659877490145034e+18,
513
+ "train_batch_size": 48,
514
  "trial_name": null,
515
  "trial_params": null
516
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2f84ad3380b312710e9817387d47973045f268d1a4130faf2df2cc0c2c171617
3
  size 4984
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d9c4f61894c45d65bf229deb2fd4dc876cbd903f9e7da4c19cfe4a3825e08c68
3
  size 4984