Commit 1e99df2 by lombardata
1 parent: 274419a

Evaluation on the test set completed on 2024_09_03.

README.md ADDED
@@ -0,0 +1,208 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/dinov2-small
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: DinoVdeau-small-2024_08_31-batch-size32_epochs150_freeze
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # DinoVdeau-small-2024_08_31-batch-size32_epochs150_freeze
+
+ This model is a fine-tuned version of [facebook/dinov2-small](https://huggingface.co/facebook/dinov2-small) on an unspecified dataset.
+ It achieves the following results on the test set:
+ - Loss: 0.1320
+ - F1 Micro: 0.8009
+ - F1 Macro: 0.6614
+ - ROC AUC: 0.8649
+ - Accuracy: 0.2903
+ - Learning Rate: 1e-06
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 150 (training ended at epoch 144)
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step  | Validation Loss | F1 Micro | F1 Macro | ROC AUC | Accuracy | Learning Rate |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------:|:--------:|:-------:|:--------:|:-------------:|
+ | No log        | 1.0   | 273   | 0.1957          | 0.7089   | 0.4059   | 0.8061  | 0.1906   | 0.001  |
+ | 0.3189        | 2.0   | 546   | 0.1720          | 0.7381   | 0.4868   | 0.8255  | 0.2193   | 0.001  |
+ | 0.3189        | 3.0   | 819   | 0.1621          | 0.7579   | 0.5587   | 0.8388  | 0.2322   | 0.001  |
+ | 0.1897        | 4.0   | 1092  | 0.1595          | 0.7463   | 0.5562   | 0.8221  | 0.2249   | 0.001  |
+ | 0.1897        | 5.0   | 1365  | 0.1569          | 0.7511   | 0.5723   | 0.8245  | 0.2315   | 0.001  |
+ | 0.1808        | 6.0   | 1638  | 0.1530          | 0.7635   | 0.5787   | 0.8365  | 0.2363   | 0.001  |
+ | 0.1808        | 7.0   | 1911  | 0.1523          | 0.7652   | 0.5982   | 0.8389  | 0.2335   | 0.001  |
+ | 0.1763        | 8.0   | 2184  | 0.1531          | 0.7655   | 0.5880   | 0.8377  | 0.2419   | 0.001  |
+ | 0.1763        | 9.0   | 2457  | 0.1499          | 0.7700   | 0.6069   | 0.8431  | 0.2401   | 0.001  |
+ | 0.1735        | 10.0  | 2730  | 0.1510          | 0.7606   | 0.5829   | 0.8277  | 0.2439   | 0.001  |
+ | 0.1723        | 11.0  | 3003  | 0.1521          | 0.7690   | 0.5976   | 0.8400  | 0.2505   | 0.001  |
+ | 0.1723        | 12.0  | 3276  | 0.1503          | 0.7760   | 0.6074   | 0.8527  | 0.2443   | 0.001  |
+ | 0.1719        | 13.0  | 3549  | 0.1504          | 0.7624   | 0.6003   | 0.8302  | 0.2439   | 0.001  |
+ | 0.1719        | 14.0  | 3822  | 0.1497          | 0.7644   | 0.6028   | 0.8343  | 0.2446   | 0.001  |
+ | 0.1702        | 15.0  | 4095  | 0.1475          | 0.7752   | 0.6066   | 0.8446  | 0.2512   | 0.001  |
+ | 0.1702        | 16.0  | 4368  | 0.1500          | 0.7646   | 0.5838   | 0.8321  | 0.2464   | 0.001  |
+ | 0.1696        | 17.0  | 4641  | 0.1530          | 0.7720   | 0.6073   | 0.8464  | 0.2457   | 0.001  |
+ | 0.1696        | 18.0  | 4914  | 0.1491          | 0.7752   | 0.6143   | 0.8475  | 0.2439   | 0.001  |
+ | 0.1717        | 19.0  | 5187  | 0.1495          | 0.7740   | 0.6075   | 0.8484  | 0.2346   | 0.001  |
+ | 0.1717        | 20.0  | 5460  | 0.1487          | 0.7637   | 0.5956   | 0.8322  | 0.2453   | 0.001  |
+ | 0.1705        | 21.0  | 5733  | 0.1471          | 0.7805   | 0.6165   | 0.8540  | 0.2474   | 0.001  |
+ | 0.1706        | 22.0  | 6006  | 0.1509          | 0.7754   | 0.6074   | 0.8494  | 0.2453   | 0.001  |
+ | 0.1706        | 23.0  | 6279  | 0.1502          | 0.7719   | 0.6127   | 0.8388  | 0.2429   | 0.001  |
+ | 0.1699        | 24.0  | 6552  | 0.1497          | 0.7699   | 0.5849   | 0.8406  | 0.2401   | 0.001  |
+ | 0.1699        | 25.0  | 6825  | 0.1470          | 0.7761   | 0.6035   | 0.8459  | 0.2426   | 0.001  |
+ | 0.1694        | 26.0  | 7098  | 0.1481          | 0.7751   | 0.6065   | 0.8466  | 0.2422   | 0.001  |
+ | 0.1694        | 27.0  | 7371  | 0.1458          | 0.7689   | 0.6136   | 0.8357  | 0.2474   | 0.001  |
+ | 0.17          | 28.0  | 7644  | 0.1454          | 0.7751   | 0.6077   | 0.8441  | 0.2446   | 0.001  |
+ | 0.17          | 29.0  | 7917  | 0.1494          | 0.7735   | 0.6108   | 0.8491  | 0.2457   | 0.001  |
+ | 0.1685        | 30.0  | 8190  | 0.1455          | 0.7705   | 0.5983   | 0.8366  | 0.2498   | 0.001  |
+ | 0.1685        | 31.0  | 8463  | 0.1454          | 0.7785   | 0.6069   | 0.8495  | 0.2533   | 0.001  |
+ | 0.1687        | 32.0  | 8736  | 0.1466          | 0.7746   | 0.6145   | 0.8461  | 0.2453   | 0.001  |
+ | 0.1679        | 33.0  | 9009  | 0.1446          | 0.7770   | 0.6125   | 0.8439  | 0.2540   | 0.001  |
+ | 0.1679        | 34.0  | 9282  | 0.1468          | 0.7781   | 0.6168   | 0.8470  | 0.2446   | 0.001  |
+ | 0.168         | 35.0  | 9555  | 0.1486          | 0.7767   | 0.6193   | 0.8452  | 0.2495   | 0.001  |
+ | 0.168         | 36.0  | 9828  | 0.1464          | 0.7719   | 0.6093   | 0.8391  | 0.2488   | 0.001  |
+ | 0.169         | 37.0  | 10101 | 0.1448          | 0.7734   | 0.6127   | 0.8402  | 0.2498   | 0.001  |
+ | 0.169         | 38.0  | 10374 | 0.1451          | 0.7815   | 0.6110   | 0.8526  | 0.2523   | 0.001  |
+ | 0.167         | 39.0  | 10647 | 0.1447          | 0.7824   | 0.6272   | 0.8563  | 0.2498   | 0.001  |
+ | 0.167         | 40.0  | 10920 | 0.1482          | 0.7837   | 0.6266   | 0.8537  | 0.2536   | 0.0001 |
+ | 0.1652        | 41.0  | 11193 | 0.1414          | 0.7833   | 0.6324   | 0.8483  | 0.2616   | 0.0001 |
+ | 0.1652        | 42.0  | 11466 | 0.1398          | 0.7884   | 0.6372   | 0.8546  | 0.2620   | 0.0001 |
+ | 0.1608        | 43.0  | 11739 | 0.1411          | 0.7871   | 0.6367   | 0.8537  | 0.2640   | 0.0001 |
+ | 0.1596        | 44.0  | 12012 | 0.1390          | 0.7879   | 0.6257   | 0.8537  | 0.2613   | 0.0001 |
+ | 0.1596        | 45.0  | 12285 | 0.1386          | 0.7894   | 0.6421   | 0.8539  | 0.2665   | 0.0001 |
+ | 0.1582        | 46.0  | 12558 | 0.1396          | 0.7874   | 0.6283   | 0.8522  | 0.2665   | 0.0001 |
+ | 0.1582        | 47.0  | 12831 | 0.1387          | 0.7864   | 0.6287   | 0.8500  | 0.2637   | 0.0001 |
+ | 0.1584        | 48.0  | 13104 | 0.1378          | 0.7913   | 0.6335   | 0.8572  | 0.2678   | 0.0001 |
+ | 0.1584        | 49.0  | 13377 | 0.1377          | 0.7934   | 0.6382   | 0.8603  | 0.2640   | 0.0001 |
+ | 0.157         | 50.0  | 13650 | 0.1376          | 0.7918   | 0.6363   | 0.8570  | 0.2675   | 0.0001 |
+ | 0.157         | 51.0  | 13923 | 0.1375          | 0.7929   | 0.6427   | 0.8597  | 0.2661   | 0.0001 |
+ | 0.1567        | 52.0  | 14196 | 0.1377          | 0.7871   | 0.6368   | 0.8507  | 0.2658   | 0.0001 |
+ | 0.1567        | 53.0  | 14469 | 0.1374          | 0.7929   | 0.6406   | 0.8601  | 0.2692   | 0.0001 |
+ | 0.1571        | 54.0  | 14742 | 0.1369          | 0.7921   | 0.6412   | 0.8562  | 0.2717   | 0.0001 |
+ | 0.1548        | 55.0  | 15015 | 0.1370          | 0.7914   | 0.6378   | 0.8558  | 0.2703   | 0.0001 |
+ | 0.1548        | 56.0  | 15288 | 0.1365          | 0.7931   | 0.6425   | 0.8602  | 0.2644   | 0.0001 |
+ | 0.155         | 57.0  | 15561 | 0.1368          | 0.7926   | 0.6382   | 0.8588  | 0.2675   | 0.0001 |
+ | 0.155         | 58.0  | 15834 | 0.1365          | 0.7916   | 0.6374   | 0.8553  | 0.2675   | 0.0001 |
+ | 0.155         | 59.0  | 16107 | 0.1364          | 0.7922   | 0.6429   | 0.8565  | 0.2675   | 0.0001 |
+ | 0.155         | 60.0  | 16380 | 0.1369          | 0.7883   | 0.6358   | 0.8515  | 0.2651   | 0.0001 |
+ | 0.1546        | 61.0  | 16653 | 0.1364          | 0.7946   | 0.6504   | 0.8589  | 0.2713   | 0.0001 |
+ | 0.1546        | 62.0  | 16926 | 0.1356          | 0.7932   | 0.6442   | 0.8575  | 0.2751   | 0.0001 |
+ | 0.1536        | 63.0  | 17199 | 0.1355          | 0.7966   | 0.6516   | 0.8611  | 0.2737   | 0.0001 |
+ | 0.1536        | 64.0  | 17472 | 0.1359          | 0.7934   | 0.6450   | 0.8578  | 0.2678   | 0.0001 |
+ | 0.1544        | 65.0  | 17745 | 0.1357          | 0.7936   | 0.6455   | 0.8572  | 0.2706   | 0.0001 |
+ | 0.1529        | 66.0  | 18018 | 0.1357          | 0.7946   | 0.6477   | 0.8595  | 0.2713   | 0.0001 |
+ | 0.1529        | 67.0  | 18291 | 0.1353          | 0.7966   | 0.6544   | 0.8623  | 0.2755   | 0.0001 |
+ | 0.1528        | 68.0  | 18564 | 0.1353          | 0.7956   | 0.6519   | 0.8608  | 0.2734   | 0.0001 |
+ | 0.1528        | 69.0  | 18837 | 0.1347          | 0.7966   | 0.6516   | 0.8603  | 0.2699   | 0.0001 |
+ | 0.1528        | 70.0  | 19110 | 0.1350          | 0.7945   | 0.6442   | 0.8575  | 0.2720   | 0.0001 |
+ | 0.1528        | 71.0  | 19383 | 0.1350          | 0.7933   | 0.6442   | 0.8557  | 0.2723   | 0.0001 |
+ | 0.1522        | 72.0  | 19656 | 0.1345          | 0.7970   | 0.6485   | 0.8605  | 0.2758   | 0.0001 |
+ | 0.1522        | 73.0  | 19929 | 0.1342          | 0.7977   | 0.6519   | 0.8616  | 0.2762   | 0.0001 |
+ | 0.1523        | 74.0  | 20202 | 0.1350          | 0.7915   | 0.6413   | 0.8520  | 0.2751   | 0.0001 |
+ | 0.1523        | 75.0  | 20475 | 0.1346          | 0.7947   | 0.6485   | 0.8572  | 0.2751   | 0.0001 |
+ | 0.1521        | 76.0  | 20748 | 0.1344          | 0.7965   | 0.6478   | 0.8598  | 0.2758   | 0.0001 |
+ | 0.1515        | 77.0  | 21021 | 0.1346          | 0.7978   | 0.6537   | 0.8623  | 0.2775   | 0.0001 |
+ | 0.1515        | 78.0  | 21294 | 0.1341          | 0.7978   | 0.6543   | 0.8635  | 0.2775   | 0.0001 |
+ | 0.1514        | 79.0  | 21567 | 0.1340          | 0.7953   | 0.6523   | 0.8574  | 0.2741   | 0.0001 |
+ | 0.1514        | 80.0  | 21840 | 0.1344          | 0.7993   | 0.6546   | 0.8653  | 0.2782   | 0.0001 |
+ | 0.1516        | 81.0  | 22113 | 0.1341          | 0.7967   | 0.6560   | 0.8576  | 0.2758   | 0.0001 |
+ | 0.1516        | 82.0  | 22386 | 0.1341          | 0.7948   | 0.6454   | 0.8555  | 0.2765   | 0.0001 |
+ | 0.149         | 83.0  | 22659 | 0.1351          | 0.7924   | 0.6460   | 0.8543  | 0.2703   | 0.0001 |
+ | 0.149         | 84.0  | 22932 | 0.1339          | 0.7957   | 0.6512   | 0.8586  | 0.2755   | 0.0001 |
+ | 0.1515        | 85.0  | 23205 | 0.1334          | 0.7991   | 0.6532   | 0.8620  | 0.2793   | 0.0001 |
+ | 0.1515        | 86.0  | 23478 | 0.1334          | 0.7988   | 0.6596   | 0.8625  | 0.2748   | 0.0001 |
+ | 0.1495        | 87.0  | 23751 | 0.1340          | 0.7956   | 0.6467   | 0.8591  | 0.2744   | 0.0001 |
+ | 0.1496        | 88.0  | 24024 | 0.1336          | 0.7982   | 0.6483   | 0.8620  | 0.2748   | 0.0001 |
+ | 0.1496        | 89.0  | 24297 | 0.1337          | 0.8015   | 0.6585   | 0.8672  | 0.2807   | 0.0001 |
+ | 0.1493        | 90.0  | 24570 | 0.1333          | 0.8011   | 0.6621   | 0.8661  | 0.2772   | 0.0001 |
+ | 0.1493        | 91.0  | 24843 | 0.1337          | 0.7957   | 0.6529   | 0.8563  | 0.2782   | 0.0001 |
+ | 0.1496        | 92.0  | 25116 | 0.1335          | 0.7961   | 0.6514   | 0.8574  | 0.2755   | 0.0001 |
+ | 0.1496        | 93.0  | 25389 | 0.1331          | 0.8002   | 0.6560   | 0.8648  | 0.2758   | 0.0001 |
+ | 0.1493        | 94.0  | 25662 | 0.1333          | 0.7995   | 0.6554   | 0.8643  | 0.2758   | 0.0001 |
+ | 0.1493        | 95.0  | 25935 | 0.1331          | 0.7980   | 0.6580   | 0.8606  | 0.2758   | 0.0001 |
+ | 0.1482        | 96.0  | 26208 | 0.1328          | 0.7993   | 0.6556   | 0.8631  | 0.2751   | 0.0001 |
+ | 0.1482        | 97.0  | 26481 | 0.1333          | 0.7977   | 0.6493   | 0.8589  | 0.2782   | 0.0001 |
+ | 0.1497        | 98.0  | 26754 | 0.1327          | 0.7996   | 0.6600   | 0.8647  | 0.2755   | 0.0001 |
+ | 0.1489        | 99.0  | 27027 | 0.1325          | 0.7979   | 0.6590   | 0.8608  | 0.2717   | 0.0001 |
+ | 0.1489        | 100.0 | 27300 | 0.1329          | 0.7971   | 0.6570   | 0.8585  | 0.2762   | 0.0001 |
+ | 0.1482        | 101.0 | 27573 | 0.1327          | 0.7992   | 0.6580   | 0.8611  | 0.2821   | 0.0001 |
+ | 0.1482        | 102.0 | 27846 | 0.1326          | 0.7987   | 0.6543   | 0.8608  | 0.2817   | 0.0001 |
+ | 0.1474        | 103.0 | 28119 | 0.1325          | 0.7994   | 0.6518   | 0.8621  | 0.2803   | 0.0001 |
+ | 0.1474        | 104.0 | 28392 | 0.1332          | 0.8011   | 0.6613   | 0.8647  | 0.2775   | 0.0001 |
+ | 0.1472        | 105.0 | 28665 | 0.1322          | 0.8013   | 0.6636   | 0.8652  | 0.2831   | 0.0001 |
+ | 0.1472        | 106.0 | 28938 | 0.1324          | 0.8010   | 0.6588   | 0.8633  | 0.2831   | 0.0001 |
+ | 0.148         | 107.0 | 29211 | 0.1336          | 0.7986   | 0.6506   | 0.8619  | 0.2786   | 0.0001 |
+ | 0.148         | 108.0 | 29484 | 0.1327          | 0.7996   | 0.6501   | 0.8615  | 0.2796   | 0.0001 |
+ | 0.1477        | 109.0 | 29757 | 0.1318          | 0.8000   | 0.6580   | 0.8613  | 0.2807   | 0.0001 |
+ | 0.1479        | 110.0 | 30030 | 0.1326          | 0.7997   | 0.6582   | 0.8626  | 0.2803   | 0.0001 |
+ | 0.1479        | 111.0 | 30303 | 0.1319          | 0.8013   | 0.6609   | 0.8638  | 0.2786   | 0.0001 |
+ | 0.1466        | 112.0 | 30576 | 0.1322          | 0.8019   | 0.6595   | 0.8659  | 0.2810   | 0.0001 |
+ | 0.1466        | 113.0 | 30849 | 0.1321          | 0.8025   | 0.6592   | 0.8667  | 0.2800   | 0.0001 |
+ | 0.1474        | 114.0 | 31122 | 0.1320          | 0.8025   | 0.6631   | 0.8662  | 0.2824   | 0.0001 |
+ | 0.1474        | 115.0 | 31395 | 0.1319          | 0.8004   | 0.6598   | 0.8625  | 0.2838   | 0.0001 |
+ | 0.1468        | 116.0 | 31668 | 0.1319          | 0.8022   | 0.6627   | 0.8643  | 0.2845   | 1e-05  |
+ | 0.1468        | 117.0 | 31941 | 0.1318          | 0.8013   | 0.6604   | 0.8634  | 0.2821   | 1e-05  |
+ | 0.1455        | 118.0 | 32214 | 0.1316          | 0.8002   | 0.6590   | 0.8616  | 0.2796   | 1e-05  |
+ | 0.1455        | 119.0 | 32487 | 0.1319          | 0.8037   | 0.6608   | 0.8678  | 0.2827   | 1e-05  |
+ | 0.1451        | 120.0 | 32760 | 0.1316          | 0.8036   | 0.6615   | 0.8662  | 0.2814   | 1e-05  |
+ | 0.1454        | 121.0 | 33033 | 0.1318          | 0.8013   | 0.6611   | 0.8635  | 0.2810   | 1e-05  |
+ | 0.1454        | 122.0 | 33306 | 0.1322          | 0.8050   | 0.6647   | 0.8692  | 0.2817   | 1e-05  |
+ | 0.145         | 123.0 | 33579 | 0.1319          | 0.8010   | 0.6605   | 0.8618  | 0.2817   | 1e-05  |
+ | 0.145         | 124.0 | 33852 | 0.1314          | 0.8019   | 0.6622   | 0.8638  | 0.2807   | 1e-05  |
+ | 0.1459        | 125.0 | 34125 | 0.1314          | 0.8043   | 0.6641   | 0.8672  | 0.2862   | 1e-05  |
+ | 0.1459        | 126.0 | 34398 | 0.1310          | 0.8042   | 0.6630   | 0.8670  | 0.2862   | 1e-05  |
+ | 0.1439        | 127.0 | 34671 | 0.1315          | 0.8038   | 0.6598   | 0.8673  | 0.2859   | 1e-05  |
+ | 0.1439        | 128.0 | 34944 | 0.1311          | 0.8042   | 0.6682   | 0.8674  | 0.2869   | 1e-05  |
+ | 0.1446        | 129.0 | 35217 | 0.1310          | 0.8035   | 0.6653   | 0.8665  | 0.2827   | 1e-05  |
+ | 0.1446        | 130.0 | 35490 | 0.1310          | 0.8034   | 0.6657   | 0.8668  | 0.2866   | 1e-05  |
+ | 0.1449        | 131.0 | 35763 | 0.1313          | 0.8052   | 0.6709   | 0.8699  | 0.2834   | 1e-05  |
+ | 0.1442        | 132.0 | 36036 | 0.1315          | 0.7986   | 0.6558   | 0.8595  | 0.2807   | 1e-05  |
+ | 0.1442        | 133.0 | 36309 | 0.1311          | 0.8052   | 0.6689   | 0.8692  | 0.2879   | 1e-05  |
+ | 0.1443        | 134.0 | 36582 | 0.1309          | 0.8021   | 0.6648   | 0.8640  | 0.2827   | 1e-05  |
+ | 0.1443        | 135.0 | 36855 | 0.1315          | 0.8038   | 0.6684   | 0.8665  | 0.2869   | 1e-05  |
+ | 0.1438        | 136.0 | 37128 | 0.1315          | 0.8025   | 0.6590   | 0.8634  | 0.2827   | 1e-05  |
+ | 0.1438        | 137.0 | 37401 | 0.1311          | 0.8036   | 0.6667   | 0.8648  | 0.2859   | 1e-05  |
+ | 0.1452        | 138.0 | 37674 | 0.1312          | 0.8035   | 0.6666   | 0.8661  | 0.2845   | 1e-05  |
+ | 0.1452        | 139.0 | 37947 | 0.1310          | 0.8053   | 0.6661   | 0.8689  | 0.2897   | 1e-05  |
+ | 0.144         | 140.0 | 38220 | 0.1317          | 0.8020   | 0.6635   | 0.8643  | 0.2834   | 1e-05  |
+ | 0.144         | 141.0 | 38493 | 0.1309          | 0.8047   | 0.6688   | 0.8673  | 0.2876   | 1e-06  |
+ | 0.1445        | 142.0 | 38766 | 0.1310          | 0.8042   | 0.6643   | 0.8657  | 0.2859   | 1e-06  |
+ | 0.1441        | 143.0 | 39039 | 0.1314          | 0.8019   | 0.6623   | 0.8635  | 0.2872   | 1e-06  |
+ | 0.1441        | 144.0 | 39312 | 0.1312          | 0.8025   | 0.6648   | 0.8649  | 0.2838   | 1e-06  |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.1
+ - PyTorch 2.3.0+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
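The README's headline metrics show a large gap between accuracy (0.2903) and micro-averaged F1 (0.8009), with macro F1 (0.6614) in between. That pattern is typical of multilabel classification, which the `multilabel` segment of the checkpoint path in trainer_state.json suggests is the task here. The sketch below is a minimal illustration on made-up toy labels, not the evaluation code used for this run. The function names are hypothetical, and it assumes that the reported accuracy is subset (exact-match) accuracy.

```python
from typing import List, Tuple

Labels = List[List[int]]  # one row per sample, one 0/1 flag per class


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Share of samples whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_counts(y_true: Labels, y_pred: Labels, c: int) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for class index c."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
    return tp, fp, fn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_micro(y_true: Labels, y_pred: Labels) -> float:
    """Pool the counts over all classes, then compute a single F1."""
    n_classes = len(y_true[0])
    per_class = [class_counts(y_true, y_pred, c) for c in range(n_classes)]
    tp, fp, fn = (sum(col) for col in zip(*per_class))
    return f1_from_counts(tp, fp, fn)


def f1_macro(y_true: Labels, y_pred: Labels) -> float:
    """Compute F1 per class, then take the unweighted mean."""
    n_classes = len(y_true[0])
    scores = [f1_from_counts(*class_counts(y_true, y_pred, c)) for c in range(n_classes)]
    return sum(scores) / n_classes


# Toy data (invented for illustration): 4 samples, 3 classes.
y_true = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]
y_pred = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 1, 0]]

print("subset accuracy:", subset_accuracy(y_true, y_pred))
print("F1 micro:", round(f1_micro(y_true, y_pred), 4))
print("F1 macro:", round(f1_macro(y_true, y_pred), 4))
```

On this toy data the three metrics come out in the same order as in the README (micro F1 above macro F1 above exact-match accuracy): one wrong label makes a whole sample count as incorrect for accuracy, while it costs only a single false positive or false negative in the F1 counts.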
all_results.json ADDED
@@ -0,0 +1,17 @@
+ {
+     "epoch": 144.0,
+     "eval_accuracy": 0.2903114186851211,
+     "eval_f1_macro": 0.661445626763805,
+     "eval_f1_micro": 0.800865984633018,
+     "eval_loss": 0.13202470541000366,
+     "eval_roc_auc": 0.8648958366235343,
+     "eval_runtime": 442.1525,
+     "eval_samples_per_second": 6.536,
+     "eval_steps_per_second": 0.206,
+     "learning_rate": 1.0000000000000002e-06,
+     "total_flos": 1.3598709030716368e+20,
+     "train_loss": 0.157796386979584,
+     "train_runtime": 249885.5342,
+     "train_samples_per_second": 5.232,
+     "train_steps_per_second": 0.164
+ }
logs/events.out.tfevents.1725102544.datavisu2 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8a502b44dc5c38f0481b7c51a5f63081de212a2cc151d6cd03a98a3c1579e78e
- size 100626
+ oid sha256:c78f8d588e03c07136f112bd0733c82be709d721e766adc23738234bdd650852
+ size 102137
logs/events.out.tfevents.1725352882.datavisu2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02294ba166290ffcd374d6f7b2c0739495a8f0636b72b1daedb5427d8a3377fc
+ size 40
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6016b6b6580c578c26e6749db1b39e3b0539b8204d536f6c7e2effbb92da8134
+ oid sha256:ab1f92930801f6ecde589d4e84c8913d6ecae132a9ee667fe3fdfbe078bdafb3
  size 89762948
test_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+     "epoch": 144.0,
+     "eval_accuracy": 0.2903114186851211,
+     "eval_f1_macro": 0.661445626763805,
+     "eval_f1_micro": 0.800865984633018,
+     "eval_loss": 0.13202470541000366,
+     "eval_roc_auc": 0.8648958366235343,
+     "eval_runtime": 442.1525,
+     "eval_samples_per_second": 6.536,
+     "eval_steps_per_second": 0.206,
+     "learning_rate": 1.0000000000000002e-06
+ }
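The throughput fields in test_results.json allow a rough consistency check on the size of the evaluated split. The sketch below assumes that `eval_samples_per_second` is the sample count divided by `eval_runtime`, that `eval_steps_per_second` is the batch count divided by the same runtime, and that `eval_accuracy` is a ratio of exactly matched samples to all samples. These are inferences from how the numbers relate, not something the commit states, and the variable names are illustrative.

```python
# Values copied from test_results.json above.
test_results = {
    "eval_accuracy": 0.2903114186851211,
    "eval_runtime": 442.1525,
    "eval_samples_per_second": 6.536,
    "eval_steps_per_second": 0.206,
}

runtime = test_results["eval_runtime"]

# Assumption: samples_per_second = n_samples / runtime (reported to 3 decimals).
n_samples = round(runtime * test_results["eval_samples_per_second"])

# Assumption: steps_per_second = n_batches / runtime.
n_batches = round(runtime * test_results["eval_steps_per_second"])

# Assumption: accuracy = exactly matched samples / n_samples.
n_exact = round(test_results["eval_accuracy"] * n_samples)

print("approx. test samples:", n_samples)
print("approx. eval batches:", n_batches)
print("implied exact matches:", n_exact)
print("reconstructed accuracy:", n_exact / n_samples)
```

If the assumptions hold, the reconstructed accuracy should agree with the reported `eval_accuracy` to many decimal places, which is a useful sign that the inferred sample count is the right integer rather than a rounding accident.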
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 144.0,
+     "learning_rate": 1.0000000000000002e-06,
+     "total_flos": 1.3598709030716368e+20,
+     "train_loss": 0.157796386979584,
+     "train_runtime": 249885.5342,
+     "train_samples_per_second": 5.232,
+     "train_steps_per_second": 0.164
+ }
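The training throughput figures can be cross-checked against the step counts recorded elsewhere in this commit. The sketch below combines `train_runtime` and the per-second rates from train_results.json with `global_step` from trainer_state.json and `num_epochs` and `train_batch_size` from the README. One reading consistent with the arithmetic is that the per-second rates are computed against the planned 150-epoch budget rather than the 144 epochs actually run; this is an inference from the numbers, not something the commit states, and the variable names are illustrative.

```python
import math

# Values copied from files in this commit.
train_runtime_s = 249885.5342      # train_results.json
epochs_run = 144.0                 # train_results.json
reported_steps_per_s = 0.164       # train_results.json
reported_samples_per_s = 5.232     # train_results.json
global_step = 39312                # trainer_state.json
planned_epochs = 150               # README hyperparameter num_epochs
train_batch_size = 32              # README hyperparameter train_batch_size

steps_per_epoch = global_step / epochs_run
runtime_hours = train_runtime_s / 3600

# Rate based on the steps actually executed.
actual_steps_per_s = global_step / train_runtime_s

# Rate based on the planned budget (assumption: this is what was reported).
planned_steps_per_s = planned_epochs * steps_per_epoch / train_runtime_s

# Under the same assumption, estimate the number of training samples per epoch.
approx_train_samples = round(reported_samples_per_s * train_runtime_s / planned_epochs)

print("steps per epoch:", steps_per_epoch)
print("runtime (hours):", round(runtime_hours, 2))
print("steps/s from executed steps:", round(actual_steps_per_s, 3))
print("steps/s from planned steps:", round(planned_steps_per_s, 3))
print("approx. training samples:", approx_train_samples)
print("batches per epoch at batch size 32:", math.ceil(approx_train_samples / train_batch_size))
```

The estimated sample count is only as precise as the three-decimal rounding of `train_samples_per_second` allows, so it should be read as approximate; the batch count derived from it nonetheless matches the 273 steps per epoch seen in the training log.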
trainer_state.json ADDED
@@ -0,0 +1,2470 @@
+ {
+   "best_metric": 0.1308571696281433,
+   "best_model_checkpoint": "/home/datawork-iot-nos/Seatizen/models/multilabel/fine_scale/DinoVdeau-small-2024_08_31-batch-size32_epochs150_freeze/checkpoint-36582",
+   "epoch": 144.0,
+   "eval_steps": 500,
+   "global_step": 39312,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 1.0,
+       "eval_accuracy": 0.19057519057519057,
+       "eval_f1_macro": 0.4058921954514261,
+       "eval_f1_micro": 0.7088941673264713,
+       "eval_loss": 0.19568666815757751,
+       "eval_roc_auc": 0.8060676064167129,
+       "eval_runtime": 426.0483,
+       "eval_samples_per_second": 6.774,
+       "eval_steps_per_second": 0.214,
+       "learning_rate": 0.001,
+       "step": 273
+     },
+     {
+       "epoch": 1.8315018315018317,
+       "grad_norm": 0.30737248063087463,
+       "learning_rate": 0.001,
+       "loss": 0.3189,
+       "step": 500
+     },
+     {
+       "epoch": 2.0,
+       "eval_accuracy": 0.21933471933471935,
+       "eval_f1_macro": 0.4867943512801917,
+       "eval_f1_micro": 0.738139514768845,
+       "eval_loss": 0.17198018729686737,
+       "eval_roc_auc": 0.8255075095586444,
+       "eval_runtime": 425.0166,
+       "eval_samples_per_second": 6.79,
+       "eval_steps_per_second": 0.214,
+       "learning_rate": 0.001,
+       "step": 546
+     },
+     {
+       "epoch": 3.0,
+       "eval_accuracy": 0.23215523215523215,
+       "eval_f1_macro": 0.5587016500092944,
+       "eval_f1_micro": 0.7578947368421052,
+       "eval_loss": 0.16209888458251953,
+       "eval_roc_auc": 0.8387630797560628,
+       "eval_runtime": 425.9119,
+       "eval_samples_per_second": 6.776,
+       "eval_steps_per_second": 0.214,
+       "learning_rate": 0.001,
+       "step": 819
+     },
+     {
+       "epoch": 3.663003663003663,
+       "grad_norm": 0.2619726359844208,
+       "learning_rate": 0.001,
+       "loss": 0.1897,
+       "step": 1000
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.22487872487872487,
+       "eval_f1_macro": 0.5561953540051209,
+       "eval_f1_micro": 0.7463059684835497,
+       "eval_loss": 0.15948981046676636,
+       "eval_roc_auc": 0.8221271753092407,
+       "eval_runtime": 423.9484,
+       "eval_samples_per_second": 6.807,
+       "eval_steps_per_second": 0.215,
+       "learning_rate": 0.001,
+       "step": 1092
+     },
+     {
+       "epoch": 5.0,
+       "eval_accuracy": 0.23146223146223147,
+       "eval_f1_macro": 0.5723046956548954,
+       "eval_f1_micro": 0.7510718113612004,
+       "eval_loss": 0.15691693127155304,
+       "eval_roc_auc": 0.8244935635420478,
+       "eval_runtime": 423.6041,
+       "eval_samples_per_second": 6.813,
+       "eval_steps_per_second": 0.215,
+       "learning_rate": 0.001,
+       "step": 1365
+     },
+     {
+       "epoch": 5.4945054945054945,
+       "grad_norm": 0.17114631831645966,
+       "learning_rate": 0.001,
+       "loss": 0.1808,
+       "step": 1500
+     },
+     {
+       "epoch": 6.0,
+       "eval_accuracy": 0.2363132363132363,
+       "eval_f1_macro": 0.5786669115862841,
+       "eval_f1_micro": 0.7634727923836142,
+       "eval_loss": 0.15302371978759766,
+       "eval_roc_auc": 0.8365257318814997,
+       "eval_runtime": 427.5566,
+       "eval_samples_per_second": 6.75,
+       "eval_steps_per_second": 0.213,
+       "learning_rate": 0.001,
+       "step": 1638
+     },
+     {
+       "epoch": 7.0,
+       "eval_accuracy": 0.23354123354123354,
+       "eval_f1_macro": 0.5981729145672101,
+       "eval_f1_micro": 0.7651630269613162,
+       "eval_loss": 0.1523299366235733,
+       "eval_roc_auc": 0.838924594824006,
+       "eval_runtime": 430.1478,
+       "eval_samples_per_second": 6.709,
+       "eval_steps_per_second": 0.212,
+       "learning_rate": 0.001,
+       "step": 1911
+     },
+     {
+       "epoch": 7.326007326007326,
+       "grad_norm": 0.22214488685131073,
+       "learning_rate": 0.001,
+       "loss": 0.1763,
+       "step": 2000
+     },
+     {
+       "epoch": 8.0,
+       "eval_accuracy": 0.24185724185724186,
+       "eval_f1_macro": 0.587992292024695,
+       "eval_f1_micro": 0.7655172413793103,
+       "eval_loss": 0.15311872959136963,
+       "eval_roc_auc": 0.837740052624858,
+       "eval_runtime": 427.9308,
+       "eval_samples_per_second": 6.744,
+       "eval_steps_per_second": 0.213,
+       "learning_rate": 0.001,
+       "step": 2184
+     },
+     {
+       "epoch": 9.0,
+       "eval_accuracy": 0.24012474012474014,
+       "eval_f1_macro": 0.606908576330327,
+       "eval_f1_micro": 0.7699542669773061,
+       "eval_loss": 0.14992575347423553,
+       "eval_roc_auc": 0.8431046707780733,
+       "eval_runtime": 424.0382,
+       "eval_samples_per_second": 6.806,
+       "eval_steps_per_second": 0.215,
+       "learning_rate": 0.001,
+       "step": 2457
+     },
+     {
+       "epoch": 9.157509157509157,
+       "grad_norm": 0.1733015924692154,
+       "learning_rate": 0.001,
+       "loss": 0.1735,
+       "step": 2500
+     },
+     {
+       "epoch": 10.0,
+       "eval_accuracy": 0.24393624393624394,
+       "eval_f1_macro": 0.5829080312220596,
+       "eval_f1_micro": 0.7606115107913669,
+       "eval_loss": 0.1509619951248169,
+       "eval_roc_auc": 0.8277441062627229,
+       "eval_runtime": 424.8811,
+       "eval_samples_per_second": 6.792,
+       "eval_steps_per_second": 0.214,
+       "learning_rate": 0.001,
+       "step": 2730
+     },
+     {
+       "epoch": 10.989010989010989,
+       "grad_norm": 0.16356830298900604,
+       "learning_rate": 0.001,
+       "loss": 0.1723,
+       "step": 3000
+     },
+     {
+       "epoch": 11.0,
+       "eval_accuracy": 0.2505197505197505,
+       "eval_f1_macro": 0.5976223089766404,
+       "eval_f1_micro": 0.7689559002963221,
+       "eval_loss": 0.1520717293024063,
+       "eval_roc_auc": 0.8399853012032679,
+       "eval_runtime": 434.5331,
+       "eval_samples_per_second": 6.642,
+       "eval_steps_per_second": 0.209,
+       "learning_rate": 0.001,
+       "step": 3003
+     },
+     {
+       "epoch": 12.0,
+       "eval_accuracy": 0.2442827442827443,
+       "eval_f1_macro": 0.607405900640871,
+       "eval_f1_micro": 0.7759986516096409,
+       "eval_loss": 0.15027731657028198,
+       "eval_roc_auc": 0.8526551998703694,
+       "eval_runtime": 434.0545,
+       "eval_samples_per_second": 6.649,
+       "eval_steps_per_second": 0.21,
+       "learning_rate": 0.001,
+       "step": 3276
+     },
+     {
+       "epoch": 12.820512820512821,
+       "grad_norm": 0.1642971783876419,
+       "learning_rate": 0.001,
+       "loss": 0.1719,
+       "step": 3500
+     },
+     {
+       "epoch": 13.0,
+       "eval_accuracy": 0.24393624393624394,
+       "eval_f1_macro": 0.6003271512523337,
+       "eval_f1_micro": 0.7623558852444365,
+       "eval_loss": 0.1504218876361847,
+       "eval_roc_auc": 0.8301696089299148,
+       "eval_runtime": 426.4716,
+       "eval_samples_per_second": 6.767,
+       "eval_steps_per_second": 0.213,
+       "learning_rate": 0.001,
+       "step": 3549
+     },
+     {
+       "epoch": 14.0,
+       "eval_accuracy": 0.24462924462924462,
+       "eval_f1_macro": 0.602811285040826,
+       "eval_f1_micro": 0.7644358114073813,
+       "eval_loss": 0.1496724784374237,
+       "eval_roc_auc": 0.8342951177137805,
+       "eval_runtime": 428.909,
+       "eval_samples_per_second": 6.729,
+       "eval_steps_per_second": 0.212,
+       "learning_rate": 0.001,
+       "step": 3822
+     },
+     {
+       "epoch": 14.652014652014651,
+       "grad_norm": 0.1759812980890274,
+       "learning_rate": 0.001,
+       "loss": 0.1702,
+       "step": 4000
+     },
+     {
+       "epoch": 15.0,
+       "eval_accuracy": 0.2512127512127512,
+       "eval_f1_macro": 0.6066013767027806,
+       "eval_f1_micro": 0.7751615281210703,
+       "eval_loss": 0.14749661087989807,
+       "eval_roc_auc": 0.8445581856657356,
+       "eval_runtime": 424.6732,
+       "eval_samples_per_second": 6.796,
+       "eval_steps_per_second": 0.214,
+       "learning_rate": 0.001,
+       "step": 4095
+     },
+     {
+       "epoch": 16.0,
+       "eval_accuracy": 0.24636174636174638,
+       "eval_f1_macro": 0.5838354990739413,
+       "eval_f1_micro": 0.7645565108923241,
+       "eval_loss": 0.14998775720596313,
+       "eval_roc_auc": 0.8320747114163963,
+       "eval_runtime": 423.7704,
+       "eval_samples_per_second": 6.81,
+       "eval_steps_per_second": 0.215,
+       "learning_rate": 0.001,
+       "step": 4368
+     },
+     {
+       "epoch": 16.483516483516482,
+       "grad_norm": 0.14804692566394806,
+       "learning_rate": 0.001,
+       "loss": 0.1696,
+       "step": 4500
+     },
+     {
+       "epoch": 17.0,
+       "eval_accuracy": 0.24566874566874566,
+       "eval_f1_macro": 0.6073459016890155,
+       "eval_f1_micro": 0.7719883641341547,
+       "eval_loss": 0.15297245979309082,
+       "eval_roc_auc": 0.8464322218871764,
+       "eval_runtime": 424.9885,
+       "eval_samples_per_second": 6.791,
+       "eval_steps_per_second": 0.214,
+       "learning_rate": 0.001,
+       "step": 4641
+     },
+     {
+       "epoch": 18.0,
+       "eval_accuracy": 0.24393624393624394,
+       "eval_f1_macro": 0.614324753279198,
+       "eval_f1_micro": 0.7751951282271207,
+       "eval_loss": 0.14907290041446686,
+       "eval_roc_auc": 0.8475019020709771,
+       "eval_runtime": 420.1647,
+       "eval_samples_per_second": 6.869,
+       "eval_steps_per_second": 0.217,
+       "learning_rate": 0.001,
+       "step": 4914
+     },
+     {
+       "epoch": 18.315018315018314,
+       "grad_norm": 0.19223743677139282,
+       "learning_rate": 0.001,
+       "loss": 0.1717,
+       "step": 5000
+     },
+     {
+       "epoch": 19.0,
+       "eval_accuracy": 0.23458073458073458,
+       "eval_f1_macro": 0.6075499214740471,
+       "eval_f1_micro": 0.7739734788726388,
+       "eval_loss": 0.14951026439666748,
+       "eval_roc_auc": 0.848377592477135,
+       "eval_runtime": 427.9682,
+       "eval_samples_per_second": 6.743,
+       "eval_steps_per_second": 0.213,
+       "learning_rate": 0.001,
+       "step": 5187
+     },
+     {
+       "epoch": 20.0,
+       "eval_accuracy": 0.24532224532224534,
+       "eval_f1_macro": 0.595638442008225,
+       "eval_f1_micro": 0.7636993911381718,
+       "eval_loss": 0.14873762428760529,
+       "eval_roc_auc": 0.8322311292560515,
+       "eval_runtime": 421.5059,
+       "eval_samples_per_second": 6.847,
+       "eval_steps_per_second": 0.216,
+       "learning_rate": 0.001,
+       "step": 5460
+     },
+     {
+       "epoch": 20.146520146520146,
+       "grad_norm": 0.15787707269191742,
+       "learning_rate": 0.001,
+       "loss": 0.1705,
+       "step": 5500
+     },
+     {
+       "epoch": 21.0,
+       "eval_accuracy": 0.24740124740124741,
+       "eval_f1_macro": 0.6164990545073296,
+       "eval_f1_micro": 0.780452718426063,
+       "eval_loss": 0.14705629646778107,
+       "eval_roc_auc": 0.8539786012990958,
+       "eval_runtime": 429.6596,
+       "eval_samples_per_second": 6.717,
+       "eval_steps_per_second": 0.212,
+       "learning_rate": 0.001,
+       "step": 5733
+     },
+     {
+       "epoch": 21.978021978021978,
+       "grad_norm": 0.15392103791236877,
+       "learning_rate": 0.001,
+       "loss": 0.1706,
+       "step": 6000
+     },
368
+ {
369
+ "epoch": 22.0,
370
+ "eval_accuracy": 0.24532224532224534,
371
+ "eval_f1_macro": 0.6073576225776433,
372
+ "eval_f1_micro": 0.7753641707130079,
373
+ "eval_loss": 0.1508719027042389,
374
+ "eval_roc_auc": 0.8494150259851333,
375
+ "eval_runtime": 429.7216,
376
+ "eval_samples_per_second": 6.716,
377
+ "eval_steps_per_second": 0.212,
378
+ "learning_rate": 0.001,
379
+ "step": 6006
380
+ },
381
+ {
382
+ "epoch": 23.0,
383
+ "eval_accuracy": 0.2428967428967429,
384
+ "eval_f1_macro": 0.6127152502703448,
385
+ "eval_f1_micro": 0.771920553133395,
386
+ "eval_loss": 0.15015815198421478,
387
+ "eval_roc_auc": 0.8388299205154317,
388
+ "eval_runtime": 426.6602,
389
+ "eval_samples_per_second": 6.764,
390
+ "eval_steps_per_second": 0.213,
391
+ "learning_rate": 0.001,
392
+ "step": 6279
393
+ },
394
+ {
395
+ "epoch": 23.80952380952381,
396
+ "grad_norm": 0.1737624853849411,
397
+ "learning_rate": 0.001,
398
+ "loss": 0.1699,
399
+ "step": 6500
400
+ },
401
+ {
402
+ "epoch": 24.0,
403
+ "eval_accuracy": 0.24012474012474014,
404
+ "eval_f1_macro": 0.5849380548549015,
405
+ "eval_f1_micro": 0.7698941591532732,
406
+ "eval_loss": 0.14965225756168365,
407
+ "eval_roc_auc": 0.8406060899537385,
408
+ "eval_runtime": 430.4521,
409
+ "eval_samples_per_second": 6.705,
410
+ "eval_steps_per_second": 0.211,
411
+ "learning_rate": 0.001,
412
+ "step": 6552
413
+ },
414
+ {
415
+ "epoch": 25.0,
416
+ "eval_accuracy": 0.24255024255024255,
417
+ "eval_f1_macro": 0.6035289549510865,
418
+ "eval_f1_micro": 0.7761348897535668,
419
+ "eval_loss": 0.14702074229717255,
420
+ "eval_roc_auc": 0.8458632504863829,
421
+ "eval_runtime": 428.0693,
422
+ "eval_samples_per_second": 6.742,
423
+ "eval_steps_per_second": 0.213,
424
+ "learning_rate": 0.001,
425
+ "step": 6825
426
+ },
427
+ {
428
+ "epoch": 25.641025641025642,
429
+ "grad_norm": 0.1737377792596817,
430
+ "learning_rate": 0.001,
431
+ "loss": 0.1694,
432
+ "step": 7000
433
+ },
434
+ {
435
+ "epoch": 26.0,
436
+ "eval_accuracy": 0.24220374220374222,
437
+ "eval_f1_macro": 0.6064603919289959,
438
+ "eval_f1_micro": 0.7751430907604253,
439
+ "eval_loss": 0.14808295667171478,
440
+ "eval_roc_auc": 0.8465518457868458,
441
+ "eval_runtime": 438.4341,
442
+ "eval_samples_per_second": 6.583,
443
+ "eval_steps_per_second": 0.208,
444
+ "learning_rate": 0.001,
445
+ "step": 7098
446
+ },
447
+ {
448
+ "epoch": 27.0,
449
+ "eval_accuracy": 0.24740124740124741,
450
+ "eval_f1_macro": 0.6135774018658996,
451
+ "eval_f1_micro": 0.7689308343302761,
452
+ "eval_loss": 0.14581289887428284,
453
+ "eval_roc_auc": 0.8357120666953542,
454
+ "eval_runtime": 426.6923,
455
+ "eval_samples_per_second": 6.764,
456
+ "eval_steps_per_second": 0.213,
457
+ "learning_rate": 0.001,
458
+ "step": 7371
459
+ },
460
+ {
461
+ "epoch": 27.47252747252747,
462
+ "grad_norm": 0.16500511765480042,
463
+ "learning_rate": 0.001,
464
+ "loss": 0.17,
465
+ "step": 7500
466
+ },
467
+ {
468
+ "epoch": 28.0,
469
+ "eval_accuracy": 0.24462924462924462,
470
+ "eval_f1_macro": 0.6077297645661711,
471
+ "eval_f1_micro": 0.7751325049960902,
472
+ "eval_loss": 0.1453842669725418,
473
+ "eval_roc_auc": 0.8440532649625113,
474
+ "eval_runtime": 431.4145,
475
+ "eval_samples_per_second": 6.69,
476
+ "eval_steps_per_second": 0.211,
477
+ "learning_rate": 0.001,
478
+ "step": 7644
479
+ },
480
+ {
481
+ "epoch": 29.0,
482
+ "eval_accuracy": 0.24566874566874566,
483
+ "eval_f1_macro": 0.6107922701154117,
484
+ "eval_f1_micro": 0.7735191637630662,
485
+ "eval_loss": 0.14941243827342987,
486
+ "eval_roc_auc": 0.849050708300112,
487
+ "eval_runtime": 434.9588,
488
+ "eval_samples_per_second": 6.635,
489
+ "eval_steps_per_second": 0.209,
490
+ "learning_rate": 0.001,
491
+ "step": 7917
492
+ },
493
+ {
494
+ "epoch": 29.304029304029303,
495
+ "grad_norm": 0.1599486619234085,
496
+ "learning_rate": 0.001,
497
+ "loss": 0.1685,
498
+ "step": 8000
499
+ },
500
+ {
501
+ "epoch": 30.0,
502
+ "eval_accuracy": 0.24982674982674982,
503
+ "eval_f1_macro": 0.5982833860845571,
504
+ "eval_f1_micro": 0.7705324709843182,
505
+ "eval_loss": 0.14549985527992249,
506
+ "eval_roc_auc": 0.8366026732011344,
507
+ "eval_runtime": 434.3329,
508
+ "eval_samples_per_second": 6.645,
509
+ "eval_steps_per_second": 0.21,
510
+ "learning_rate": 0.001,
511
+ "step": 8190
512
+ },
513
+ {
514
+ "epoch": 31.0,
515
+ "eval_accuracy": 0.2532917532917533,
516
+ "eval_f1_macro": 0.6068619458731248,
517
+ "eval_f1_micro": 0.7784728768532008,
518
+ "eval_loss": 0.14541107416152954,
519
+ "eval_roc_auc": 0.8494949988142239,
520
+ "eval_runtime": 435.6219,
521
+ "eval_samples_per_second": 6.625,
522
+ "eval_steps_per_second": 0.209,
523
+ "learning_rate": 0.001,
524
+ "step": 8463
525
+ },
526
+ {
527
+ "epoch": 31.135531135531135,
528
+ "grad_norm": 0.1950293928384781,
529
+ "learning_rate": 0.001,
530
+ "loss": 0.1687,
531
+ "step": 8500
532
+ },
533
+ {
534
+ "epoch": 32.0,
535
+ "eval_accuracy": 0.24532224532224534,
536
+ "eval_f1_macro": 0.6145316287096297,
537
+ "eval_f1_micro": 0.7746102833519939,
538
+ "eval_loss": 0.14657220244407654,
539
+ "eval_roc_auc": 0.8460955499587395,
540
+ "eval_runtime": 434.8949,
541
+ "eval_samples_per_second": 6.636,
542
+ "eval_steps_per_second": 0.209,
543
+ "learning_rate": 0.001,
544
+ "step": 8736
545
+ },
546
+ {
547
+ "epoch": 32.967032967032964,
548
+ "grad_norm": 0.18405263125896454,
549
+ "learning_rate": 0.001,
550
+ "loss": 0.1679,
551
+ "step": 9000
552
+ },
553
+ {
554
+ "epoch": 33.0,
555
+ "eval_accuracy": 0.253984753984754,
556
+ "eval_f1_macro": 0.6124691593400795,
557
+ "eval_f1_micro": 0.777031154551008,
558
+ "eval_loss": 0.14459234476089478,
559
+ "eval_roc_auc": 0.843919167617255,
560
+ "eval_runtime": 440.1591,
561
+ "eval_samples_per_second": 6.557,
562
+ "eval_steps_per_second": 0.207,
563
+ "learning_rate": 0.001,
564
+ "step": 9009
565
+ },
566
+ {
567
+ "epoch": 34.0,
568
+ "eval_accuracy": 0.24462924462924462,
569
+ "eval_f1_macro": 0.6168054796129936,
570
+ "eval_f1_micro": 0.7781283769180896,
571
+ "eval_loss": 0.1468168944120407,
572
+ "eval_roc_auc": 0.8469846407097918,
573
+ "eval_runtime": 438.6105,
574
+ "eval_samples_per_second": 6.58,
575
+ "eval_steps_per_second": 0.207,
576
+ "learning_rate": 0.001,
577
+ "step": 9282
578
+ },
579
+ {
580
+ "epoch": 34.798534798534796,
581
+ "grad_norm": 0.17146140336990356,
582
+ "learning_rate": 0.001,
583
+ "loss": 0.168,
584
+ "step": 9500
585
+ },
586
+ {
587
+ "epoch": 35.0,
588
+ "eval_accuracy": 0.2494802494802495,
589
+ "eval_f1_macro": 0.6193343400891848,
590
+ "eval_f1_micro": 0.7766880749869814,
591
+ "eval_loss": 0.14858707785606384,
592
+ "eval_roc_auc": 0.8451765062846143,
593
+ "eval_runtime": 434.5802,
594
+ "eval_samples_per_second": 6.641,
595
+ "eval_steps_per_second": 0.209,
596
+ "learning_rate": 0.001,
597
+ "step": 9555
598
+ },
599
+ {
600
+ "epoch": 36.0,
601
+ "eval_accuracy": 0.24878724878724878,
602
+ "eval_f1_macro": 0.6092667253949349,
603
+ "eval_f1_micro": 0.7718835224773468,
604
+ "eval_loss": 0.14637114107608795,
605
+ "eval_roc_auc": 0.8391158347811251,
606
+ "eval_runtime": 439.3197,
607
+ "eval_samples_per_second": 6.569,
608
+ "eval_steps_per_second": 0.207,
609
+ "learning_rate": 0.001,
610
+ "step": 9828
611
+ },
612
+ {
613
+ "epoch": 36.63003663003663,
614
+ "grad_norm": 0.16876503825187683,
615
+ "learning_rate": 0.001,
616
+ "loss": 0.169,
617
+ "step": 10000
618
+ },
619
+ {
620
+ "epoch": 37.0,
621
+ "eval_accuracy": 0.24982674982674982,
622
+ "eval_f1_macro": 0.6127183895875491,
623
+ "eval_f1_micro": 0.7733602776435442,
624
+ "eval_loss": 0.1448281705379486,
625
+ "eval_roc_auc": 0.8402195590843876,
626
+ "eval_runtime": 437.3035,
627
+ "eval_samples_per_second": 6.6,
628
+ "eval_steps_per_second": 0.208,
629
+ "learning_rate": 0.001,
630
+ "step": 10101
631
+ },
632
+ {
633
+ "epoch": 38.0,
634
+ "eval_accuracy": 0.25225225225225223,
635
+ "eval_f1_macro": 0.6109962510638844,
636
+ "eval_f1_micro": 0.7814896880859042,
637
+ "eval_loss": 0.1450735628604889,
638
+ "eval_roc_auc": 0.8526187412743501,
639
+ "eval_runtime": 437.7229,
640
+ "eval_samples_per_second": 6.593,
641
+ "eval_steps_per_second": 0.208,
642
+ "learning_rate": 0.001,
643
+ "step": 10374
644
+ },
645
+ {
646
+ "epoch": 38.46153846153846,
647
+ "grad_norm": 0.19475676119327545,
648
+ "learning_rate": 0.001,
649
+ "loss": 0.167,
650
+ "step": 10500
651
+ },
652
+ {
653
+ "epoch": 39.0,
654
+ "eval_accuracy": 0.24982674982674982,
655
+ "eval_f1_macro": 0.6272196317832909,
656
+ "eval_f1_micro": 0.7824146207942057,
657
+ "eval_loss": 0.14469724893569946,
658
+ "eval_roc_auc": 0.8563424677452759,
659
+ "eval_runtime": 435.4486,
660
+ "eval_samples_per_second": 6.628,
661
+ "eval_steps_per_second": 0.209,
662
+ "learning_rate": 0.001,
663
+ "step": 10647
664
+ },
665
+ {
666
+ "epoch": 40.0,
667
+ "eval_accuracy": 0.25363825363825365,
668
+ "eval_f1_macro": 0.6265963634718456,
669
+ "eval_f1_micro": 0.7836651178652115,
670
+ "eval_loss": 0.14824891090393066,
671
+ "eval_roc_auc": 0.853692740688437,
672
+ "eval_runtime": 435.8824,
673
+ "eval_samples_per_second": 6.621,
674
+ "eval_steps_per_second": 0.209,
675
+ "learning_rate": 0.0001,
676
+ "step": 10920
677
+ },
678
+ {
679
+ "epoch": 40.29304029304029,
680
+ "grad_norm": 0.15533967316150665,
681
+ "learning_rate": 0.0001,
682
+ "loss": 0.1652,
683
+ "step": 11000
684
+ },
685
+ {
686
+ "epoch": 41.0,
687
+ "eval_accuracy": 0.2616077616077616,
688
+ "eval_f1_macro": 0.6323784470247855,
689
+ "eval_f1_micro": 0.7833456473553827,
690
+ "eval_loss": 0.14141727983951569,
691
+ "eval_roc_auc": 0.8483120796798727,
692
+ "eval_runtime": 435.7344,
693
+ "eval_samples_per_second": 6.623,
694
+ "eval_steps_per_second": 0.209,
695
+ "learning_rate": 0.0001,
696
+ "step": 11193
697
+ },
698
+ {
699
+ "epoch": 42.0,
700
+ "eval_accuracy": 0.26195426195426197,
701
+ "eval_f1_macro": 0.6371841233046203,
702
+ "eval_f1_micro": 0.7884351407000686,
703
+ "eval_loss": 0.13979895412921906,
704
+ "eval_roc_auc": 0.8545567611245666,
705
+ "eval_runtime": 438.4508,
706
+ "eval_samples_per_second": 6.582,
707
+ "eval_steps_per_second": 0.208,
708
+ "learning_rate": 0.0001,
709
+ "step": 11466
710
+ },
711
+ {
712
+ "epoch": 42.124542124542124,
713
+ "grad_norm": 0.1733330935239792,
714
+ "learning_rate": 0.0001,
715
+ "loss": 0.1608,
716
+ "step": 11500
717
+ },
718
+ {
719
+ "epoch": 43.0,
720
+ "eval_accuracy": 0.26403326403326405,
721
+ "eval_f1_macro": 0.6366820358518588,
722
+ "eval_f1_micro": 0.7871061893724783,
723
+ "eval_loss": 0.14107641577720642,
724
+ "eval_roc_auc": 0.853678548931782,
725
+ "eval_runtime": 434.1211,
726
+ "eval_samples_per_second": 6.648,
727
+ "eval_steps_per_second": 0.21,
728
+ "learning_rate": 0.0001,
729
+ "step": 11739
730
+ },
731
+ {
732
+ "epoch": 43.956043956043956,
733
+ "grad_norm": 0.19694675505161285,
734
+ "learning_rate": 0.0001,
735
+ "loss": 0.1596,
736
+ "step": 12000
737
+ },
738
+ {
739
+ "epoch": 44.0,
740
+ "eval_accuracy": 0.26126126126126126,
741
+ "eval_f1_macro": 0.6256922069455233,
742
+ "eval_f1_micro": 0.787878787878788,
743
+ "eval_loss": 0.13898694515228271,
744
+ "eval_roc_auc": 0.8537086091649239,
745
+ "eval_runtime": 434.0073,
746
+ "eval_samples_per_second": 6.65,
747
+ "eval_steps_per_second": 0.21,
748
+ "learning_rate": 0.0001,
749
+ "step": 12012
750
+ },
751
+ {
752
+ "epoch": 45.0,
753
+ "eval_accuracy": 0.2664587664587665,
754
+ "eval_f1_macro": 0.6421056073559387,
755
+ "eval_f1_micro": 0.7894011202068074,
756
+ "eval_loss": 0.13859130442142487,
757
+ "eval_roc_auc": 0.8538817942028954,
758
+ "eval_runtime": 432.4865,
759
+ "eval_samples_per_second": 6.673,
760
+ "eval_steps_per_second": 0.21,
761
+ "learning_rate": 0.0001,
762
+ "step": 12285
763
+ },
764
+ {
765
+ "epoch": 45.78754578754579,
766
+ "grad_norm": 0.18810147047042847,
767
+ "learning_rate": 0.0001,
768
+ "loss": 0.1582,
769
+ "step": 12500
770
+ },
771
+ {
772
+ "epoch": 46.0,
773
+ "eval_accuracy": 0.2664587664587665,
774
+ "eval_f1_macro": 0.6283048537279357,
775
+ "eval_f1_micro": 0.7873893327575039,
776
+ "eval_loss": 0.139601469039917,
777
+ "eval_roc_auc": 0.8521625527563127,
778
+ "eval_runtime": 421.9429,
779
+ "eval_samples_per_second": 6.84,
780
+ "eval_steps_per_second": 0.216,
781
+ "learning_rate": 0.0001,
782
+ "step": 12558
783
+ },
784
+ {
785
+ "epoch": 47.0,
786
+ "eval_accuracy": 0.2636867636867637,
787
+ "eval_f1_macro": 0.6286555138094179,
788
+ "eval_f1_micro": 0.7863567238757333,
789
+ "eval_loss": 0.13869330286979675,
790
+ "eval_roc_auc": 0.8499808451526433,
791
+ "eval_runtime": 424.0306,
792
+ "eval_samples_per_second": 6.806,
793
+ "eval_steps_per_second": 0.215,
794
+ "learning_rate": 0.0001,
795
+ "step": 12831
796
+ },
797
+ {
798
+ "epoch": 47.61904761904762,
799
+ "grad_norm": 0.15351006388664246,
800
+ "learning_rate": 0.0001,
801
+ "loss": 0.1584,
802
+ "step": 13000
803
+ },
804
+ {
805
+ "epoch": 48.0,
806
+ "eval_accuracy": 0.26784476784476785,
807
+ "eval_f1_macro": 0.6334934953582803,
808
+ "eval_f1_micro": 0.7913177234660741,
809
+ "eval_loss": 0.13777127861976624,
810
+ "eval_roc_auc": 0.8571892112602602,
811
+ "eval_runtime": 419.9652,
812
+ "eval_samples_per_second": 6.872,
813
+ "eval_steps_per_second": 0.217,
814
+ "learning_rate": 0.0001,
815
+ "step": 13104
816
+ },
817
+ {
818
+ "epoch": 49.0,
819
+ "eval_accuracy": 0.26403326403326405,
820
+ "eval_f1_macro": 0.6381777921693204,
821
+ "eval_f1_micro": 0.7933989479042932,
822
+ "eval_loss": 0.1377096027135849,
823
+ "eval_roc_auc": 0.8602965218660363,
824
+ "eval_runtime": 431.2306,
825
+ "eval_samples_per_second": 6.692,
826
+ "eval_steps_per_second": 0.211,
827
+ "learning_rate": 0.0001,
828
+ "step": 13377
829
+ },
830
+ {
831
+ "epoch": 49.45054945054945,
832
+ "grad_norm": 0.1798904836177826,
833
+ "learning_rate": 0.0001,
834
+ "loss": 0.157,
835
+ "step": 13500
836
+ },
837
+ {
838
+ "epoch": 50.0,
839
+ "eval_accuracy": 0.2674982674982675,
840
+ "eval_f1_macro": 0.6362718007605523,
841
+ "eval_f1_micro": 0.7918342891380639,
842
+ "eval_loss": 0.13755330443382263,
843
+ "eval_roc_auc": 0.8570210161405075,
844
+ "eval_runtime": 429.5809,
845
+ "eval_samples_per_second": 6.718,
846
+ "eval_steps_per_second": 0.212,
847
+ "learning_rate": 0.0001,
848
+ "step": 13650
849
+ },
850
+ {
851
+ "epoch": 51.0,
852
+ "eval_accuracy": 0.2661122661122661,
853
+ "eval_f1_macro": 0.6426825970872383,
854
+ "eval_f1_micro": 0.7928808087673094,
855
+ "eval_loss": 0.13754987716674805,
856
+ "eval_roc_auc": 0.8596608706709776,
857
+ "eval_runtime": 429.3766,
858
+ "eval_samples_per_second": 6.721,
859
+ "eval_steps_per_second": 0.212,
860
+ "learning_rate": 0.0001,
861
+ "step": 13923
862
+ },
863
+ {
864
+ "epoch": 51.282051282051285,
865
+ "grad_norm": 0.20376506447792053,
866
+ "learning_rate": 0.0001,
867
+ "loss": 0.1567,
868
+ "step": 14000
869
+ },
870
+ {
871
+ "epoch": 52.0,
872
+ "eval_accuracy": 0.26576576576576577,
873
+ "eval_f1_macro": 0.6367912909960436,
874
+ "eval_f1_micro": 0.7871186146434616,
875
+ "eval_loss": 0.13771678507328033,
876
+ "eval_roc_auc": 0.8506886757830149,
877
+ "eval_runtime": 424.3804,
878
+ "eval_samples_per_second": 6.801,
879
+ "eval_steps_per_second": 0.214,
880
+ "learning_rate": 0.0001,
881
+ "step": 14196
882
+ },
883
+ {
884
+ "epoch": 53.0,
885
+ "eval_accuracy": 0.2692307692307692,
886
+ "eval_f1_macro": 0.640555047060403,
887
+ "eval_f1_micro": 0.7928592630284527,
888
+ "eval_loss": 0.13740690052509308,
889
+ "eval_roc_auc": 0.8601326459765699,
890
+ "eval_runtime": 434.4832,
891
+ "eval_samples_per_second": 6.642,
892
+ "eval_steps_per_second": 0.209,
893
+ "learning_rate": 0.0001,
894
+ "step": 14469
895
+ },
896
+ {
897
+ "epoch": 53.11355311355312,
898
+ "grad_norm": 0.16348811984062195,
899
+ "learning_rate": 0.0001,
900
+ "loss": 0.1571,
901
+ "step": 14500
902
+ },
903
+ {
904
+ "epoch": 54.0,
905
+ "eval_accuracy": 0.27165627165627165,
906
+ "eval_f1_macro": 0.6412320555565514,
907
+ "eval_f1_micro": 0.7920979171140219,
908
+ "eval_loss": 0.1368684023618698,
909
+ "eval_roc_auc": 0.8562094300869534,
910
+ "eval_runtime": 425.2932,
911
+ "eval_samples_per_second": 6.786,
912
+ "eval_steps_per_second": 0.214,
913
+ "learning_rate": 0.0001,
914
+ "step": 14742
915
+ },
916
+ {
917
+ "epoch": 54.94505494505494,
918
+ "grad_norm": 0.20431332290172577,
919
+ "learning_rate": 0.0001,
920
+ "loss": 0.1548,
921
+ "step": 15000
922
+ },
923
+ {
924
+ "epoch": 55.0,
925
+ "eval_accuracy": 0.2702702702702703,
926
+ "eval_f1_macro": 0.6377616721633446,
927
+ "eval_f1_micro": 0.7914089347079037,
928
+ "eval_loss": 0.13703426718711853,
929
+ "eval_roc_auc": 0.8557803910164303,
930
+ "eval_runtime": 424.9893,
931
+ "eval_samples_per_second": 6.791,
932
+ "eval_steps_per_second": 0.214,
933
+ "learning_rate": 0.0001,
934
+ "step": 15015
935
+ },
936
+ {
937
+ "epoch": 56.0,
938
+ "eval_accuracy": 0.2643797643797644,
939
+ "eval_f1_macro": 0.6425003998141597,
940
+ "eval_f1_micro": 0.7931107623128156,
941
+ "eval_loss": 0.1364637017250061,
942
+ "eval_roc_auc": 0.8601515459625123,
943
+ "eval_runtime": 423.7139,
944
+ "eval_samples_per_second": 6.811,
945
+ "eval_steps_per_second": 0.215,
946
+ "learning_rate": 0.0001,
947
+ "step": 15288
948
+ },
949
+ {
950
+ "epoch": 56.776556776556774,
951
+ "grad_norm": 0.19714656472206116,
952
+ "learning_rate": 0.0001,
953
+ "loss": 0.155,
954
+ "step": 15500
955
+ },
956
+ {
957
+ "epoch": 57.0,
958
+ "eval_accuracy": 0.2674982674982675,
959
+ "eval_f1_macro": 0.6381793578718891,
960
+ "eval_f1_micro": 0.7926408585665006,
961
+ "eval_loss": 0.13675515353679657,
962
+ "eval_roc_auc": 0.8588114846455387,
963
+ "eval_runtime": 426.4919,
964
+ "eval_samples_per_second": 6.767,
965
+ "eval_steps_per_second": 0.213,
966
+ "learning_rate": 0.0001,
967
+ "step": 15561
968
+ },
969
+ {
970
+ "epoch": 58.0,
971
+ "eval_accuracy": 0.2674982674982675,
972
+ "eval_f1_macro": 0.637380953089336,
973
+ "eval_f1_micro": 0.791562634524322,
974
+ "eval_loss": 0.1364695280790329,
975
+ "eval_roc_auc": 0.855274853280308,
976
+ "eval_runtime": 425.8426,
977
+ "eval_samples_per_second": 6.777,
978
+ "eval_steps_per_second": 0.214,
979
+ "learning_rate": 0.0001,
980
+ "step": 15834
981
+ },
982
+ {
983
+ "epoch": 58.608058608058606,
984
+ "grad_norm": 0.19042669236660004,
985
+ "learning_rate": 0.0001,
986
+ "loss": 0.155,
987
+ "step": 16000
988
+ },
989
+ {
990
+ "epoch": 59.0,
991
+ "eval_accuracy": 0.2674982674982675,
992
+ "eval_f1_macro": 0.6428884521567982,
993
+ "eval_f1_micro": 0.7922245108135942,
994
+ "eval_loss": 0.13641765713691711,
995
+ "eval_roc_auc": 0.8565012329926954,
996
+ "eval_runtime": 423.8693,
997
+ "eval_samples_per_second": 6.809,
998
+ "eval_steps_per_second": 0.215,
999
+ "learning_rate": 0.0001,
1000
+ "step": 16107
1001
+ },
1002
+ {
1003
+ "epoch": 60.0,
1004
+ "eval_accuracy": 0.26507276507276506,
1005
+ "eval_f1_macro": 0.6357999016219877,
1006
+ "eval_f1_micro": 0.7882888744307093,
1007
+ "eval_loss": 0.13687649369239807,
1008
+ "eval_roc_auc": 0.8514745744887481,
1009
+ "eval_runtime": 423.4928,
1010
+ "eval_samples_per_second": 6.815,
1011
+ "eval_steps_per_second": 0.215,
1012
+ "learning_rate": 0.0001,
1013
+ "step": 16380
1014
+ },
1015
+ {
1016
+ "epoch": 60.43956043956044,
1017
+ "grad_norm": 0.18568764626979828,
1018
+ "learning_rate": 0.0001,
1019
+ "loss": 0.1546,
1020
+ "step": 16500
1021
+ },
1022
+ {
1023
+ "epoch": 61.0,
1024
+ "eval_accuracy": 0.2713097713097713,
1025
+ "eval_f1_macro": 0.6503848519713329,
1026
+ "eval_f1_micro": 0.7945638702508654,
1027
+ "eval_loss": 0.13638463616371155,
1028
+ "eval_roc_auc": 0.8588833823919201,
1029
+ "eval_runtime": 425.9119,
1030
+ "eval_samples_per_second": 6.776,
1031
+ "eval_steps_per_second": 0.214,
1032
+ "learning_rate": 0.0001,
1033
+ "step": 16653
1034
+ },
1035
+ {
1036
+ "epoch": 62.0,
1037
+ "eval_accuracy": 0.2751212751212751,
1038
+ "eval_f1_macro": 0.6441767594174573,
1039
+ "eval_f1_micro": 0.7931640039405492,
1040
+ "eval_loss": 0.13563227653503418,
1041
+ "eval_roc_auc": 0.8575138778747027,
1042
+ "eval_runtime": 422.0661,
1043
+ "eval_samples_per_second": 6.838,
1044
+ "eval_steps_per_second": 0.216,
1045
+ "learning_rate": 0.0001,
1046
+ "step": 16926
1047
+ },
1048
+ {
1049
+ "epoch": 62.27106227106227,
1050
+ "grad_norm": 0.19402863085269928,
1051
+ "learning_rate": 0.0001,
1052
+ "loss": 0.1536,
1053
+ "step": 17000
1054
+ },
1055
+ {
1056
+ "epoch": 63.0,
1057
+ "eval_accuracy": 0.27373527373527373,
1058
+ "eval_f1_macro": 0.6515952055035917,
1059
+ "eval_f1_micro": 0.7966116124638174,
1060
+ "eval_loss": 0.1355270892381668,
1061
+ "eval_roc_auc": 0.8610939161629354,
1062
+ "eval_runtime": 426.9279,
1063
+ "eval_samples_per_second": 6.76,
1064
+ "eval_steps_per_second": 0.213,
1065
+ "learning_rate": 0.0001,
1066
+ "step": 17199
1067
+ },
1068
+ {
1069
+ "epoch": 64.0,
1070
+ "eval_accuracy": 0.26784476784476785,
1071
+ "eval_f1_macro": 0.6450040026439422,
1072
+ "eval_f1_micro": 0.7934075342465754,
1073
+ "eval_loss": 0.13592010736465454,
1074
+ "eval_roc_auc": 0.8577985580745997,
1075
+ "eval_runtime": 426.0816,
1076
+ "eval_samples_per_second": 6.773,
1077
+ "eval_steps_per_second": 0.214,
1078
+ "learning_rate": 0.0001,
1079
+ "step": 17472
1080
+ },
1081
+ {
1082
+ "epoch": 64.1025641025641,
1083
+ "grad_norm": 0.22000150382518768,
1084
+ "learning_rate": 0.0001,
1085
+ "loss": 0.1544,
1086
+ "step": 17500
1087
+ },
1088
+ {
1089
+ "epoch": 65.0,
1090
+ "eval_accuracy": 0.27061677061677064,
1091
+ "eval_f1_macro": 0.64551501310817,
1092
+ "eval_f1_micro": 0.7936467053015668,
1093
+ "eval_loss": 0.13569533824920654,
1094
+ "eval_roc_auc": 0.857159821715051,
1095
+ "eval_runtime": 424.6551,
1096
+ "eval_samples_per_second": 6.796,
1097
+ "eval_steps_per_second": 0.214,
1098
+ "learning_rate": 0.0001,
1099
+ "step": 17745
1100
+ },
1101
+ {
1102
+ "epoch": 65.93406593406593,
1103
+ "grad_norm": 0.19799016416072845,
1104
+ "learning_rate": 0.0001,
1105
+ "loss": 0.1529,
1106
+ "step": 18000
1107
+ },
1108
+ {
1109
+ "epoch": 66.0,
1110
+ "eval_accuracy": 0.2713097713097713,
1111
+ "eval_f1_macro": 0.6477176853690674,
1112
+ "eval_f1_micro": 0.794643237940888,
1113
+ "eval_loss": 0.13565082848072052,
1114
+ "eval_roc_auc": 0.8594942449609874,
1115
+ "eval_runtime": 425.0795,
1116
+ "eval_samples_per_second": 6.789,
1117
+ "eval_steps_per_second": 0.214,
1118
+ "learning_rate": 0.0001,
1119
+ "step": 18018
1120
+ },
1121
+ {
1122
+ "epoch": 67.0,
1123
+ "eval_accuracy": 0.27546777546777546,
1124
+ "eval_f1_macro": 0.6544361257862924,
1125
+ "eval_f1_micro": 0.7965922095536813,
1126
+ "eval_loss": 0.13533934950828552,
1127
+ "eval_roc_auc": 0.8622831129363361,
1128
+ "eval_runtime": 424.6762,
1129
+ "eval_samples_per_second": 6.796,
1130
+ "eval_steps_per_second": 0.214,
1131
+ "learning_rate": 0.0001,
1132
+ "step": 18291
1133
+ },
1134
+ {
1135
+ "epoch": 67.76556776556777,
1136
+ "grad_norm": 0.2619948983192444,
1137
+ "learning_rate": 0.0001,
1138
+ "loss": 0.1528,
1139
+ "step": 18500
1140
+ },
1141
+ {
1142
+ "epoch": 68.0,
1143
+ "eval_accuracy": 0.2733887733887734,
1144
+ "eval_f1_macro": 0.6519486064773884,
1145
+ "eval_f1_micro": 0.7955772910907932,
1146
+ "eval_loss": 0.1353396475315094,
1147
+ "eval_roc_auc": 0.8608058154545816,
1148
+ "eval_runtime": 421.8067,
1149
+ "eval_samples_per_second": 6.842,
1150
+ "eval_steps_per_second": 0.216,
1151
+ "learning_rate": 0.0001,
1152
+ "step": 18564
1153
+ },
1154
+ {
1155
+ "epoch": 69.0,
1156
+ "eval_accuracy": 0.26992376992376993,
1157
+ "eval_f1_macro": 0.6515714856354324,
1158
+ "eval_f1_micro": 0.7966188524590164,
1159
+ "eval_loss": 0.13474246859550476,
1160
+ "eval_roc_auc": 0.8602900698481241,
1161
+ "eval_runtime": 423.2901,
1162
+ "eval_samples_per_second": 6.818,
1163
+ "eval_steps_per_second": 0.215,
1164
+ "learning_rate": 0.0001,
1165
+ "step": 18837
1166
+ },
1167
+ {
1168
+ "epoch": 69.59706959706959,
1169
+ "grad_norm": 0.18048201501369476,
1170
+ "learning_rate": 0.0001,
1171
+ "loss": 0.1528,
1172
+ "step": 19000
1173
+ },
1174
+ {
1175
+ "epoch": 70.0,
1176
+ "eval_accuracy": 0.272002772002772,
1177
+ "eval_f1_macro": 0.6441608871918139,
1178
+ "eval_f1_micro": 0.7944687795241776,
1179
+ "eval_loss": 0.13504748046398163,
1180
+ "eval_roc_auc": 0.8574953132327267,
1181
+ "eval_runtime": 423.3844,
1182
+ "eval_samples_per_second": 6.817,
1183
+ "eval_steps_per_second": 0.215,
1184
+ "learning_rate": 0.0001,
1185
+ "step": 19110
1186
+ },
1187
+ {
1188
+ "epoch": 71.0,
1189
+ "eval_accuracy": 0.27234927234927236,
1190
+ "eval_f1_macro": 0.6441889860402124,
1191
+ "eval_f1_micro": 0.7933057280883367,
1192
+ "eval_loss": 0.13502468168735504,
1193
+ "eval_roc_auc": 0.8556664277229126,
1194
+ "eval_runtime": 422.6912,
1195
+ "eval_samples_per_second": 6.828,
1196
+ "eval_steps_per_second": 0.215,
1197
+ "learning_rate": 0.0001,
1198
+ "step": 19383
1199
+ },
1200
+ {
1201
+ "epoch": 71.42857142857143,
1202
+ "grad_norm": 0.24162879586219788,
1203
+ "learning_rate": 0.0001,
1204
+ "loss": 0.1522,
1205
+ "step": 19500
1206
+ },
1207
+ {
1208
+ "epoch": 72.0,
1209
+ "eval_accuracy": 0.2758142758142758,
1210
+ "eval_f1_macro": 0.6484748365424647,
1211
+ "eval_f1_micro": 0.7969950486597234,
1212
+ "eval_loss": 0.1344645917415619,
1213
+ "eval_roc_auc": 0.8605409876174911,
1214
+ "eval_runtime": 426.5755,
1215
+ "eval_samples_per_second": 6.766,
1216
+ "eval_steps_per_second": 0.213,
1217
+ "learning_rate": 0.0001,
1218
+ "step": 19656
1219
+ },
1220
+ {
1221
+ "epoch": 73.0,
1222
+ "eval_accuracy": 0.27616077616077617,
1223
+ "eval_f1_macro": 0.6518769914193778,
1224
+ "eval_f1_micro": 0.7977006599957419,
1225
+ "eval_loss": 0.1341526359319687,
1226
+ "eval_roc_auc": 0.8616010233088203,
1227
+ "eval_runtime": 420.7226,
1228
+ "eval_samples_per_second": 6.86,
1229
+ "eval_steps_per_second": 0.216,
1230
+ "learning_rate": 0.0001,
1231
+ "step": 19929
1232
+ },
1233
+ {
1234
+ "epoch": 73.26007326007326,
1235
+ "grad_norm": 0.22451983392238617,
1236
+ "learning_rate": 0.0001,
1237
+ "loss": 0.1523,
1238
+ "step": 20000
1239
+ },
1240
+ {
1241
+ "epoch": 74.0,
1242
+ "eval_accuracy": 0.2751212751212751,
1243
+ "eval_f1_macro": 0.641334935505441,
1244
+ "eval_f1_micro": 0.7914797229603171,
1245
+ "eval_loss": 0.13499116897583008,
1246
+ "eval_roc_auc": 0.8520198169504839,
1247
+ "eval_runtime": 428.7922,
1248
+ "eval_samples_per_second": 6.731,
1249
+ "eval_steps_per_second": 0.212,
1250
+ "learning_rate": 0.0001,
1251
+ "step": 20202
1252
+ },
1253
+ {
1254
+ "epoch": 75.0,
1255
+ "eval_accuracy": 0.2751212751212751,
1256
+ "eval_f1_macro": 0.6485229770180625,
1257
+ "eval_f1_micro": 0.7946678133734681,
1258
+ "eval_loss": 0.13461369276046753,
1259
+ "eval_roc_auc": 0.8572354216588205,
1260
+ "eval_runtime": 427.8784,
1261
+ "eval_samples_per_second": 6.745,
1262
+ "eval_steps_per_second": 0.213,
1263
+ "learning_rate": 0.0001,
1264
+ "step": 20475
1265
+ },
1266
+ {
1267
+ "epoch": 75.0915750915751,
1268
+ "grad_norm": 0.22029711306095123,
1269
+ "learning_rate": 0.0001,
1270
+ "loss": 0.1521,
1271
+ "step": 20500
1272
+ },
1273
+ {
1274
+ "epoch": 76.0,
1275
+ "eval_accuracy": 0.2758142758142758,
1276
+ "eval_f1_macro": 0.6478195810395848,
1277
+ "eval_f1_micro": 0.7964594201659113,
1278
+ "eval_loss": 0.13438266515731812,
1279
+ "eval_roc_auc": 0.8597526207801657,
1280
+ "eval_runtime": 424.3142,
1281
+ "eval_samples_per_second": 6.802,
1282
+ "eval_steps_per_second": 0.214,
1283
+ "learning_rate": 0.0001,
1284
+ "step": 20748
1285
+ },
1286
+ {
1287
+ "epoch": 76.92307692307692,
1288
+ "grad_norm": 0.2415299415588379,
1289
+ "learning_rate": 0.0001,
1290
+ "loss": 0.1515,
1291
+ "step": 21000
1292
+ },
1293
+ {
1294
+ "epoch": 77.0,
1295
+ "eval_accuracy": 0.27754677754677753,
1296
+ "eval_f1_macro": 0.6536737916153181,
1297
+ "eval_f1_micro": 0.7977742853502102,
1298
+ "eval_loss": 0.13460540771484375,
1299
+ "eval_roc_auc": 0.8623314561225224,
1300
+ "eval_runtime": 422.8083,
1301
+ "eval_samples_per_second": 6.826,
1302
+ "eval_steps_per_second": 0.215,
1303
+ "learning_rate": 0.0001,
1304
+ "step": 21021
1305
+ },
1306
+ {
1307
+ "epoch": 78.0,
1308
+ "eval_accuracy": 0.27754677754677753,
1309
+ "eval_f1_macro": 0.6543115985953537,
1310
+ "eval_f1_micro": 0.7978169818504888,
1311
+ "eval_loss": 0.13411369919776917,
1312
+ "eval_roc_auc": 0.8634738791194995,
1313
+ "eval_runtime": 428.5067,
1314
+ "eval_samples_per_second": 6.735,
1315
+ "eval_steps_per_second": 0.212,
1316
+ "learning_rate": 0.0001,
1317
+ "step": 21294
1318
+ },
1319
+ {
1320
+ "epoch": 78.75457875457876,
1321
+ "grad_norm": 0.2636328637599945,
1322
+ "learning_rate": 0.0001,
1323
+ "loss": 0.1514,
1324
+ "step": 21500
1325
+ },
1326
+ {
1327
+ "epoch": 79.0,
1328
+ "eval_accuracy": 0.2740817740817741,
1329
+ "eval_f1_macro": 0.6523004018612216,
1330
+ "eval_f1_micro": 0.7953020134228188,
1331
+ "eval_loss": 0.13399606943130493,
1332
+ "eval_roc_auc": 0.8574454542918126,
1333
+ "eval_runtime": 436.7976,
1334
+ "eval_samples_per_second": 6.607,
1335
+ "eval_steps_per_second": 0.208,
1336
+ "learning_rate": 0.0001,
1337
+ "step": 21567
1338
+ },
1339
+ {
1340
+ "epoch": 80.0,
1341
+ "eval_accuracy": 0.27823977823977825,
1342
+ "eval_f1_macro": 0.6545582038870168,
1343
+ "eval_f1_micro": 0.7993085420355848,
1344
+ "eval_loss": 0.1344238668680191,
1345
+ "eval_roc_auc": 0.8652547567870936,
1346
+ "eval_runtime": 431.9941,
1347
+ "eval_samples_per_second": 6.681,
1348
+ "eval_steps_per_second": 0.211,
1349
+ "learning_rate": 0.0001,
1350
+ "step": 21840
1351
+ },
1352
+ {
1353
+ "epoch": 80.58608058608058,
1354
+ "grad_norm": 0.23601791262626648,
1355
+ "learning_rate": 0.0001,
1356
+ "loss": 0.1516,
1357
+ "step": 22000
1358
+ },
1359
+ {
1360
+ "epoch": 81.0,
1361
+ "eval_accuracy": 0.2758142758142758,
1362
+ "eval_f1_macro": 0.6559691700651434,
1363
+ "eval_f1_micro": 0.7966715529878418,
1364
+ "eval_loss": 0.13405664265155792,
1365
+ "eval_roc_auc": 0.8575861109650502,
1366
+ "eval_runtime": 436.6356,
1367
+ "eval_samples_per_second": 6.61,
1368
+ "eval_steps_per_second": 0.208,
1369
+ "learning_rate": 0.0001,
1370
+ "step": 22113
1371
+ },
1372
+ {
1373
+ "epoch": 82.0,
1374
+ "eval_accuracy": 0.2765072765072765,
1375
+ "eval_f1_macro": 0.6453669674995801,
1376
+ "eval_f1_micro": 0.7947541551246537,
1377
+ "eval_loss": 0.13407430052757263,
1378
+ "eval_roc_auc": 0.8554945304057716,
1379
+ "eval_runtime": 436.5794,
1380
+ "eval_samples_per_second": 6.61,
1381
+ "eval_steps_per_second": 0.208,
1382
+ "learning_rate": 0.0001,
1383
+ "step": 22386
1384
+ },
1385
+ {
1386
+ "epoch": 82.41758241758242,
1387
+ "grad_norm": 0.19588124752044678,
1388
+ "learning_rate": 0.0001,
1389
+ "loss": 0.149,
1390
+ "step": 22500
1391
+ },
1392
+ {
1393
+ "epoch": 83.0,
1394
+ "eval_accuracy": 0.2702702702702703,
1395
+ "eval_f1_macro": 0.645966570658811,
1396
+ "eval_f1_micro": 0.7924365020985678,
1397
+ "eval_loss": 0.1350804716348648,
1398
+ "eval_roc_auc": 0.8543412288505268,
1399
+ "eval_runtime": 433.6987,
1400
+ "eval_samples_per_second": 6.654,
1401
+ "eval_steps_per_second": 0.21,
1402
+ "learning_rate": 0.0001,
1403
+ "step": 22659
1404
+ },
1405
+ {
1406
+ "epoch": 84.0,
1407
+ "eval_accuracy": 0.27546777546777546,
1408
+ "eval_f1_macro": 0.6512285101875886,
1409
+ "eval_f1_micro": 0.7957293542577825,
1410
+ "eval_loss": 0.13387472927570343,
1411
+ "eval_roc_auc": 0.8585996545688873,
1412
+ "eval_runtime": 432.4386,
1413
+ "eval_samples_per_second": 6.674,
1414
+ "eval_steps_per_second": 0.21,
1415
+ "learning_rate": 0.0001,
1416
+ "step": 22932
1417
+ },
1418
+ {
1419
+ "epoch": 84.24908424908425,
1420
+ "grad_norm": 0.2560372054576874,
1421
+ "learning_rate": 0.0001,
1422
+ "loss": 0.1515,
1423
+ "step": 23000
1424
+ },
1425
+ {
1426
+ "epoch": 85.0,
1427
+ "eval_accuracy": 0.27927927927927926,
1428
+ "eval_f1_macro": 0.6531817491521362,
1429
+ "eval_f1_micro": 0.7990622335890879,
1430
+ "eval_loss": 0.13341927528381348,
1431
+ "eval_roc_auc": 0.8620406055936447,
1432
+ "eval_runtime": 432.3488,
1433
+ "eval_samples_per_second": 6.675,
1434
+ "eval_steps_per_second": 0.21,
1435
+ "learning_rate": 0.0001,
1436
+ "step": 23205
1437
+ },
1438
+ {
1439
+ "epoch": 86.0,
1440
+ "eval_accuracy": 0.2747747747747748,
1441
+ "eval_f1_macro": 0.6595866427349153,
1442
+ "eval_f1_micro": 0.7988261313371896,
1443
+ "eval_loss": 0.13337253034114838,
1444
+ "eval_roc_auc": 0.8625331319838734,
1445
+ "eval_runtime": 435.2436,
1446
+ "eval_samples_per_second": 6.631,
1447
+ "eval_steps_per_second": 0.209,
1448
+ "learning_rate": 0.0001,
1449
+ "step": 23478
1450
+ },
1451
+ {
1452
+ "epoch": 86.08058608058609,
1453
+ "grad_norm": 0.28640052676200867,
1454
+ "learning_rate": 0.0001,
1455
+ "loss": 0.1495,
1456
+ "step": 23500
1457
+ },
1458
+ {
1459
+ "epoch": 87.0,
1460
+ "eval_accuracy": 0.27442827442827444,
1461
+ "eval_f1_macro": 0.6467323251879672,
1462
+ "eval_f1_micro": 0.7956179390619651,
1463
+ "eval_loss": 0.1339845359325409,
1464
+ "eval_roc_auc": 0.8590850582532711,
1465
+ "eval_runtime": 438.7375,
1466
+ "eval_samples_per_second": 6.578,
1467
+ "eval_steps_per_second": 0.207,
1468
+ "learning_rate": 0.0001,
1469
+ "step": 23751
1470
+ },
1471
+ {
1472
+ "epoch": 87.91208791208791,
1473
+ "grad_norm": 0.23546907305717468,
1474
+ "learning_rate": 0.0001,
1475
+ "loss": 0.1496,
1476
+ "step": 24000
1477
+ },
1478
+ {
1479
+ "epoch": 88.0,
1480
+ "eval_accuracy": 0.2747747747747748,
1481
+ "eval_f1_macro": 0.648318545746826,
1482
+ "eval_f1_micro": 0.7981612326551459,
1483
+ "eval_loss": 0.13357459008693695,
1484
+ "eval_roc_auc": 0.8619578829440303,
1485
+ "eval_runtime": 432.3449,
1486
+ "eval_samples_per_second": 6.675,
1487
+ "eval_steps_per_second": 0.21,
1488
+ "learning_rate": 0.0001,
1489
+ "step": 24024
1490
+ },
1491
+ {
1492
+ "epoch": 89.0,
1493
+ "eval_accuracy": 0.2806652806652807,
1494
+ "eval_f1_macro": 0.6585340844298272,
1495
+ "eval_f1_micro": 0.8014968675104065,
1496
+ "eval_loss": 0.13366733491420746,
1497
+ "eval_roc_auc": 0.8672320387088881,
1498
+ "eval_runtime": 431.6296,
1499
+ "eval_samples_per_second": 6.686,
1500
+ "eval_steps_per_second": 0.211,
1501
+ "learning_rate": 0.0001,
1502
+ "step": 24297
1503
+ },
1504
+ {
1505
+ "epoch": 89.74358974358974,
1506
+ "grad_norm": 0.24246211349964142,
1507
+ "learning_rate": 0.0001,
1508
+ "loss": 0.1493,
1509
+ "step": 24500
1510
+ },
1511
+ {
1512
+ "epoch": 90.0,
1513
+ "eval_accuracy": 0.2772002772002772,
1514
+ "eval_f1_macro": 0.66211749340029,
1515
+ "eval_f1_micro": 0.8010798042854732,
1516
+ "eval_loss": 0.1332736760377884,
1517
+ "eval_roc_auc": 0.8661044781564988,
1518
+ "eval_runtime": 425.5723,
1519
+ "eval_samples_per_second": 6.781,
1520
+ "eval_steps_per_second": 0.214,
1521
+ "learning_rate": 0.0001,
1522
+ "step": 24570
1523
+ },
1524
+ {
1525
+ "epoch": 91.0,
1526
+ "eval_accuracy": 0.27823977823977825,
1527
+ "eval_f1_macro": 0.6528573832362276,
1528
+ "eval_f1_micro": 0.7956933454403943,
1529
+ "eval_loss": 0.13367226719856262,
1530
+ "eval_roc_auc": 0.8562680347985093,
1531
+ "eval_runtime": 443.8961,
1532
+ "eval_samples_per_second": 6.502,
1533
+ "eval_steps_per_second": 0.205,
1534
+ "learning_rate": 0.0001,
1535
+ "step": 24843
1536
+ },
1537
+ {
1538
+ "epoch": 91.57509157509158,
1539
+ "grad_norm": 0.22026851773262024,
1540
+ "learning_rate": 0.0001,
1541
+ "loss": 0.1496,
1542
+ "step": 25000
1543
+ },
1544
+ {
1545
+ "epoch": 92.0,
1546
+ "eval_accuracy": 0.27546777546777546,
1547
+ "eval_f1_macro": 0.6513649424471982,
1548
+ "eval_f1_micro": 0.796086375587259,
1549
+ "eval_loss": 0.13348612189292908,
1550
+ "eval_roc_auc": 0.8573559442803198,
1551
+ "eval_runtime": 443.9031,
1552
+ "eval_samples_per_second": 6.501,
1553
+ "eval_steps_per_second": 0.205,
1554
+ "learning_rate": 0.0001,
1555
+ "step": 25116
1556
+ },
1557
+ {
1558
+ "epoch": 93.0,
1559
+ "eval_accuracy": 0.2758142758142758,
1560
+ "eval_f1_macro": 0.6559763883082907,
1561
+ "eval_f1_micro": 0.8001861094662043,
1562
+ "eval_loss": 0.1330718696117401,
1563
+ "eval_roc_auc": 0.8648260530605368,
1564
+ "eval_runtime": 436.5725,
1565
+ "eval_samples_per_second": 6.611,
1566
+ "eval_steps_per_second": 0.208,
1567
+ "learning_rate": 0.0001,
1568
+ "step": 25389
1569
+ },
1570
+ {
1571
+ "epoch": 93.4065934065934,
1572
+ "grad_norm": 0.28630152344703674,
1573
+ "learning_rate": 0.0001,
1574
+ "loss": 0.1493,
1575
+ "step": 25500
1576
+ },
1577
+ {
1578
+ "epoch": 94.0,
1579
+ "eval_accuracy": 0.2758142758142758,
1580
+ "eval_f1_macro": 0.6553585917255438,
1581
+ "eval_f1_micro": 0.7995090362720617,
1582
+ "eval_loss": 0.13329002261161804,
1583
+ "eval_roc_auc": 0.864277443745379,
1584
+ "eval_runtime": 442.8808,
1585
+ "eval_samples_per_second": 6.516,
1586
+ "eval_steps_per_second": 0.205,
1587
+ "learning_rate": 0.0001,
1588
+ "step": 25662
1589
+ },
1590
+ {
1591
+ "epoch": 95.0,
1592
+ "eval_accuracy": 0.2758142758142758,
1593
+ "eval_f1_macro": 0.6579543710907207,
1594
+ "eval_f1_micro": 0.7979651162790697,
1595
+ "eval_loss": 0.13314621150493622,
1596
+ "eval_roc_auc": 0.8606367216129991,
1597
+ "eval_runtime": 436.3942,
1598
+ "eval_samples_per_second": 6.613,
1599
+ "eval_steps_per_second": 0.209,
1600
+ "learning_rate": 0.0001,
1601
+ "step": 25935
1602
+ },
1603
+ {
1604
+ "epoch": 95.23809523809524,
1605
+ "grad_norm": 0.25194719433784485,
1606
+ "learning_rate": 0.0001,
1607
+ "loss": 0.1482,
1608
+ "step": 26000
1609
+ },
1610
+ {
1611
+ "epoch": 96.0,
1612
+ "eval_accuracy": 0.2751212751212751,
1613
+ "eval_f1_macro": 0.6556445954379041,
1614
+ "eval_f1_micro": 0.7992523999660183,
1615
+ "eval_loss": 0.13279949128627777,
1616
+ "eval_roc_auc": 0.8631226264354063,
1617
+ "eval_runtime": 426.8086,
1618
+ "eval_samples_per_second": 6.762,
1619
+ "eval_steps_per_second": 0.213,
1620
+ "learning_rate": 0.0001,
1621
+ "step": 26208
1622
+ },
1623
+ {
1624
+ "epoch": 97.0,
1625
+ "eval_accuracy": 0.27823977823977825,
1626
+ "eval_f1_macro": 0.6492741904723621,
1627
+ "eval_f1_micro": 0.7977296181630549,
1628
+ "eval_loss": 0.1332886964082718,
1629
+ "eval_roc_auc": 0.8588905587527994,
1630
+ "eval_runtime": 441.9848,
1631
+ "eval_samples_per_second": 6.53,
1632
+ "eval_steps_per_second": 0.206,
1633
+ "learning_rate": 0.0001,
1634
+ "step": 26481
1635
+ },
1636
+ {
1637
+ "epoch": 97.06959706959707,
1638
+ "grad_norm": 0.27280953526496887,
1639
+ "learning_rate": 0.0001,
1640
+ "loss": 0.1497,
1641
+ "step": 26500
1642
+ },
1643
+ {
1644
+ "epoch": 98.0,
1645
+ "eval_accuracy": 0.27546777546777546,
1646
+ "eval_f1_macro": 0.6600105762308898,
1647
+ "eval_f1_micro": 0.799611141637432,
1648
+ "eval_loss": 0.13266970217227936,
1649
+ "eval_roc_auc": 0.864715456620441,
1650
+ "eval_runtime": 439.781,
1651
+ "eval_samples_per_second": 6.562,
1652
+ "eval_steps_per_second": 0.207,
1653
+ "learning_rate": 0.0001,
1654
+ "step": 26754
1655
+ },
1656
+ {
1657
+ "epoch": 98.9010989010989,
1658
+ "grad_norm": 0.30599892139434814,
1659
+ "learning_rate": 0.0001,
1660
+ "loss": 0.1489,
1661
+ "step": 27000
1662
+ },
1663
+ {
1664
+ "epoch": 99.0,
1665
+ "eval_accuracy": 0.27165627165627165,
1666
+ "eval_f1_macro": 0.6589970862385839,
1667
+ "eval_f1_micro": 0.7978809757764771,
1668
+ "eval_loss": 0.13253149390220642,
1669
+ "eval_roc_auc": 0.8607699202364255,
1670
+ "eval_runtime": 438.5456,
1671
+ "eval_samples_per_second": 6.581,
1672
+ "eval_steps_per_second": 0.208,
1673
+ "learning_rate": 0.0001,
1674
+ "step": 27027
1675
+ },
1676
+ {
1677
+ "epoch": 100.0,
1678
+ "eval_accuracy": 0.27616077616077617,
1679
+ "eval_f1_macro": 0.6570195655430786,
1680
+ "eval_f1_micro": 0.797143840330351,
1681
+ "eval_loss": 0.1329408884048462,
1682
+ "eval_roc_auc": 0.8584810367011169,
1683
+ "eval_runtime": 434.9771,
1684
+ "eval_samples_per_second": 6.635,
1685
+ "eval_steps_per_second": 0.209,
1686
+ "learning_rate": 0.0001,
1687
+ "step": 27300
1688
+ },
1689
+ {
1690
+ "epoch": 100.73260073260073,
1691
+ "grad_norm": 0.2732805013656616,
1692
+ "learning_rate": 0.0001,
1693
+ "loss": 0.1482,
1694
+ "step": 27500
1695
+ },
1696
+ {
1697
+ "epoch": 101.0,
1698
+ "eval_accuracy": 0.28205128205128205,
1699
+ "eval_f1_macro": 0.657951499975745,
1700
+ "eval_f1_micro": 0.7991615690636095,
1701
+ "eval_loss": 0.13274870812892914,
1702
+ "eval_roc_auc": 0.861103560655407,
1703
+ "eval_runtime": 435.4493,
1704
+ "eval_samples_per_second": 6.628,
1705
+ "eval_steps_per_second": 0.209,
1706
+ "learning_rate": 0.0001,
1707
+ "step": 27573
1708
+ },
1709
+ {
1710
+ "epoch": 102.0,
1711
+ "eval_accuracy": 0.2817047817047817,
1712
+ "eval_f1_macro": 0.654306822863844,
1713
+ "eval_f1_micro": 0.7986821274228745,
1714
+ "eval_loss": 0.1326293796300888,
1715
+ "eval_roc_auc": 0.8607733407448822,
1716
+ "eval_runtime": 437.9645,
1717
+ "eval_samples_per_second": 6.59,
1718
+ "eval_steps_per_second": 0.208,
1719
+ "learning_rate": 0.0001,
1720
+ "step": 27846
1721
+ },
1722
+ {
1723
+ "epoch": 102.56410256410257,
1724
+ "grad_norm": 0.23533137142658234,
1725
+ "learning_rate": 0.0001,
1726
+ "loss": 0.1474,
1727
+ "step": 28000
1728
+ },
1729
+ {
1730
+ "epoch": 103.0,
1731
+ "eval_accuracy": 0.2803187803187803,
1732
+ "eval_f1_macro": 0.6518495856500403,
1733
+ "eval_f1_micro": 0.7993688968487486,
1734
+ "eval_loss": 0.13247379660606384,
1735
+ "eval_roc_auc": 0.8620991566501659,
1736
+ "eval_runtime": 426.0566,
1737
+ "eval_samples_per_second": 6.774,
1738
+ "eval_steps_per_second": 0.214,
1739
+ "learning_rate": 0.0001,
1740
+ "step": 28119
1741
+ },
1742
+ {
1743
+ "epoch": 104.0,
1744
+ "eval_accuracy": 0.27754677754677753,
1745
+ "eval_f1_macro": 0.6612536009112525,
1746
+ "eval_f1_micro": 0.8010850676047981,
1747
+ "eval_loss": 0.13315415382385254,
1748
+ "eval_roc_auc": 0.864729420343199,
1749
+ "eval_runtime": 425.2679,
1750
+ "eval_samples_per_second": 6.786,
1751
+ "eval_steps_per_second": 0.214,
1752
+ "learning_rate": 0.0001,
1753
+ "step": 28392
1754
+ },
1755
+ {
1756
+ "epoch": 104.3956043956044,
1757
+ "grad_norm": 0.2809629738330841,
1758
+ "learning_rate": 0.0001,
1759
+ "loss": 0.1472,
1760
+ "step": 28500
1761
+ },
1762
+ {
1763
+ "epoch": 105.0,
1764
+ "eval_accuracy": 0.2830907830907831,
1765
+ "eval_f1_macro": 0.6635718544409769,
1766
+ "eval_f1_micro": 0.8012698412698412,
1767
+ "eval_loss": 0.13218620419502258,
1768
+ "eval_roc_auc": 0.8652135899617869,
1769
+ "eval_runtime": 425.1586,
1770
+ "eval_samples_per_second": 6.788,
1771
+ "eval_steps_per_second": 0.214,
1772
+ "learning_rate": 0.0001,
1773
+ "step": 28665
1774
+ },
1775
+ {
1776
+ "epoch": 106.0,
1777
+ "eval_accuracy": 0.2830907830907831,
1778
+ "eval_f1_macro": 0.6588128942023547,
1779
+ "eval_f1_micro": 0.800988243312319,
1780
+ "eval_loss": 0.13239973783493042,
1781
+ "eval_roc_auc": 0.8632750603887415,
1782
+ "eval_runtime": 427.5404,
1783
+ "eval_samples_per_second": 6.75,
1784
+ "eval_steps_per_second": 0.213,
1785
+ "learning_rate": 0.0001,
1786
+ "step": 28938
1787
+ },
1788
+ {
1789
+ "epoch": 106.22710622710623,
1790
+ "grad_norm": 0.2568123936653137,
1791
+ "learning_rate": 0.0001,
1792
+ "loss": 0.148,
1793
+ "step": 29000
1794
+ },
1795
+ {
1796
+ "epoch": 107.0,
1797
+ "eval_accuracy": 0.2785862785862786,
1798
+ "eval_f1_macro": 0.650564106362156,
1799
+ "eval_f1_micro": 0.7985513421389007,
1800
+ "eval_loss": 0.13358280062675476,
1801
+ "eval_roc_auc": 0.8618832353771251,
1802
+ "eval_runtime": 425.2874,
1803
+ "eval_samples_per_second": 6.786,
1804
+ "eval_steps_per_second": 0.214,
1805
+ "learning_rate": 0.0001,
1806
+ "step": 29211
1807
+ },
1808
+ {
1809
+ "epoch": 108.0,
1810
+ "eval_accuracy": 0.2796257796257796,
1811
+ "eval_f1_macro": 0.6501303094783896,
1812
+ "eval_f1_micro": 0.7995554225623049,
1813
+ "eval_loss": 0.13270235061645508,
1814
+ "eval_roc_auc": 0.8615071940670409,
1815
+ "eval_runtime": 432.9179,
1816
+ "eval_samples_per_second": 6.666,
1817
+ "eval_steps_per_second": 0.21,
1818
+ "learning_rate": 0.0001,
1819
+ "step": 29484
1820
+ },
1821
+ {
1822
+ "epoch": 108.05860805860806,
1823
+ "grad_norm": 0.29480934143066406,
1824
+ "learning_rate": 0.0001,
1825
+ "loss": 0.1477,
1826
+ "step": 29500
1827
+ },
1828
+ {
1829
+ "epoch": 109.0,
1830
+ "eval_accuracy": 0.2806652806652807,
1831
+ "eval_f1_macro": 0.6579556871315007,
1832
+ "eval_f1_micro": 0.8000342553738118,
1833
+ "eval_loss": 0.1318453699350357,
1834
+ "eval_roc_auc": 0.8612993478767093,
1835
+ "eval_runtime": 434.6895,
1836
+ "eval_samples_per_second": 6.639,
1837
+ "eval_steps_per_second": 0.209,
1838
+ "learning_rate": 0.0001,
1839
+ "step": 29757
1840
+ },
1841
+ {
1842
+ "epoch": 109.89010989010988,
1843
+ "grad_norm": 0.3718918561935425,
1844
+ "learning_rate": 0.0001,
1845
+ "loss": 0.1479,
1846
+ "step": 30000
1847
+ },
1848
+ {
1849
+ "epoch": 110.0,
1850
+ "eval_accuracy": 0.2803187803187803,
1851
+ "eval_f1_macro": 0.6582487839550253,
1852
+ "eval_f1_micro": 0.7997274043785672,
1853
+ "eval_loss": 0.13255637884140015,
1854
+ "eval_roc_auc": 0.8626158546334878,
1855
+ "eval_runtime": 427.7015,
1856
+ "eval_samples_per_second": 6.748,
1857
+ "eval_steps_per_second": 0.213,
1858
+ "learning_rate": 0.0001,
1859
+ "step": 30030
1860
+ },
1861
+ {
1862
+ "epoch": 111.0,
1863
+ "eval_accuracy": 0.2785862785862786,
1864
+ "eval_f1_macro": 0.6608614747058748,
1865
+ "eval_f1_micro": 0.8012935069355799,
1866
+ "eval_loss": 0.1319260448217392,
1867
+ "eval_roc_auc": 0.8637521073014844,
1868
+ "eval_runtime": 422.4227,
1869
+ "eval_samples_per_second": 6.832,
1870
+ "eval_steps_per_second": 0.215,
1871
+ "learning_rate": 0.0001,
1872
+ "step": 30303
1873
+ },
1874
+ {
1875
+ "epoch": 111.72161172161172,
1876
+ "grad_norm": 0.3544025719165802,
1877
+ "learning_rate": 0.0001,
1878
+ "loss": 0.1466,
1879
+ "step": 30500
1880
+ },
1881
+ {
1882
+ "epoch": 112.0,
1883
+ "eval_accuracy": 0.28101178101178104,
1884
+ "eval_f1_macro": 0.6595016342799644,
1885
+ "eval_f1_micro": 0.8019278738426415,
1886
+ "eval_loss": 0.13223350048065186,
1887
+ "eval_roc_auc": 0.8659084092462648,
1888
+ "eval_runtime": 420.8235,
1889
+ "eval_samples_per_second": 6.858,
1890
+ "eval_steps_per_second": 0.216,
1891
+ "learning_rate": 0.0001,
1892
+ "step": 30576
1893
+ },
1894
+ {
1895
+ "epoch": 113.0,
1896
+ "eval_accuracy": 0.27997227997227997,
1897
+ "eval_f1_macro": 0.6592029124671744,
1898
+ "eval_f1_micro": 0.8024988392216453,
1899
+ "eval_loss": 0.13213913142681122,
1900
+ "eval_roc_auc": 0.8666766420318518,
1901
+ "eval_runtime": 423.8949,
1902
+ "eval_samples_per_second": 6.808,
1903
+ "eval_steps_per_second": 0.215,
1904
+ "learning_rate": 0.0001,
1905
+ "step": 30849
1906
+ },
1907
+ {
1908
+ "epoch": 113.55311355311355,
1909
+ "grad_norm": 0.35069116950035095,
1910
+ "learning_rate": 0.0001,
1911
+ "loss": 0.1474,
1912
+ "step": 31000
1913
+ },
1914
+ {
1915
+ "epoch": 114.0,
1916
+ "eval_accuracy": 0.2823977823977824,
1917
+ "eval_f1_macro": 0.663088095209859,
1918
+ "eval_f1_micro": 0.8025030654094965,
1919
+ "eval_loss": 0.13204564154148102,
1920
+ "eval_roc_auc": 0.8661983610533127,
1921
+ "eval_runtime": 421.2287,
1922
+ "eval_samples_per_second": 6.851,
1923
+ "eval_steps_per_second": 0.216,
1924
+ "learning_rate": 0.0001,
1925
+ "step": 31122
1926
+ },
1927
+ {
1928
+ "epoch": 115.0,
1929
+ "eval_accuracy": 0.28378378378378377,
1930
+ "eval_f1_macro": 0.659797224924612,
1931
+ "eval_f1_micro": 0.8004266211604096,
1932
+ "eval_loss": 0.1319342404603958,
1933
+ "eval_roc_auc": 0.8625399730007867,
1934
+ "eval_runtime": 424.6871,
1935
+ "eval_samples_per_second": 6.796,
1936
+ "eval_steps_per_second": 0.214,
1937
+ "learning_rate": 0.0001,
1938
+ "step": 31395
1939
+ },
1940
+ {
1941
+ "epoch": 115.38461538461539,
1942
+ "grad_norm": 0.29624369740486145,
1943
+ "learning_rate": 1e-05,
1944
+ "loss": 0.1468,
1945
+ "step": 31500
1946
+ },
1947
+ {
1948
+ "epoch": 116.0,
1949
+ "eval_accuracy": 0.2844767844767845,
1950
+ "eval_f1_macro": 0.6627361818946377,
1951
+ "eval_f1_micro": 0.8022295974810655,
1952
+ "eval_loss": 0.13186337053775787,
1953
+ "eval_roc_auc": 0.8642598314802673,
1954
+ "eval_runtime": 423.8673,
1955
+ "eval_samples_per_second": 6.809,
1956
+ "eval_steps_per_second": 0.215,
1957
+ "learning_rate": 1e-05,
1958
+ "step": 31668
1959
+ },
1960
+ {
1961
+ "epoch": 117.0,
1962
+ "eval_accuracy": 0.28205128205128205,
1963
+ "eval_f1_macro": 0.6604165936303265,
1964
+ "eval_f1_micro": 0.8012607547491268,
1965
+ "eval_loss": 0.1317850947380066,
1966
+ "eval_roc_auc": 0.8634466760169507,
1967
+ "eval_runtime": 419.012,
1968
+ "eval_samples_per_second": 6.888,
1969
+ "eval_steps_per_second": 0.217,
1970
+ "learning_rate": 1e-05,
1971
+ "step": 31941
1972
+ },
1973
+ {
1974
+ "epoch": 117.21611721611721,
1975
+ "grad_norm": 0.28633400797843933,
1976
+ "learning_rate": 1e-05,
1977
+ "loss": 0.1455,
1978
+ "step": 32000
1979
+ },
1980
+ {
1981
+ "epoch": 118.0,
1982
+ "eval_accuracy": 0.2796257796257796,
1983
+ "eval_f1_macro": 0.6590147410119703,
1984
+ "eval_f1_micro": 0.8002395926924228,
1985
+ "eval_loss": 0.13159342110157013,
1986
+ "eval_roc_auc": 0.8616373075259771,
1987
+ "eval_runtime": 419.8006,
1988
+ "eval_samples_per_second": 6.875,
1989
+ "eval_steps_per_second": 0.217,
1990
+ "learning_rate": 1e-05,
1991
+ "step": 32214
1992
+ },
1993
+ {
1994
+ "epoch": 119.0,
1995
+ "eval_accuracy": 0.28274428274428276,
1996
+ "eval_f1_macro": 0.6608406822787987,
1997
+ "eval_f1_micro": 0.8036745185622182,
1998
+ "eval_loss": 0.1319129317998886,
1999
+ "eval_roc_auc": 0.8678011174197509,
2000
+ "eval_runtime": 423.7674,
2001
+ "eval_samples_per_second": 6.81,
2002
+ "eval_steps_per_second": 0.215,
2003
+ "learning_rate": 1e-05,
2004
+ "step": 32487
2005
+ },
2006
+ {
2007
+ "epoch": 119.04761904761905,
2008
+ "grad_norm": 0.31120315194129944,
2009
+ "learning_rate": 1e-05,
2010
+ "loss": 0.1451,
2011
+ "step": 32500
2012
+ },
2013
+ {
2014
+ "epoch": 120.0,
2015
+ "eval_accuracy": 0.28135828135828134,
2016
+ "eval_f1_macro": 0.6614581971670047,
2017
+ "eval_f1_micro": 0.803593372600534,
2018
+ "eval_loss": 0.13164088129997253,
2019
+ "eval_roc_auc": 0.8661674020983411,
2020
+ "eval_runtime": 420.709,
2021
+ "eval_samples_per_second": 6.86,
2022
+ "eval_steps_per_second": 0.216,
2023
+ "learning_rate": 1e-05,
2024
+ "step": 32760
2025
+ },
2026
+ {
2027
+ "epoch": 120.87912087912088,
2028
+ "grad_norm": 0.31770700216293335,
2029
+ "learning_rate": 1e-05,
2030
+ "loss": 0.1454,
2031
+ "step": 33000
2032
+ },
2033
+ {
2034
+ "epoch": 121.0,
2035
+ "eval_accuracy": 0.28101178101178104,
2036
+ "eval_f1_macro": 0.6610641151618838,
2037
+ "eval_f1_micro": 0.8012604863092451,
2038
+ "eval_loss": 0.13184630870819092,
2039
+ "eval_roc_auc": 0.8635064611392681,
2040
+ "eval_runtime": 422.0264,
2041
+ "eval_samples_per_second": 6.838,
2042
+ "eval_steps_per_second": 0.216,
2043
+ "learning_rate": 1e-05,
2044
+ "step": 33033
2045
+ },
2046
+ {
2047
+ "epoch": 122.0,
2048
+ "eval_accuracy": 0.2817047817047817,
2049
+ "eval_f1_macro": 0.6647378818356079,
2050
+ "eval_f1_micro": 0.8049611099432415,
2051
+ "eval_loss": 0.13215216994285583,
2052
+ "eval_roc_auc": 0.8691576105910745,
2053
+ "eval_runtime": 436.9114,
2054
+ "eval_samples_per_second": 6.605,
2055
+ "eval_steps_per_second": 0.208,
2056
+ "learning_rate": 1e-05,
2057
+ "step": 33306
2058
+ },
2059
+ {
2060
+ "epoch": 122.71062271062272,
2061
+ "grad_norm": 0.22290275990962982,
2062
+ "learning_rate": 1e-05,
2063
+ "loss": 0.145,
2064
+ "step": 33500
2065
+ },
2066
+ {
2067
+ "epoch": 123.0,
2068
+ "eval_accuracy": 0.2817047817047817,
2069
+ "eval_f1_macro": 0.6604978306251739,
2070
+ "eval_f1_micro": 0.8010107932156931,
2071
+ "eval_loss": 0.13187836110591888,
2072
+ "eval_roc_auc": 0.8617537926061216,
2073
+ "eval_runtime": 431.3938,
2074
+ "eval_samples_per_second": 6.69,
2075
+ "eval_steps_per_second": 0.211,
2076
+ "learning_rate": 1e-05,
2077
+ "step": 33579
2078
+ },
2079
+ {
2080
+ "epoch": 124.0,
2081
+ "eval_accuracy": 0.2806652806652807,
2082
+ "eval_f1_macro": 0.6621515776947642,
2083
+ "eval_f1_micro": 0.8018739352640545,
2084
+ "eval_loss": 0.13141389191150665,
2085
+ "eval_roc_auc": 0.8638029186192627,
2086
+ "eval_runtime": 430.2675,
2087
+ "eval_samples_per_second": 6.707,
2088
+ "eval_steps_per_second": 0.211,
2089
+ "learning_rate": 1e-05,
2090
+ "step": 33852
2091
+ },
2092
+ {
2093
+ "epoch": 124.54212454212454,
2094
+ "grad_norm": 0.27631625533103943,
2095
+ "learning_rate": 1e-05,
2096
+ "loss": 0.1459,
2097
+ "step": 34000
2098
+ },
2099
+ {
2100
+ "epoch": 125.0,
2101
+ "eval_accuracy": 0.2862092862092862,
2102
+ "eval_f1_macro": 0.6640721616133445,
2103
+ "eval_f1_micro": 0.804345987993574,
2104
+ "eval_loss": 0.13139639794826508,
2105
+ "eval_roc_auc": 0.8672404491355638,
2106
+ "eval_runtime": 432.0509,
2107
+ "eval_samples_per_second": 6.68,
2108
+ "eval_steps_per_second": 0.211,
2109
+ "learning_rate": 1e-05,
2110
+ "step": 34125
2111
+ },
2112
+ {
2113
+ "epoch": 126.0,
2114
+ "eval_accuracy": 0.2862092862092862,
2115
+ "eval_f1_macro": 0.663003919720051,
2116
+ "eval_f1_micro": 0.804212663367593,
2117
+ "eval_loss": 0.13103623688220978,
2118
+ "eval_roc_auc": 0.8670350710768244,
2119
+ "eval_runtime": 432.4499,
2120
+ "eval_samples_per_second": 6.674,
2121
+ "eval_steps_per_second": 0.21,
2122
+ "learning_rate": 1e-05,
2123
+ "step": 34398
2124
+ },
2125
+ {
2126
+ "epoch": 126.37362637362638,
2127
+ "grad_norm": 0.3177105188369751,
2128
+ "learning_rate": 1e-05,
2129
+ "loss": 0.1439,
2130
+ "step": 34500
2131
+ },
2132
+ {
2133
+ "epoch": 127.0,
2134
+ "eval_accuracy": 0.28586278586278585,
2135
+ "eval_f1_macro": 0.6597731906072118,
2136
+ "eval_f1_micro": 0.8038346213944846,
2137
+ "eval_loss": 0.13152988255023956,
2138
+ "eval_roc_auc": 0.8672624342859965,
2139
+ "eval_runtime": 431.3827,
2140
+ "eval_samples_per_second": 6.69,
2141
+ "eval_steps_per_second": 0.211,
2142
+ "learning_rate": 1e-05,
2143
+ "step": 34671
2144
+ },
2145
+ {
2146
+ "epoch": 128.0,
2147
+ "eval_accuracy": 0.2869022869022869,
2148
+ "eval_f1_macro": 0.668197478893632,
2149
+ "eval_f1_micro": 0.8042412977357216,
2150
+ "eval_loss": 0.13113313913345337,
2151
+ "eval_roc_auc": 0.8674002874836755,
2152
+ "eval_runtime": 439.4627,
2153
+ "eval_samples_per_second": 6.567,
2154
+ "eval_steps_per_second": 0.207,
2155
+ "learning_rate": 1e-05,
2156
+ "step": 34944
2157
+ },
2158
+ {
2159
+ "epoch": 128.2051282051282,
2160
+ "grad_norm": 0.2520149350166321,
2161
+ "learning_rate": 1e-05,
2162
+ "loss": 0.1446,
2163
+ "step": 35000
2164
+ },
2165
+ {
2166
+ "epoch": 129.0,
2167
+ "eval_accuracy": 0.28274428274428276,
2168
+ "eval_f1_macro": 0.6652814888251478,
2169
+ "eval_f1_micro": 0.8034694309287074,
2170
+ "eval_loss": 0.13096605241298676,
2171
+ "eval_roc_auc": 0.8665332355380903,
2172
+ "eval_runtime": 443.7844,
2173
+ "eval_samples_per_second": 6.503,
2174
+ "eval_steps_per_second": 0.205,
2175
+ "learning_rate": 1e-05,
2176
+ "step": 35217
2177
+ },
2178
+ {
2179
+ "epoch": 130.0,
2180
+ "eval_accuracy": 0.28655578655578656,
2181
+ "eval_f1_macro": 0.6657375892895663,
2182
+ "eval_f1_micro": 0.8034491503931017,
2183
+ "eval_loss": 0.1310083270072937,
2184
+ "eval_roc_auc": 0.866799015752045,
2185
+ "eval_runtime": 440.6588,
2186
+ "eval_samples_per_second": 6.549,
2187
+ "eval_steps_per_second": 0.207,
2188
+ "learning_rate": 1e-05,
2189
+ "step": 35490
2190
+ },
2191
+ {
2192
+ "epoch": 130.03663003663004,
2193
+ "grad_norm": 0.2916598916053772,
2194
+ "learning_rate": 1e-05,
2195
+ "loss": 0.1449,
2196
+ "step": 35500
2197
+ },
2198
+ {
2199
+ "epoch": 131.0,
2200
+ "eval_accuracy": 0.2834372834372834,
2201
+ "eval_f1_macro": 0.6709132204127336,
2202
+ "eval_f1_micro": 0.8052362171687506,
2203
+ "eval_loss": 0.13133247196674347,
2204
+ "eval_roc_auc": 0.8699004377177725,
2205
+ "eval_runtime": 446.7612,
2206
+ "eval_samples_per_second": 6.46,
2207
+ "eval_steps_per_second": 0.204,
2208
+ "learning_rate": 1e-05,
2209
+ "step": 35763
2210
+ },
2211
+ {
2212
+ "epoch": 131.86813186813185,
2213
+ "grad_norm": 0.3473760783672333,
2214
+ "learning_rate": 1e-05,
2215
+ "loss": 0.1442,
2216
+ "step": 36000
2217
+ },
2218
+ {
2219
+ "epoch": 132.0,
2220
+ "eval_accuracy": 0.2806652806652807,
2221
+ "eval_f1_macro": 0.6557913726655867,
2222
+ "eval_f1_micro": 0.7985562048814026,
2223
+ "eval_loss": 0.13149647414684296,
2224
+ "eval_roc_auc": 0.8595249758820619,
2225
+ "eval_runtime": 447.0484,
2226
+ "eval_samples_per_second": 6.456,
2227
+ "eval_steps_per_second": 0.204,
2228
+ "learning_rate": 1e-05,
2229
+ "step": 36036
2230
+ },
2231
+ {
2232
+ "epoch": 133.0,
2233
+ "eval_accuracy": 0.28794178794178793,
2234
+ "eval_f1_macro": 0.6689392948255155,
2235
+ "eval_f1_micro": 0.8051816958277256,
2236
+ "eval_loss": 0.1311328113079071,
2237
+ "eval_roc_auc": 0.8691700049040701,
2238
+ "eval_runtime": 444.1217,
2239
+ "eval_samples_per_second": 6.498,
2240
+ "eval_steps_per_second": 0.205,
2241
+ "learning_rate": 1e-05,
2242
+ "step": 36309
2243
+ },
2244
+ {
2245
+ "epoch": 133.6996336996337,
2246
+ "grad_norm": 0.2959079444408417,
2247
+ "learning_rate": 1e-05,
2248
+ "loss": 0.1443,
2249
+ "step": 36500
2250
+ },
2251
+ {
2252
+ "epoch": 134.0,
2253
+ "eval_accuracy": 0.28274428274428276,
2254
+ "eval_f1_macro": 0.6648386499372343,
2255
+ "eval_f1_micro": 0.802060714437774,
2256
+ "eval_loss": 0.1308571696281433,
2257
+ "eval_roc_auc": 0.8639881626262637,
2258
+ "eval_runtime": 444.917,
2259
+ "eval_samples_per_second": 6.487,
2260
+ "eval_steps_per_second": 0.205,
2261
+ "learning_rate": 1e-05,
2262
+ "step": 36582
2263
+ },
2264
+ {
2265
+ "epoch": 135.0,
2266
+ "eval_accuracy": 0.2869022869022869,
2267
+ "eval_f1_macro": 0.6684163123065296,
2268
+ "eval_f1_micro": 0.8038277511961722,
2269
+ "eval_loss": 0.13148072361946106,
2270
+ "eval_roc_auc": 0.8665118674205556,
2271
+ "eval_runtime": 437.5153,
2272
+ "eval_samples_per_second": 6.596,
2273
+ "eval_steps_per_second": 0.208,
2274
+ "learning_rate": 1e-05,
2275
+ "step": 36855
2276
+ },
2277
+ {
2278
+ "epoch": 135.53113553113553,
2279
+ "grad_norm": 0.3723543882369995,
2280
+ "learning_rate": 1e-05,
2281
+ "loss": 0.1438,
2282
+ "step": 37000
2283
+ },
2284
+ {
2285
+ "epoch": 136.0,
2286
+ "eval_accuracy": 0.28274428274428276,
2287
+ "eval_f1_macro": 0.659009971789042,
2288
+ "eval_f1_micro": 0.8024591213764248,
2289
+ "eval_loss": 0.13150115311145782,
2290
+ "eval_roc_auc": 0.8634352340808195,
2291
+ "eval_runtime": 444.5109,
2292
+ "eval_samples_per_second": 6.493,
2293
+ "eval_steps_per_second": 0.205,
2294
+ "learning_rate": 1e-05,
2295
+ "step": 37128
2296
+ },
2297
+ {
2298
+ "epoch": 137.0,
2299
+ "eval_accuracy": 0.28586278586278585,
2300
+ "eval_f1_macro": 0.6666808903899752,
2301
+ "eval_f1_micro": 0.8035592643051771,
2302
+ "eval_loss": 0.1310679018497467,
2303
+ "eval_roc_auc": 0.8648124783367798,
2304
+ "eval_runtime": 434.2661,
2305
+ "eval_samples_per_second": 6.646,
2306
+ "eval_steps_per_second": 0.21,
2307
+ "learning_rate": 1e-05,
2308
+ "step": 37401
2309
+ },
2310
+ {
2311
+ "epoch": 137.36263736263737,
2312
+ "grad_norm": 0.36766815185546875,
2313
+ "learning_rate": 1e-05,
2314
+ "loss": 0.1452,
2315
+ "step": 37500
2316
+ },
2317
+ {
2318
+ "epoch": 138.0,
2319
+ "eval_accuracy": 0.2844767844767845,
2320
+ "eval_f1_macro": 0.6665598962110765,
2321
+ "eval_f1_micro": 0.8035426731078905,
2322
+ "eval_loss": 0.13124705851078033,
2323
+ "eval_roc_auc": 0.8661277510277622,
2324
+ "eval_runtime": 434.1413,
2325
+ "eval_samples_per_second": 6.648,
2326
+ "eval_steps_per_second": 0.21,
2327
+ "learning_rate": 1e-05,
2328
+ "step": 37674
2329
+ },
2330
+ {
2331
+ "epoch": 139.0,
2332
+ "eval_accuracy": 0.28967428967428965,
2333
+ "eval_f1_macro": 0.6661043989752415,
2334
+ "eval_f1_micro": 0.8052538519828238,
2335
+ "eval_loss": 0.13104070723056793,
2336
+ "eval_roc_auc": 0.8689438757606943,
2337
+ "eval_runtime": 433.2581,
2338
+ "eval_samples_per_second": 6.661,
2339
+ "eval_steps_per_second": 0.21,
2340
+ "learning_rate": 1e-05,
2341
+ "step": 37947
2342
+ },
2343
+ {
2344
+ "epoch": 139.19413919413918,
2345
+ "grad_norm": 0.35373228788375854,
2346
+ "learning_rate": 1e-05,
2347
+ "loss": 0.144,
2348
+ "step": 38000
2349
+ },
2350
+ {
2351
+ "epoch": 140.0,
2352
+ "eval_accuracy": 0.2834372834372834,
2353
+ "eval_f1_macro": 0.663466069531375,
2354
+ "eval_f1_micro": 0.8020416843896214,
2355
+ "eval_loss": 0.13169734179973602,
2356
+ "eval_roc_auc": 0.8642539428402185,
2357
+ "eval_runtime": 435.0147,
2358
+ "eval_samples_per_second": 6.634,
2359
+ "eval_steps_per_second": 0.209,
2360
+ "learning_rate": 1e-05,
2361
+ "step": 38220
2362
+ },
2363
+ {
2364
+ "epoch": 141.0,
2365
+ "eval_accuracy": 0.2875952875952876,
2366
+ "eval_f1_macro": 0.6687691213000826,
2367
+ "eval_f1_micro": 0.8046521463311481,
2368
+ "eval_loss": 0.13089434802532196,
2369
+ "eval_roc_auc": 0.867299000192085,
2370
+ "eval_runtime": 429.8469,
2371
+ "eval_samples_per_second": 6.714,
2372
+ "eval_steps_per_second": 0.212,
2373
+ "learning_rate": 1.0000000000000002e-06,
2374
+ "step": 38493
2375
+ },
2376
+ {
2377
+ "epoch": 141.02564102564102,
2378
+ "grad_norm": 0.2815115451812744,
2379
+ "learning_rate": 1.0000000000000002e-06,
2380
+ "loss": 0.1445,
2381
+ "step": 38500
2382
+ },
2383
+ {
2384
+ "epoch": 142.0,
2385
+ "eval_accuracy": 0.28586278586278585,
2386
+ "eval_f1_macro": 0.6642894279153319,
2387
+ "eval_f1_micro": 0.8041640110473762,
2388
+ "eval_loss": 0.13103386759757996,
2389
+ "eval_roc_auc": 0.8657067870399482,
2390
+ "eval_runtime": 425.5573,
2391
+ "eval_samples_per_second": 6.782,
2392
+ "eval_steps_per_second": 0.214,
2393
+ "learning_rate": 1.0000000000000002e-06,
2394
+ "step": 38766
2395
+ },
2396
+ {
2397
+ "epoch": 142.85714285714286,
2398
+ "grad_norm": 0.3381010890007019,
2399
+ "learning_rate": 1.0000000000000002e-06,
2400
+ "loss": 0.1441,
2401
+ "step": 39000
2402
+ },
2403
+ {
2404
+ "epoch": 143.0,
2405
+ "eval_accuracy": 0.2872487872487873,
2406
+ "eval_f1_macro": 0.6623287859816251,
2407
+ "eval_f1_micro": 0.8019270122783083,
2408
+ "eval_loss": 0.13144278526306152,
2409
+ "eval_roc_auc": 0.8635436440782548,
2410
+ "eval_runtime": 433.7658,
2411
+ "eval_samples_per_second": 6.653,
2412
+ "eval_steps_per_second": 0.21,
2413
+ "learning_rate": 1.0000000000000002e-06,
2414
+ "step": 39039
2415
+ },
2416
+ {
2417
+ "epoch": 144.0,
2418
+ "eval_accuracy": 0.28378378378378377,
2419
+ "eval_f1_macro": 0.6647534218687892,
2420
+ "eval_f1_micro": 0.8024974515800204,
2421
+ "eval_loss": 0.1311902105808258,
2422
+ "eval_roc_auc": 0.8649097280870156,
2423
+ "eval_runtime": 446.8955,
2424
+ "eval_samples_per_second": 6.458,
2425
+ "eval_steps_per_second": 0.204,
2426
+ "learning_rate": 1.0000000000000002e-06,
2427
+ "step": 39312
2428
+ },
2429
+ {
2430
+ "epoch": 144.0,
2431
+ "learning_rate": 1.0000000000000002e-06,
2432
+ "step": 39312,
2433
+ "total_flos": 1.3598709030716368e+20,
2434
+ "train_loss": 0.157796386979584,
2435
+ "train_runtime": 249885.5342,
2436
+ "train_samples_per_second": 5.232,
2437
+ "train_steps_per_second": 0.164
2438
+ }
2439
+ ],
2440
+ "logging_steps": 500,
2441
+ "max_steps": 40950,
2442
+ "num_input_tokens_seen": 0,
2443
+ "num_train_epochs": 150,
2444
+ "save_steps": 500,
2445
+ "stateful_callbacks": {
2446
+ "EarlyStoppingCallback": {
2447
+ "args": {
2448
+ "early_stopping_patience": 10,
2449
+ "early_stopping_threshold": 0.0
2450
+ },
2451
+ "attributes": {
2452
+ "early_stopping_patience_counter": 0
2453
+ }
2454
+ },
2455
+ "TrainerControl": {
2456
+ "args": {
2457
+ "should_epoch_stop": false,
2458
+ "should_evaluate": false,
2459
+ "should_log": false,
2460
+ "should_save": true,
2461
+ "should_training_stop": true
2462
+ },
2463
+ "attributes": {}
2464
+ }
2465
+ },
2466
+ "total_flos": 1.3598709030716368e+20,
2467
+ "train_batch_size": 32,
2468
+ "trial_name": null,
2469
+ "trial_params": null
2470
+ }
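The `EarlyStoppingCallback` settings above (`early_stopping_patience: 10`, `early_stopping_threshold: 0.0`) are consistent with the run ending at epoch 144 rather than at the configured 150. A minimal sketch of that logic follows, under two assumptions that the log does not confirm: that the monitored metric is `eval_loss`, and that lower is better. The function `simulate_early_stopping` is a hypothetical helper written for illustration, not the library's implementation. The `eval_loss` values are copied from the evaluation records for epochs 133 to 144 above.

```python
# Evaluation records (epoch, eval_loss) copied from the log above, epochs 133-144.
LOG_HISTORY = [
    {"epoch": 133.0, "eval_loss": 0.1311328113079071},
    {"epoch": 134.0, "eval_loss": 0.1308571696281433},
    {"epoch": 135.0, "eval_loss": 0.13148072361946106},
    {"epoch": 136.0, "eval_loss": 0.13150115311145782},
    {"epoch": 137.0, "eval_loss": 0.1310679018497467},
    {"epoch": 138.0, "eval_loss": 0.13124705851078033},
    {"epoch": 139.0, "eval_loss": 0.13104070723056793},
    {"epoch": 140.0, "eval_loss": 0.13169734179973602},
    {"epoch": 141.0, "eval_loss": 0.13089434802532196},
    {"epoch": 142.0, "eval_loss": 0.13103386759757996},
    {"epoch": 143.0, "eval_loss": 0.13144278526306152},
    {"epoch": 144.0, "eval_loss": 0.1311902105808258},
]


def simulate_early_stopping(history, patience, threshold=0.0):
    """Hypothetical helper: replay patience-based early stopping on eval_loss.

    Assumes eval_loss is the monitored metric and lower is better.
    Returns (best_entry, stop_epoch); stop_epoch is None if patience
    is never exhausted within the given history.
    """
    best = None
    counter = 0
    for entry in history:
        if "eval_loss" not in entry:
            continue  # skip training-step records that carry only "loss"
        loss = entry["eval_loss"]
        if best is None or best["eval_loss"] - loss > threshold:
            best = entry  # strict improvement resets the counter
            counter = 0
        else:
            counter += 1  # one more evaluation without improvement
            if counter >= patience:
                return best, entry["epoch"]
    return best, None


best, stop_epoch = simulate_early_stopping(LOG_HISTORY, patience=10)
print(best["epoch"], best["eval_loss"], stop_epoch)
```

Within this excerpt the lowest `eval_loss` is at epoch 134, and the ten evaluations that follow do not improve on it, so the sketch reports a stop at epoch 144. That matches `should_training_stop: true` and the final `step` of 39312 against `max_steps` of 40950.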