lapp0 committed
Commit 129ef97 · verified · 1 Parent(s): f640ad0

End of training

README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 210.0
- - eval_frwikippl: 776.0
- - eval_zhwikippl: 169.0
- - eval_tinystoriesppl: 166.0
- - eval_loss: 1.2670
- - eval_runtime: 25.479
- - eval_samples_per_second: 98.12
- - eval_steps_per_second: 12.285
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -46,15 +46,15 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  - train_embeddings: True
  - learning_rate: 0.0001
  - train_batch_size: 4
  - eval_batch_size: 8
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: inverse_sqrt
- - lr_scheduler_warmup_ratio: 0.5
  - num_epochs: 1.0
 
  ### Resource Usage
@@ -64,69 +64,106 @@ Peak GPU Memory: 7.2012 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 43.25 | 61.25 | | | | | 11.6875 | 19.125 |
- | 0 | 0 | 1477468749824.0 | 128093104635904.0 | 19.0496 | 25.4251 | 98.328 | 12.311 | 7985954816.0 | 67345087201280.0 |
- | 1000 | 0.0162 | 73728.0 | 1736704.0 | 5.1613 | 25.443 | 98.259 | 12.302 | 15872.0 | 3735552.0 |
- | 2000 | 0.0323 | 4000.0 | 42752.0 | 3.2656 | 25.4471 | 98.243 | 12.3 | 2688.0 | 544768.0 |
- | 3000 | 0.0485 | 1120.0 | 8160.0 | 2.3670 | 25.4374 | 98.28 | 12.305 | 728.0 | 16192.0 |
- | 4000 | 0.0646 | 620.0 | 4512.0 | 1.9891 | 25.4378 | 98.279 | 12.305 | 410.0 | 1464.0 |
- | 5000 | 0.0808 | 422.0 | 2080.0 | 1.7081 | 25.4876 | 98.087 | 12.28 | 286.0 | 372.0 |
- | 6000 | 0.0970 | 312.0 | 1296.0 | 1.4979 | 25.4407 | 98.268 | 12.303 | 231.0 | 237.0 |
- | 7000 | 0.1131 | 262.0 | 996.0 | 1.4032 | 25.4618 | 98.186 | 12.293 | 209.0 | 193.0 |
- | 8000 | 0.1293 | 226.0 | 860.0 | 1.3139 | 25.4752 | 98.135 | 12.286 | 179.0 | 221.0 |
- | 9000 | 0.1455 | 210.0 | 776.0 | 1.2670 | 25.479 | 98.12 | 12.285 | 166.0 | 169.0 |
- | 10000 | 0.1616 | 186.0 | 652.0 | 1.1738 | 25.4466 | 98.245 | 12.3 | 153.0 | 147.0 |
- | 11000 | 0.1778 | 167.0 | 568.0 | 1.0682 | 25.5929 | 97.683 | 12.23 | 145.0 | 149.0 |
- | 12000 | 0.1939 | 205.0 | 576.0 | 0.9983 | 25.4381 | 98.278 | 12.304 | 190.0 | 141.0 |
- | 13000 | 0.2101 | 141.0 | 472.0 | 0.9487 | 25.4558 | 98.21 | 12.296 | 118.0 | 132.0 |
- | 14000 | 0.2263 | 139.0 | 478.0 | 0.8964 | 25.4863 | 98.092 | 12.281 | 113.5 | 145.0 |
- | 15000 | 0.2424 | 129.0 | 458.0 | 0.8617 | 25.5441 | 97.87 | 12.253 | 98.5 | 119.0 |
- | 16000 | 0.2586 | 116.0 | 430.0 | 0.8348 | 25.4844 | 98.099 | 12.282 | 94.5 | 109.0 |
- | 17000 | 0.2747 | 120.0 | 452.0 | 0.8054 | 25.4588 | 98.198 | 12.294 | 86.5 | 110.0 |
- | 18000 | 0.2909 | 102.0 | 380.0 | 0.7811 | 25.4713 | 98.15 | 12.288 | 82.5 | 104.0 |
- | 19000 | 0.3071 | 108.5 | 408.0 | 0.8030 | 25.4558 | 98.21 | 12.296 | 86.0 | 111.5 |
- | 20000 | 0.3232 | 94.0 | 366.0 | 0.7253 | 25.5141 | 97.985 | 12.268 | 74.0 | 123.5 |
- | 21000 | 0.3394 | 87.5 | 322.0 | 0.6815 | 25.4799 | 98.117 | 12.284 | 70.0 | 142.0 |
- | 22000 | 0.3556 | 78.5 | 282.0 | 0.6363 | 25.4302 | 98.308 | 12.308 | 64.5 | 128.0 |
- | 23000 | 0.3717 | 77.0 | 270.0 | 0.6005 | 25.4991 | 98.043 | 12.275 | 61.25 | 110.5 |
- | 24000 | 0.3879 | 75.0 | 264.0 | 0.5879 | 25.4595 | 98.195 | 12.294 | 65.5 | 109.5 |
- | 25000 | 0.4040 | 70.5 | 235.0 | 0.5705 | 25.5194 | 97.965 | 12.265 | 58.0 | 140.0 |
- | 26000 | 0.4202 | 72.5 | 229.0 | 0.5606 | 25.5376 | 97.895 | 12.256 | 57.5 | 131.0 |
- | 27000 | 0.4364 | 71.5 | 236.0 | 0.5460 | 25.5324 | 97.915 | 12.259 | 56.25 | 120.0 |
- | 28000 | 0.4525 | 70.0 | 238.0 | 0.5403 | 25.5379 | 97.894 | 12.256 | 56.0 | 111.0 |
- | 29000 | 0.4687 | 76.5 | 249.0 | 0.5596 | 25.5341 | 97.908 | 12.258 | 58.25 | 113.5 |
- | 30000 | 0.4848 | 72.0 | 222.0 | 0.5487 | 25.5124 | 97.992 | 12.269 | 54.0 | 162.0 |
- | 31000 | 0.5010 | 70.5 | 253.0 | 0.5503 | 25.4372 | 98.281 | 12.305 | 56.0 | 136.0 |
- | 32000 | 0.5172 | 72.5 | 247.0 | 0.5404 | 25.5107 | 97.998 | 12.269 | 53.75 | 117.0 |
- | 33000 | 0.5333 | 70.0 | 235.0 | 0.5380 | 25.5177 | 97.971 | 12.266 | 51.25 | 127.0 |
- | 34000 | 0.5495 | 71.5 | 251.0 | 0.5315 | 25.5159 | 97.978 | 12.267 | 54.25 | 141.0 |
- | 35000 | 0.5657 | 73.5 | 247.0 | 0.5319 | 25.4405 | 98.268 | 12.303 | 55.75 | 98.0 |
- | 36000 | 0.5818 | 71.5 | 231.0 | 0.5224 | 25.4323 | 98.3 | 12.307 | 53.0 | 147.0 |
- | 37000 | 0.5980 | 69.0 | 237.0 | 0.5154 | 25.4606 | 98.191 | 12.294 | 53.0 | 115.5 |
- | 38000 | 0.6141 | 69.5 | 224.0 | 0.5031 | 25.5405 | 97.884 | 12.255 | 51.0 | 98.0 |
- | 39000 | 0.6303 | 64.5 | 219.0 | 0.4864 | 25.4803 | 98.115 | 12.284 | 50.0 | 458.0 |
- | 40000 | 0.6465 | 67.0 | 214.0 | 0.4865 | 25.5475 | 97.857 | 12.252 | 53.0 | 110.5 |
- | 41000 | 0.6626 | 67.5 | 218.0 | 0.4886 | 25.4575 | 98.203 | 12.295 | 47.5 | 95.0 |
- | 42000 | 0.6788 | 63.25 | 210.0 | 0.4788 | 25.43 | 98.309 | 12.308 | 51.0 | 92.0 |
- | 43000 | 0.6949 | 63.0 | 203.0 | 0.4746 | 25.4807 | 98.113 | 12.284 | 48.25 | 84.0 |
- | 44000 | 0.7111 | 62.75 | 211.0 | 0.4785 | 25.4599 | 98.194 | 12.294 | 48.5 | 90.0 |
- | 45000 | 0.7273 | 63.5 | 205.0 | 0.4745 | 25.4377 | 98.279 | 12.305 | 47.25 | 92.5 |
- | 46000 | 0.7434 | 63.25 | 204.0 | 0.4697 | 25.5287 | 97.929 | 12.261 | 45.5 | 116.0 |
- | 47000 | 0.7596 | 63.75 | 229.0 | 0.4705 | 25.4923 | 98.069 | 12.278 | 49.0 | 101.5 |
- | 48000 | 0.7758 | 63.75 | 205.0 | 0.4669 | 25.4951 | 98.058 | 12.277 | 46.0 | 75.5 |
- | 49000 | 0.7919 | 62.75 | 200.0 | 0.4651 | 25.5061 | 98.016 | 12.272 | 49.75 | 87.0 |
- | 50000 | 0.8081 | 63.25 | 200.0 | 0.4666 | 25.5442 | 97.869 | 12.253 | 46.5 | 149.0 |
- | 51000 | 0.8242 | 62.75 | 191.0 | 0.4644 | 25.4188 | 98.353 | 12.314 | 46.5 | 85.0 |
- | 52000 | 0.8404 | 61.25 | 187.0 | 0.4531 | 25.5074 | 98.011 | 12.271 | 47.25 | 62.75 |
- | 53000 | 0.8566 | 60.25 | 207.0 | 0.4613 | 25.4158 | 98.364 | 12.315 | 44.5 | 129.0 |
- | 54000 | 0.8727 | 65.0 | 209.0 | 0.4588 | 25.4519 | 98.225 | 12.298 | 45.5 | 95.0 |
- | 55000 | 0.8889 | 60.25 | 196.0 | 0.4571 | 25.4656 | 98.172 | 12.291 | 45.5 | 75.5 |
- | 56000 | 0.9051 | 63.75 | 195.0 | 0.4519 | 25.5712 | 97.766 | 12.24 | 46.0 | 82.0 |
- | 57000 | 0.9212 | 65.5 | 212.0 | 0.4563 | 25.4339 | 98.294 | 12.306 | 46.0 | 100.0 |
- | 58000 | 0.9374 | 62.5 | 204.0 | 0.4474 | 25.4328 | 98.298 | 12.307 | 45.25 | 80.5 |
- | 59000 | 0.9535 | 62.0 | 214.0 | 0.4461 | 25.4201 | 98.347 | 12.313 | 44.0 | 85.0 |
- | 60000 | 0.9697 | 62.0 | 211.0 | 0.4494 | 25.5124 | 97.992 | 12.269 | 44.25 | 85.0 |
- | 61000 | 0.9859 | 62.0 | 219.0 | 0.4483 | 25.4698 | 98.155 | 12.289 | 46.0 | 62.75 |
- | 61875 | 1.0 | 61.25 | 203.0 | 0.4444 | 25.5209 | 97.959 | 12.264 | 45.5 | 100.0 |
 
  ### Framework versions
  - Distily 0.2.0

  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
+ - eval_enwikippl: 65.0
+ - eval_frwikippl: 215.0
+ - eval_zhwikippl: 104.5
+ - eval_tinystoriesppl: 49.75
+ - eval_loss: 0.4281
+ - eval_runtime: 102.0824
+ - eval_samples_per_second: 97.96
+ - eval_steps_per_second: 12.245
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
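
The `eval_*ppl` values above are perplexity metrics (English, French, and Chinese Wikipedia plus TinyStories text, judging by the metric names). As a rough, hedged sketch of how such a figure can be obtained for a causal LM (this is not the exact Distily evaluation loop), one can exponentiate the model's mean cross-entropy loss. The snippet below uses the `gpt2` teacher checkpoint and a placeholder sample text; the distilled student from this repository could be substituted:

```python
# Hedged sketch: generic causal-LM perplexity, not Distily's exact eval code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"  # teacher checkpoint; substitute this repo's student model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).eval()

text = "The quick brown fox jumps over the lazy dog."  # placeholder sample
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean next-token
    # cross-entropy loss alongside the logits.
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = torch.exp(outputs.loss)
print(f"perplexity: {perplexity.item():.2f}")
```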
 
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
+ - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=mse, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=mse, layer_mapper=layer-2, projector=None))
  - train_embeddings: True
  - learning_rate: 0.0001
  - train_batch_size: 4
  - eval_batch_size: 8
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: constant
+ - lr_scheduler_warmup_ratio: 0.2
  - num_epochs: 1.0
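
The `distillation_objective` above puts all of its weight on the logits component with a KL loss; the hidden-state and attention components are configured with `loss_fn=mse` but have weight 0, so they do not contribute. A minimal sketch of such a KL-divergence logits loss in PyTorch is shown below; the temperature scaling, reduction, and function name are illustrative assumptions, not Distily's exact implementation:

```python
# Hedged sketch of a KL-divergence logits distillation loss; Distily's actual
# implementation (masking, reduction, temperature handling) may differ.
import torch
import torch.nn.functional as F

def kl_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """Mean per-token KL(teacher || student) over the vocabulary."""
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, -2)
    t = F.log_softmax(teacher_logits / temperature, dim=-1).flatten(0, -2)
    # log_target=True: the target is given as log-probabilities;
    # reduction="batchmean" divides the summed KL by the number of token positions.
    return F.kl_div(s, t, log_target=True, reduction="batchmean") * temperature ** 2

# Toy shapes: batch of 2 sequences, 5 tokens each, GPT-2 vocab size 50257.
student_logits = torch.randn(2, 5, 50257)
teacher_logits = torch.randn(2, 5, 50257)
loss = kl_logits_loss(student_logits, teacher_logits)
```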
 
  ### Resource Usage
 
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 43.25 | 61.25 | | | | | 11.6875 | 19.125 |
+ | 0 | 0 | 2611340115968.0 | 307863255777280.0 | 21.3930 | 101.7301 | 98.299 | 12.287 | 7214202880.0 | 36009005809664.0 |
+ | 2500 | 0.0101 | 191.0 | 704.0 | 1.0465 | 101.8446 | 98.189 | 12.274 | 165.0 | 316.0 |
+ | 5000 | 0.0202 | 131.0 | 492.0 | 0.8689 | 102.0266 | 98.014 | 12.252 | 111.5 | 148.0 |
+ | 7500 | 0.0303 | 114.5 | 396.0 | 0.7492 | 101.8482 | 98.185 | 12.273 | 92.0 | 142.0 |
+ | 10000 | 0.0404 | 97.5 | 380.0 | 0.6631 | 102.1266 | 97.918 | 12.24 | 76.5 | 136.0 |
+ | 12500 | 0.0505 | 86.5 | 314.0 | 0.5968 | 101.9077 | 98.128 | 12.266 | 72.0 | 146.0 |
+ | 15000 | 0.0606 | 80.5 | 302.0 | 0.5429 | 102.0007 | 98.039 | 12.255 | 66.0 | 135.0 |
+ | 17500 | 0.0707 | 77.0 | 276.0 | 0.5161 | 101.9305 | 98.106 | 12.263 | 62.75 | 121.0 |
+ | 20000 | 0.0808 | 74.5 | 262.0 | 0.5019 | 102.035 | 98.006 | 12.251 | 60.25 | 120.5 |
+ | 22500 | 0.0909 | 68.5 | 266.0 | 0.4874 | 101.8648 | 98.169 | 12.271 | 59.0 | 160.0 |
+ | 25000 | 0.1010 | 73.0 | 242.0 | 0.4754 | 102.1293 | 97.915 | 12.239 | 55.25 | 145.0 |
+ | 27500 | 0.1111 | 70.0 | 243.0 | 0.4627 | 102.0199 | 98.02 | 12.253 | 56.75 | 100.0 |
+ | 30000 | 0.1212 | 68.5 | 251.0 | 0.4621 | 102.0947 | 97.948 | 12.244 | 56.5 | 133.0 |
+ | 32500 | 0.1313 | 68.5 | 252.0 | 0.4589 | 102.0148 | 98.025 | 12.253 | 52.0 | 139.0 |
+ | 35000 | 0.1414 | 67.0 | 228.0 | 0.4628 | 101.8975 | 98.138 | 12.267 | 53.75 | 254.0 |
+ | 37500 | 0.1515 | 68.5 | 227.0 | 0.4495 | 101.9205 | 98.116 | 12.264 | 53.0 | 130.0 |
+ | 40000 | 0.1616 | 72.0 | 270.0 | 0.4502 | 101.8861 | 98.149 | 12.269 | 56.25 | 104.0 |
+ | 42500 | 0.1717 | 68.0 | 238.0 | 0.4422 | 101.8871 | 98.148 | 12.268 | 54.25 | 167.0 |
+ | 45000 | 0.1818 | 68.5 | 260.0 | 0.4498 | 101.9466 | 98.091 | 12.261 | 54.0 | 108.5 |
+ | 47500 | 0.1919 | 69.0 | 229.0 | 0.4392 | 102.1911 | 97.856 | 12.232 | 49.25 | 113.0 |
+ | 50000 | 0.2020 | 71.5 | 247.0 | 0.4473 | 104.3621 | 95.82 | 11.978 | 51.75 | 86.5 |
+ | 52500 | 0.2121 | 72.5 | 233.0 | 0.4357 | 102.8199 | 97.257 | 12.157 | 53.25 | 147.0 |
+ | 55000 | 0.2222 | 76.5 | 223.0 | 0.4321 | 102.1001 | 97.943 | 12.243 | 51.0 | 88.5 |
+ | 57500 | 0.2323 | 75.5 | 238.0 | 0.4342 | 102.0258 | 98.014 | 12.252 | 54.75 | 115.0 |
+ | 60000 | 0.2424 | 73.5 | 250.0 | 0.4374 | 101.9687 | 98.069 | 12.259 | 51.25 | 153.0 |
+ | 62500 | 0.2525 | 67.0 | 225.0 | 0.4252 | 101.9203 | 98.116 | 12.264 | 50.25 | 145.0 |
+ | 65000 | 0.2626 | 70.0 | 224.0 | 0.4304 | 101.9468 | 98.09 | 12.261 | 48.75 | 128.0 |
+ | 67500 | 0.2727 | 67.5 | 208.0 | 0.4303 | 102.0608 | 97.981 | 12.248 | 55.25 | 166.0 |
+ | 70000 | 0.2828 | 70.5 | 231.0 | 0.4263 | 101.9988 | 98.04 | 12.255 | 52.75 | 115.5 |
+ | 72500 | 0.2929 | 65.5 | 230.0 | 0.4249 | 102.2665 | 97.784 | 12.223 | 54.25 | 128.0 |
+ | 75000 | 0.3030 | 68.0 | 243.0 | 0.4279 | 102.0312 | 98.009 | 12.251 | 49.75 | 125.5 |
+ | 77500 | 0.3131 | 67.5 | 222.0 | 0.4326 | 102.1256 | 97.919 | 12.24 | 52.0 | 121.5 |
+ | 80000 | 0.3232 | 65.5 | 222.0 | 0.4254 | 101.9985 | 98.041 | 12.255 | 48.5 | 133.0 |
+ | 82500 | 0.3333 | 68.0 | 230.0 | 0.4219 | 102.0083 | 98.031 | 12.254 | 52.0 | 111.5 |
+ | 85000 | 0.3434 | 67.0 | 222.0 | 0.4243 | 102.057 | 97.984 | 12.248 | 48.25 | 109.5 |
+ | 87500 | 0.3535 | 66.5 | 218.0 | 0.4240 | 101.9819 | 98.057 | 12.257 | 53.5 | 302.0 |
+ | 90000 | 0.3636 | 66.5 | 229.0 | 0.4250 | 102.0841 | 97.958 | 12.245 | 50.0 | 118.0 |
+ | 92500 | 0.3737 | 67.0 | 227.0 | 0.4239 | 102.0958 | 97.947 | 12.243 | 53.0 | 114.0 |
+ | 95000 | 0.3838 | 67.5 | 240.0 | 0.4257 | 101.9889 | 98.05 | 12.256 | 50.75 | 110.0 |
+ | 97500 | 0.3939 | 65.0 | 215.0 | 0.4281 | 102.0824 | 97.96 | 12.245 | 49.75 | 104.5 |
+ | 100000 | 0.4040 | 67.5 | 230.0 | 0.4203 | 102.3463 | 97.707 | 12.213 | 50.5 | 115.0 |
+ | 102500 | 0.4141 | 66.0 | 227.0 | 0.4239 | 102.8008 | 97.276 | 12.159 | 53.25 | 109.0 |
+ | 105000 | 0.4242 | 66.5 | 219.0 | 0.4249 | 102.6156 | 97.451 | 12.181 | 51.75 | 159.0 |
+ | 107500 | 0.4343 | 65.5 | 218.0 | 0.4209 | 102.6016 | 97.464 | 12.183 | 51.5 | 95.0 |
+ | 110000 | 0.4444 | 66.5 | 227.0 | 0.4213 | 102.437 | 97.621 | 12.203 | 52.0 | 130.0 |
+ | 112500 | 0.4545 | 67.5 | 211.0 | 0.4231 | 102.6961 | 97.375 | 12.172 | 49.5 | 145.0 |
+ | 115000 | 0.4646 | 66.0 | 209.0 | 0.4215 | 102.1356 | 97.909 | 12.239 | 48.25 | 126.5 |
+ | 117500 | 0.4747 | 66.5 | 228.0 | 0.4261 | 102.5136 | 97.548 | 12.194 | 48.25 | 104.0 |
+ | 120000 | 0.4848 | 68.5 | 238.0 | 0.4239 | 102.325 | 97.728 | 12.216 | 50.5 | 212.0 |
+ | 122500 | 0.4949 | 67.0 | 219.0 | 0.4203 | 102.8823 | 97.198 | 12.15 | 52.0 | 94.5 |
+ | 125000 | 0.5051 | 66.5 | 249.0 | 0.4220 | 102.285 | 97.766 | 12.221 | 51.0 | 129.0 |
+ | 127500 | 0.5152 | 65.0 | 226.0 | 0.4242 | 102.4487 | 97.61 | 12.201 | 49.0 | 76.5 |
+ | 130000 | 0.5253 | 65.0 | 222.0 | 0.4206 | 102.615 | 97.452 | 12.181 | 51.5 | 106.0 |
+ | 132500 | 0.5354 | 63.5 | 232.0 | 0.4195 | 102.0382 | 98.002 | 12.25 | 49.0 | 115.0 |
+ | 135000 | 0.5455 | 65.0 | 239.0 | 0.4195 | 102.4661 | 97.593 | 12.199 | 50.75 | 83.5 |
+ | 137500 | 0.5556 | 69.0 | 232.0 | 0.4227 | 102.0828 | 97.96 | 12.245 | 52.25 | 133.0 |
+ | 140000 | 0.5657 | 66.0 | 206.0 | 0.4239 | 102.0497 | 97.991 | 12.249 | 55.0 | 148.0 |
+ | 142500 | 0.5758 | 65.5 | 218.0 | 0.4256 | 102.0522 | 97.989 | 12.249 | 50.25 | 144.0 |
+ | 145000 | 0.5859 | 65.0 | 227.0 | 0.4201 | 102.154 | 97.891 | 12.236 | 50.5 | 135.0 |
+ | 147500 | 0.5960 | 65.5 | 211.0 | 0.4216 | 102.1033 | 97.94 | 12.243 | 49.75 | 92.5 |
+ | 150000 | 0.6061 | 66.0 | 242.0 | 0.4288 | 102.1595 | 97.886 | 12.236 | 52.0 | 137.0 |
+ | 152500 | 0.6162 | 67.0 | 229.0 | 0.4180 | 102.5134 | 97.548 | 12.194 | 49.25 | 111.0 |
+ | 155000 | 0.6263 | 65.0 | 206.0 | 0.4224 | 102.3146 | 97.738 | 12.217 | 51.0 | 151.0 |
+ | 157500 | 0.6364 | 66.0 | 220.0 | 0.4266 | 102.1949 | 97.852 | 12.232 | 48.75 | 107.5 |
+ | 160000 | 0.6465 | 67.5 | 212.0 | 0.4226 | 102.1337 | 97.911 | 12.239 | 49.25 | 97.0 |
+ | 162500 | 0.6566 | 67.0 | 212.0 | 0.4186 | 102.1028 | 97.94 | 12.243 | 50.0 | 89.0 |
+ | 165000 | 0.6667 | 63.75 | 231.0 | 0.4159 | 101.9547 | 98.083 | 12.26 | 47.75 | 116.0 |
+ | 167500 | 0.6768 | 67.5 | 227.0 | 0.4208 | 102.0173 | 98.023 | 12.253 | 51.0 | 203.0 |
+ | 170000 | 0.6869 | 65.5 | 268.0 | 0.4194 | 101.9863 | 98.052 | 12.257 | 49.25 | 108.5 |
+ | 172500 | 0.6970 | 66.5 | 208.0 | 0.4165 | 102.0041 | 98.035 | 12.254 | 49.75 | 175.0 |
+ | 175000 | 0.7071 | 68.0 | 221.0 | 0.4189 | 102.0695 | 97.972 | 12.247 | 50.25 | 91.5 |
+ | 177500 | 0.7172 | 66.0 | 211.0 | 0.4188 | 101.9141 | 98.122 | 12.265 | 48.75 | 124.0 |
+ | 180000 | 0.7273 | 64.0 | 200.0 | 0.4169 | 102.2518 | 97.798 | 12.225 | 46.75 | 113.0 |
+ | 182500 | 0.7374 | 64.0 | 204.0 | 0.4204 | 102.0976 | 97.946 | 12.243 | 49.5 | 140.0 |
+ | 185000 | 0.7475 | 65.0 | 213.0 | 0.4152 | 102.3207 | 97.732 | 12.216 | 48.5 | 127.0 |
+ | 187500 | 0.7576 | 65.0 | 206.0 | 0.4155 | 102.1198 | 97.924 | 12.241 | 49.25 | 108.0 |
+ | 190000 | 0.7677 | 66.0 | 213.0 | 0.4182 | 102.192 | 97.855 | 12.232 | 49.25 | 130.0 |
+ | 192500 | 0.7778 | 68.0 | 221.0 | 0.4160 | 102.0413 | 98.0 | 12.25 | 53.75 | 143.0 |
+ | 195000 | 0.7879 | 66.5 | 225.0 | 0.4136 | 102.0553 | 97.986 | 12.248 | 53.0 | 164.0 |
+ | 197500 | 0.7980 | 65.5 | 218.0 | 0.4160 | 101.9027 | 98.133 | 12.267 | 49.0 | 89.0 |
+ | 200000 | 0.8081 | 66.0 | 225.0 | 0.4148 | 102.4437 | 97.615 | 12.202 | 48.0 | 105.5 |
+ | 202500 | 0.8182 | 66.5 | 208.0 | 0.4189 | 102.0449 | 97.996 | 12.25 | 49.0 | 131.0 |
+ | 205000 | 0.8283 | 66.0 | 217.0 | 0.4150 | 102.0719 | 97.97 | 12.246 | 51.5 | 186.0 |
+ | 207500 | 0.8384 | 69.5 | 254.0 | 0.4214 | 102.0931 | 97.95 | 12.244 | 51.25 | 153.0 |
+ | 210000 | 0.8485 | 67.0 | 216.0 | 0.4235 | 102.1471 | 97.898 | 12.237 | 49.75 | 121.5 |
+ | 212500 | 0.8586 | 65.5 | 216.0 | 0.4125 | 102.1121 | 97.932 | 12.241 | 48.5 | 120.0 |
+ | 215000 | 0.8687 | 65.5 | 225.0 | 0.4145 | 102.235 | 97.814 | 12.227 | 49.0 | 105.5 |
+ | 217500 | 0.8788 | 69.0 | 264.0 | 0.4188 | 102.0807 | 97.962 | 12.245 | 49.5 | 209.0 |
+ | 220000 | 0.8889 | 68.0 | 218.0 | 0.4157 | 102.1941 | 97.853 | 12.232 | 50.5 | 107.0 |
+ | 222500 | 0.8990 | 65.5 | 212.0 | 0.4231 | 102.2725 | 97.778 | 12.222 | 51.5 | 201.0 |
+ | 225000 | 0.9091 | 65.0 | 229.0 | 0.4150 | 102.3034 | 97.748 | 12.219 | 50.25 | 137.0 |
+ | 227500 | 0.9192 | 65.0 | 218.0 | 0.4131 | 102.2051 | 97.842 | 12.23 | 48.75 | 131.0 |
+ | 230000 | 0.9293 | 64.5 | 217.0 | 0.4264 | 102.0499 | 97.991 | 12.249 | 48.5 | 148.0 |
+ | 232500 | 0.9394 | 68.0 | 232.0 | 0.4168 | 102.1883 | 97.859 | 12.232 | 49.0 | 107.0 |
+ | 235000 | 0.9495 | 69.0 | 233.0 | 0.4199 | 102.0331 | 98.007 | 12.251 | 46.75 | 157.0 |
+ | 237500 | 0.9596 | 65.0 | 212.0 | 0.4194 | 102.1218 | 97.922 | 12.24 | 52.25 | 150.0 |
+ | 240000 | 0.9697 | 65.5 | 215.0 | 0.4144 | 102.0228 | 98.017 | 12.252 | 48.75 | 182.0 |
+ | 242500 | 0.9798 | 65.0 | 209.0 | 0.4203 | 102.0484 | 97.993 | 12.249 | 49.5 | 95.5 |
+ | 245000 | 0.9899 | 65.0 | 266.0 | 0.4097 | 102.0961 | 97.947 | 12.243 | 49.5 | 111.0 |
+ | 247500 | 1.0 | 64.5 | 222.0 | 0.4159 | 102.373 | 97.682 | 12.21 | 49.75 | 81.5 |
 
  ### Framework versions
  - Distily 0.2.0
logs/dataset_sample_size=1000000/events.out.tfevents.1724188011.f383272e719b ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2633d80715b7195325acb3e4c65fed74cc2b66845609edaade7eed588ca957e4
+ size 36935941
logs/dataset_sample_size=1000000/events.out.tfevents.1724208622.f383272e719b ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0569921f8918952c376586118c2aa8dd4b19a51353cc99d88485b407853fb56f
+ size 118959234
logs/dataset_sample_size=1000000/events.out.tfevents.1724245927.f383272e719b ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2de43004f97294c8bfd6fa5696ac01ca02dfd6139b148ac8134e4087b052c61
+ size 588
logs/lr_scheduler_type=inverse_sqrt, warmup_ratio=0.5/completed.flag ADDED
File without changes
logs/lr_scheduler_type=inverse_sqrt, warmup_ratio=0.5/events.out.tfevents.1724187476.f383272e719b CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4538f4c6cdaacdc1cd047c19ca528f739cfdc3fa8eacb6891952317ee1b9bd38
- size 312
+ oid sha256:06d95d6065ff2134dff53afa6caf2568c8ef678d2086b62b9a129427f6d4af69
+ size 588
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:17fee6a987417fe3203b6253069f8315c7a7aa5b55187fe9221afa0cb9e87b3a
+ oid sha256:b5d5bbcb6f904ac723ec34bc104efb7630c4b3ecb460d79543147ac527e394a5
  size 163832792
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4e6ed2431e2081dbc96c461efc4ec618038e65d8b45c28f2ff8fbf1b2ae54c24
- size 1017899080
+ oid sha256:1eba2c4d23bb73a09ccbdd2dd436f2cc031d3b2446586620979f29623af8edc2
+ size 1017899016