End of training
- README.md +111 -74
- logs/dataset_sample_size=1000000/events.out.tfevents.1724188011.f383272e719b +3 -0
- logs/dataset_sample_size=1000000/events.out.tfevents.1724208622.f383272e719b +3 -0
- logs/dataset_sample_size=1000000/events.out.tfevents.1724245927.f383272e719b +3 -0
- logs/lr_scheduler_type=inverse_sqrt, warmup_ratio=0.5/completed.flag +0 -0
- logs/lr_scheduler_type=inverse_sqrt, warmup_ratio=0.5/events.out.tfevents.1724187476.f383272e719b +2 -2
- model.safetensors +1 -1
- training_args.bin +2 -2
README.md
CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl:
-- eval_loss:
-- eval_runtime:
-- eval_samples_per_second:
-- eval_steps_per_second: 12.
+- eval_enwikippl: 65.0
+- eval_frwikippl: 215.0
+- eval_zhwikippl: 104.5
+- eval_tinystoriesppl: 49.75
+- eval_loss: 0.4281
+- eval_runtime: 102.0824
+- eval_samples_per_second: 97.96
+- eval_steps_per_second: 12.245

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -46,15 +46,15 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
-- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=
+- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=mse, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=mse, layer_mapper=layer-2, projector=None))
 - train_embeddings: True
 - learning_rate: 0.0001
 - train_batch_size: 4
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type:
-- lr_scheduler_warmup_ratio: 0.
+- lr_scheduler_type: constant
+- lr_scheduler_warmup_ratio: 0.2
 - num_epochs: 1.0

 ### Resource Usage
@@ -64,69 +64,106 @@ Peak GPU Memory: 7.2012 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 43.25 | 61.25 | | | | | 11.6875 | 19.125 |
-| 0 | 0 |
+| 0 | 0 | 2611340115968.0 | 307863255777280.0 | 21.3930 | 101.7301 | 98.299 | 12.287 | 7214202880.0 | 36009005809664.0 |
+| 2500 | 0.0101 | 191.0 | 704.0 | 1.0465 | 101.8446 | 98.189 | 12.274 | 165.0 | 316.0 |
+| 5000 | 0.0202 | 131.0 | 492.0 | 0.8689 | 102.0266 | 98.014 | 12.252 | 111.5 | 148.0 |
+| 7500 | 0.0303 | 114.5 | 396.0 | 0.7492 | 101.8482 | 98.185 | 12.273 | 92.0 | 142.0 |
+| 10000 | 0.0404 | 97.5 | 380.0 | 0.6631 | 102.1266 | 97.918 | 12.24 | 76.5 | 136.0 |
+| 12500 | 0.0505 | 86.5 | 314.0 | 0.5968 | 101.9077 | 98.128 | 12.266 | 72.0 | 146.0 |
+| 15000 | 0.0606 | 80.5 | 302.0 | 0.5429 | 102.0007 | 98.039 | 12.255 | 66.0 | 135.0 |
+| 17500 | 0.0707 | 77.0 | 276.0 | 0.5161 | 101.9305 | 98.106 | 12.263 | 62.75 | 121.0 |
+| 20000 | 0.0808 | 74.5 | 262.0 | 0.5019 | 102.035 | 98.006 | 12.251 | 60.25 | 120.5 |
+| 22500 | 0.0909 | 68.5 | 266.0 | 0.4874 | 101.8648 | 98.169 | 12.271 | 59.0 | 160.0 |
+| 25000 | 0.1010 | 73.0 | 242.0 | 0.4754 | 102.1293 | 97.915 | 12.239 | 55.25 | 145.0 |
+| 27500 | 0.1111 | 70.0 | 243.0 | 0.4627 | 102.0199 | 98.02 | 12.253 | 56.75 | 100.0 |
+| 30000 | 0.1212 | 68.5 | 251.0 | 0.4621 | 102.0947 | 97.948 | 12.244 | 56.5 | 133.0 |
+| 32500 | 0.1313 | 68.5 | 252.0 | 0.4589 | 102.0148 | 98.025 | 12.253 | 52.0 | 139.0 |
+| 35000 | 0.1414 | 67.0 | 228.0 | 0.4628 | 101.8975 | 98.138 | 12.267 | 53.75 | 254.0 |
+| 37500 | 0.1515 | 68.5 | 227.0 | 0.4495 | 101.9205 | 98.116 | 12.264 | 53.0 | 130.0 |
+| 40000 | 0.1616 | 72.0 | 270.0 | 0.4502 | 101.8861 | 98.149 | 12.269 | 56.25 | 104.0 |
+| 42500 | 0.1717 | 68.0 | 238.0 | 0.4422 | 101.8871 | 98.148 | 12.268 | 54.25 | 167.0 |
+| 45000 | 0.1818 | 68.5 | 260.0 | 0.4498 | 101.9466 | 98.091 | 12.261 | 54.0 | 108.5 |
+| 47500 | 0.1919 | 69.0 | 229.0 | 0.4392 | 102.1911 | 97.856 | 12.232 | 49.25 | 113.0 |
+| 50000 | 0.2020 | 71.5 | 247.0 | 0.4473 | 104.3621 | 95.82 | 11.978 | 51.75 | 86.5 |
+| 52500 | 0.2121 | 72.5 | 233.0 | 0.4357 | 102.8199 | 97.257 | 12.157 | 53.25 | 147.0 |
+| 55000 | 0.2222 | 76.5 | 223.0 | 0.4321 | 102.1001 | 97.943 | 12.243 | 51.0 | 88.5 |
+| 57500 | 0.2323 | 75.5 | 238.0 | 0.4342 | 102.0258 | 98.014 | 12.252 | 54.75 | 115.0 |
+| 60000 | 0.2424 | 73.5 | 250.0 | 0.4374 | 101.9687 | 98.069 | 12.259 | 51.25 | 153.0 |
+| 62500 | 0.2525 | 67.0 | 225.0 | 0.4252 | 101.9203 | 98.116 | 12.264 | 50.25 | 145.0 |
+| 65000 | 0.2626 | 70.0 | 224.0 | 0.4304 | 101.9468 | 98.09 | 12.261 | 48.75 | 128.0 |
+| 67500 | 0.2727 | 67.5 | 208.0 | 0.4303 | 102.0608 | 97.981 | 12.248 | 55.25 | 166.0 |
+| 70000 | 0.2828 | 70.5 | 231.0 | 0.4263 | 101.9988 | 98.04 | 12.255 | 52.75 | 115.5 |
+| 72500 | 0.2929 | 65.5 | 230.0 | 0.4249 | 102.2665 | 97.784 | 12.223 | 54.25 | 128.0 |
+| 75000 | 0.3030 | 68.0 | 243.0 | 0.4279 | 102.0312 | 98.009 | 12.251 | 49.75 | 125.5 |
+| 77500 | 0.3131 | 67.5 | 222.0 | 0.4326 | 102.1256 | 97.919 | 12.24 | 52.0 | 121.5 |
+| 80000 | 0.3232 | 65.5 | 222.0 | 0.4254 | 101.9985 | 98.041 | 12.255 | 48.5 | 133.0 |
+| 82500 | 0.3333 | 68.0 | 230.0 | 0.4219 | 102.0083 | 98.031 | 12.254 | 52.0 | 111.5 |
+| 85000 | 0.3434 | 67.0 | 222.0 | 0.4243 | 102.057 | 97.984 | 12.248 | 48.25 | 109.5 |
+| 87500 | 0.3535 | 66.5 | 218.0 | 0.4240 | 101.9819 | 98.057 | 12.257 | 53.5 | 302.0 |
+| 90000 | 0.3636 | 66.5 | 229.0 | 0.4250 | 102.0841 | 97.958 | 12.245 | 50.0 | 118.0 |
+| 92500 | 0.3737 | 67.0 | 227.0 | 0.4239 | 102.0958 | 97.947 | 12.243 | 53.0 | 114.0 |
+| 95000 | 0.3838 | 67.5 | 240.0 | 0.4257 | 101.9889 | 98.05 | 12.256 | 50.75 | 110.0 |
+| 97500 | 0.3939 | 65.0 | 215.0 | 0.4281 | 102.0824 | 97.96 | 12.245 | 49.75 | 104.5 |
+| 100000 | 0.4040 | 67.5 | 230.0 | 0.4203 | 102.3463 | 97.707 | 12.213 | 50.5 | 115.0 |
+| 102500 | 0.4141 | 66.0 | 227.0 | 0.4239 | 102.8008 | 97.276 | 12.159 | 53.25 | 109.0 |
+| 105000 | 0.4242 | 66.5 | 219.0 | 0.4249 | 102.6156 | 97.451 | 12.181 | 51.75 | 159.0 |
+| 107500 | 0.4343 | 65.5 | 218.0 | 0.4209 | 102.6016 | 97.464 | 12.183 | 51.5 | 95.0 |
+| 110000 | 0.4444 | 66.5 | 227.0 | 0.4213 | 102.437 | 97.621 | 12.203 | 52.0 | 130.0 |
+| 112500 | 0.4545 | 67.5 | 211.0 | 0.4231 | 102.6961 | 97.375 | 12.172 | 49.5 | 145.0 |
+| 115000 | 0.4646 | 66.0 | 209.0 | 0.4215 | 102.1356 | 97.909 | 12.239 | 48.25 | 126.5 |
+| 117500 | 0.4747 | 66.5 | 228.0 | 0.4261 | 102.5136 | 97.548 | 12.194 | 48.25 | 104.0 |
+| 120000 | 0.4848 | 68.5 | 238.0 | 0.4239 | 102.325 | 97.728 | 12.216 | 50.5 | 212.0 |
+| 122500 | 0.4949 | 67.0 | 219.0 | 0.4203 | 102.8823 | 97.198 | 12.15 | 52.0 | 94.5 |
+| 125000 | 0.5051 | 66.5 | 249.0 | 0.4220 | 102.285 | 97.766 | 12.221 | 51.0 | 129.0 |
+| 127500 | 0.5152 | 65.0 | 226.0 | 0.4242 | 102.4487 | 97.61 | 12.201 | 49.0 | 76.5 |
+| 130000 | 0.5253 | 65.0 | 222.0 | 0.4206 | 102.615 | 97.452 | 12.181 | 51.5 | 106.0 |
+| 132500 | 0.5354 | 63.5 | 232.0 | 0.4195 | 102.0382 | 98.002 | 12.25 | 49.0 | 115.0 |
+| 135000 | 0.5455 | 65.0 | 239.0 | 0.4195 | 102.4661 | 97.593 | 12.199 | 50.75 | 83.5 |
+| 137500 | 0.5556 | 69.0 | 232.0 | 0.4227 | 102.0828 | 97.96 | 12.245 | 52.25 | 133.0 |
+| 140000 | 0.5657 | 66.0 | 206.0 | 0.4239 | 102.0497 | 97.991 | 12.249 | 55.0 | 148.0 |
+| 142500 | 0.5758 | 65.5 | 218.0 | 0.4256 | 102.0522 | 97.989 | 12.249 | 50.25 | 144.0 |
+| 145000 | 0.5859 | 65.0 | 227.0 | 0.4201 | 102.154 | 97.891 | 12.236 | 50.5 | 135.0 |
+| 147500 | 0.5960 | 65.5 | 211.0 | 0.4216 | 102.1033 | 97.94 | 12.243 | 49.75 | 92.5 |
+| 150000 | 0.6061 | 66.0 | 242.0 | 0.4288 | 102.1595 | 97.886 | 12.236 | 52.0 | 137.0 |
+| 152500 | 0.6162 | 67.0 | 229.0 | 0.4180 | 102.5134 | 97.548 | 12.194 | 49.25 | 111.0 |
+| 155000 | 0.6263 | 65.0 | 206.0 | 0.4224 | 102.3146 | 97.738 | 12.217 | 51.0 | 151.0 |
+| 157500 | 0.6364 | 66.0 | 220.0 | 0.4266 | 102.1949 | 97.852 | 12.232 | 48.75 | 107.5 |
+| 160000 | 0.6465 | 67.5 | 212.0 | 0.4226 | 102.1337 | 97.911 | 12.239 | 49.25 | 97.0 |
+| 162500 | 0.6566 | 67.0 | 212.0 | 0.4186 | 102.1028 | 97.94 | 12.243 | 50.0 | 89.0 |
+| 165000 | 0.6667 | 63.75 | 231.0 | 0.4159 | 101.9547 | 98.083 | 12.26 | 47.75 | 116.0 |
+| 167500 | 0.6768 | 67.5 | 227.0 | 0.4208 | 102.0173 | 98.023 | 12.253 | 51.0 | 203.0 |
+| 170000 | 0.6869 | 65.5 | 268.0 | 0.4194 | 101.9863 | 98.052 | 12.257 | 49.25 | 108.5 |
+| 172500 | 0.6970 | 66.5 | 208.0 | 0.4165 | 102.0041 | 98.035 | 12.254 | 49.75 | 175.0 |
+| 175000 | 0.7071 | 68.0 | 221.0 | 0.4189 | 102.0695 | 97.972 | 12.247 | 50.25 | 91.5 |
+| 177500 | 0.7172 | 66.0 | 211.0 | 0.4188 | 101.9141 | 98.122 | 12.265 | 48.75 | 124.0 |
+| 180000 | 0.7273 | 64.0 | 200.0 | 0.4169 | 102.2518 | 97.798 | 12.225 | 46.75 | 113.0 |
+| 182500 | 0.7374 | 64.0 | 204.0 | 0.4204 | 102.0976 | 97.946 | 12.243 | 49.5 | 140.0 |
+| 185000 | 0.7475 | 65.0 | 213.0 | 0.4152 | 102.3207 | 97.732 | 12.216 | 48.5 | 127.0 |
+| 187500 | 0.7576 | 65.0 | 206.0 | 0.4155 | 102.1198 | 97.924 | 12.241 | 49.25 | 108.0 |
+| 190000 | 0.7677 | 66.0 | 213.0 | 0.4182 | 102.192 | 97.855 | 12.232 | 49.25 | 130.0 |
+| 192500 | 0.7778 | 68.0 | 221.0 | 0.4160 | 102.0413 | 98.0 | 12.25 | 53.75 | 143.0 |
+| 195000 | 0.7879 | 66.5 | 225.0 | 0.4136 | 102.0553 | 97.986 | 12.248 | 53.0 | 164.0 |
+| 197500 | 0.7980 | 65.5 | 218.0 | 0.4160 | 101.9027 | 98.133 | 12.267 | 49.0 | 89.0 |
+| 200000 | 0.8081 | 66.0 | 225.0 | 0.4148 | 102.4437 | 97.615 | 12.202 | 48.0 | 105.5 |
+| 202500 | 0.8182 | 66.5 | 208.0 | 0.4189 | 102.0449 | 97.996 | 12.25 | 49.0 | 131.0 |
+| 205000 | 0.8283 | 66.0 | 217.0 | 0.4150 | 102.0719 | 97.97 | 12.246 | 51.5 | 186.0 |
+| 207500 | 0.8384 | 69.5 | 254.0 | 0.4214 | 102.0931 | 97.95 | 12.244 | 51.25 | 153.0 |
+| 210000 | 0.8485 | 67.0 | 216.0 | 0.4235 | 102.1471 | 97.898 | 12.237 | 49.75 | 121.5 |
+| 212500 | 0.8586 | 65.5 | 216.0 | 0.4125 | 102.1121 | 97.932 | 12.241 | 48.5 | 120.0 |
+| 215000 | 0.8687 | 65.5 | 225.0 | 0.4145 | 102.235 | 97.814 | 12.227 | 49.0 | 105.5 |
+| 217500 | 0.8788 | 69.0 | 264.0 | 0.4188 | 102.0807 | 97.962 | 12.245 | 49.5 | 209.0 |
+| 220000 | 0.8889 | 68.0 | 218.0 | 0.4157 | 102.1941 | 97.853 | 12.232 | 50.5 | 107.0 |
+| 222500 | 0.8990 | 65.5 | 212.0 | 0.4231 | 102.2725 | 97.778 | 12.222 | 51.5 | 201.0 |
+| 225000 | 0.9091 | 65.0 | 229.0 | 0.4150 | 102.3034 | 97.748 | 12.219 | 50.25 | 137.0 |
+| 227500 | 0.9192 | 65.0 | 218.0 | 0.4131 | 102.2051 | 97.842 | 12.23 | 48.75 | 131.0 |
+| 230000 | 0.9293 | 64.5 | 217.0 | 0.4264 | 102.0499 | 97.991 | 12.249 | 48.5 | 148.0 |
+| 232500 | 0.9394 | 68.0 | 232.0 | 0.4168 | 102.1883 | 97.859 | 12.232 | 49.0 | 107.0 |
+| 235000 | 0.9495 | 69.0 | 233.0 | 0.4199 | 102.0331 | 98.007 | 12.251 | 46.75 | 157.0 |
+| 237500 | 0.9596 | 65.0 | 212.0 | 0.4194 | 102.1218 | 97.922 | 12.24 | 52.25 | 150.0 |
+| 240000 | 0.9697 | 65.5 | 215.0 | 0.4144 | 102.0228 | 98.017 | 12.252 | 48.75 | 182.0 |
+| 242500 | 0.9798 | 65.0 | 209.0 | 0.4203 | 102.0484 | 97.993 | 12.249 | 49.5 | 95.5 |
+| 245000 | 0.9899 | 65.0 | 266.0 | 0.4097 | 102.0961 | 97.947 | 12.243 | 49.5 | 111.0 |
+| 247500 | 1.0 | 64.5 | 222.0 | 0.4159 | 102.373 | 97.682 | 12.21 | 49.75 | 81.5 |

 ### Framework versions
 - Distily 0.2.0
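The `distillation_objective` above puts all of its weight on a KL-divergence loss over the student and teacher logits; the hidden-state and attention components are configured but carry `weight=0`, so they contribute nothing. Below is a minimal PyTorch sketch of such a logits-only KL objective, not Distily's actual implementation; the `temperature` parameter is an illustrative assumption, as the card does not state one.

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    # Forward KL(teacher || student) over the vocabulary, averaged over
    # tokens ("batchmean" matches the mathematical definition of KL).
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.log_softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean") * temperature ** 2
```

With `weight=1` on this component and `weight=0` on the others, the total distillation loss reduces to this single term.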
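The remaining hyperparameters resemble standard Hugging Face `TrainingArguments` fields (the card notes it was generated from information the `Trainer` had access to). A hedged sketch of an equivalent configuration follows; `output_dir` is a placeholder, and the field mapping is an assumption rather than Distily's documented API. Note that a plain `constant` schedule in `transformers` performs no warmup; `constant_with_warmup` is the variant that would honor `warmup_ratio`.

```python
from transformers import TrainingArguments

# Assumed mapping of the card's hyperparameters onto TrainingArguments.
args = TrainingArguments(
    output_dir="distily_gpt2_student",  # placeholder name
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="constant",       # card also lists warmup_ratio=0.2
    warmup_ratio=0.2,
    num_train_epochs=1.0,
)
```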
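The `*ppl` columns in the table are perplexities on the named corpora (English, French, and Chinese Wikipedia, plus TinyStories); lower is better, and the student ends the epoch at 65.0 enwikippl against the teacher's 43.25. For reference, a generic perplexity sketch is shown below; the exact windowing and token weighting used by Distily's evaluation is not shown in the card.

```python
import math
import torch

def perplexity(model, tokenizer, texts, device="cuda", max_length=1024):
    # exp of the mean token-level cross-entropy; weighting each sequence's
    # loss by its input length is approximate (the first token of each
    # sequence has no prediction target).
    model.eval()
    nll, n_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt",
                            truncation=True, max_length=max_length).to(device)
            loss = model(**enc, labels=enc["input_ids"]).loss
            n = enc["input_ids"].numel()
            nll += loss.item() * n
            n_tokens += n
    return math.exp(nll / n_tokens)
```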
logs/dataset_sample_size=1000000/events.out.tfevents.1724188011.f383272e719b
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2633d80715b7195325acb3e4c65fed74cc2b66845609edaade7eed588ca957e4
+size 36935941
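The binary artifacts in this commit are stored as Git LFS pointer files: three short text lines giving the spec `version`, the `oid` (SHA-256 of the real blob), and its `size` in bytes. A small self-contained parsing sketch, for illustration only:

```python
def parse_lfs_pointer(path: str) -> dict:
    # A Git LFS pointer is a tiny "key value" text stub standing in for
    # the real binary, which lives in LFS storage.
    fields = {}
    with open(path, encoding="ascii") as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            if key:
                fields[key] = value
    return fields

# parse_lfs_pointer("model.safetensors") on this repo's pointer would yield:
# {'version': 'https://git-lfs.github.com/spec/v1',
#  'oid': 'sha256:b5d5bbcb6f904ac723ec34bc104efb7630c4b3ecb460d79543147ac527e394a5',
#  'size': '163832792'}
```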
logs/dataset_sample_size=1000000/events.out.tfevents.1724208622.f383272e719b
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0569921f8918952c376586118c2aa8dd4b19a51353cc99d88485b407853fb56f
+size 118959234
logs/dataset_sample_size=1000000/events.out.tfevents.1724245927.f383272e719b
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e2de43004f97294c8bfd6fa5696ac01ca02dfd6139b148ac8134e4087b052c61
+size 588
logs/lr_scheduler_type=inverse_sqrt, warmup_ratio=0.5/completed.flag
ADDED
File without changes
logs/lr_scheduler_type=inverse_sqrt, warmup_ratio=0.5/events.out.tfevents.1724187476.f383272e719b
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:06d95d6065ff2134dff53afa6caf2568c8ef678d2086b62b9a129427f6d4af69
+size 588
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b5d5bbcb6f904ac723ec34bc104efb7630c4b3ecb460d79543147ac527e394a5
 size 163832792
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:1eba2c4d23bb73a09ccbdd2dd436f2cc031d3b2446586620979f29623af8edc2
+size 1017899016