End of training
- README.md +17 -52
- logs/learning_rate=0.0001, per_device_train_batch_size=8, warmup_ratio=0.5/events.out.tfevents.1724116219.5f530b1cf724 +3 -0
- logs/learning_rate=0.0001, per_device_train_batch_size=8, warmup_ratio=0.5/events.out.tfevents.1724118620.5f530b1cf724 +3 -0
- logs/learning_rate=4e-05, per_device_train_batch_size=1, warmup_ratio=0.5/completed.flag +0 -0
- model.safetensors +1 -1
- training_args.bin +1 -1
README.md
CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl:
-- eval_loss:
-- eval_runtime: 12.
-- eval_samples_per_second: 47.
-- eval_steps_per_second: 11.
+- eval_enwikippl: 111.0
+- eval_frwikippl: 400.0
+- eval_zhwikippl: 122.5
+- eval_tinystoriesppl: 91.0
+- eval_loss: 0.8789
+- eval_runtime: 12.6655
+- eval_samples_per_second: 47.373
+- eval_steps_per_second: 11.843

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
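The eval_*ppl metrics above appear to be perplexities on the respective corpora (enwiki, frwiki, zhwiki, TinyStories). A minimal sketch of how such a number is computed, assuming the standard definition of perplexity as exp of the mean next-token cross-entropy; Distily's exact evaluation protocol may differ:

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor) -> float:
    """exp of the mean next-token cross-entropy over a batch of token ids."""
    logits = model(input_ids).logits[:, :-1]   # prediction for each next token
    targets = input_ids[:, 1:]                 # labels shifted by one position
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    return math.exp(loss.item())
```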
@@ -48,8 +48,8 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
-- learning_rate:
-- train_batch_size:
+- learning_rate: 0.0001
+- train_batch_size: 8
 - eval_batch_size: 4
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
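The distillation_objective above enables only the logits component (weight 1, loss_fn=kl); the hidden-state and attention components are weighted 0. A minimal sketch of what a logits-only KL objective computes, not Distily's actual implementation (the temperature parameter is illustrative):

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """Mean per-token KL(teacher || student) over (batch, seq, vocab) logits."""
    vocab = student_logits.size(-1)
    s = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, vocab)
    t = F.log_softmax(teacher_logits / temperature, dim=-1).reshape(-1, vocab)
    # log_target=True: both arguments are log-probabilities;
    # "batchmean" averages the summed KL over the flattened token dimension.
    return F.kl_div(s, t, reduction="batchmean", log_target=True) * temperature**2
```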
@@ -58,53 +58,18 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0

 ### Resource Usage
-Peak GPU Memory:
+Peak GPU Memory: 7.9381 GB

 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
-| 0 | 0 |
-| 1500 | 0.
-| 3000 | 0.
-| 4500 | 0.
-| 6000 | 0.
-|
-| 9000 | 0.1515 | 344.0 | 1568.0 | 1.5858 | 12.5697 | 47.734 | 11.933 | 238.0 | 390.0 |
-| 10500 | 0.1768 | 316.0 | 1448.0 | 1.5377 | 12.5933 | 47.644 | 11.911 | 262.0 | 464.0 |
-| 12000 | 0.2020 | 314.0 | 1320.0 | 1.5134 | 12.5641 | 47.755 | 11.939 | 244.0 | 548.0 |
-| 13500 | 0.2273 | 272.0 | 1248.0 | 1.4329 | 12.5786 | 47.7 | 11.925 | 211.0 | 243.0 |
-| 15000 | 0.2525 | 244.0 | 1168.0 | 1.3673 | 12.5803 | 47.694 | 11.923 | 202.0 | 180.0 |
-| 16500 | 0.2778 | 219.0 | 976.0 | 1.3144 | 12.5889 | 47.661 | 11.915 | 194.0 | 167.0 |
-| 18000 | 0.3030 | 218.0 | 1016.0 | 1.3081 | 12.6999 | 47.245 | 11.811 | 185.0 | 294.0 |
-| 19500 | 0.3283 | 208.0 | 796.0 | 1.2611 | 20.0133 | 29.98 | 7.495 | 179.0 | 191.0 |
-| 21000 | 0.3535 | 211.0 | 908.0 | 1.2402 | 12.674 | 47.341 | 11.835 | 171.0 | 182.0 |
-| 22500 | 0.3788 | 192.0 | 720.0 | 1.2178 | 12.581 | 47.691 | 11.923 | 158.0 | 195.0 |
-| 24000 | 0.4040 | 189.0 | 764.0 | 1.1770 | 12.6729 | 47.345 | 11.836 | 148.0 | 215.0 |
-| 25500 | 0.4293 | 171.0 | 740.0 | 1.1165 | 12.6124 | 47.572 | 11.893 | 139.0 | 237.0 |
-| 27000 | 0.4545 | 162.0 | 640.0 | 1.0755 | 12.5788 | 47.699 | 11.925 | 137.0 | 204.0 |
-| 28500 | 0.4798 | 154.0 | 604.0 | 1.0288 | 12.6014 | 47.614 | 11.903 | 125.5 | 143.0 |
-| 30000 | 0.5051 | 145.0 | 632.0 | 1.0105 | 12.6861 | 47.296 | 11.824 | 111.0 | 180.0 |
-| 31500 | 0.5303 | 135.0 | 624.0 | 0.9842 | 12.6101 | 47.581 | 11.895 | 110.0 | 161.0 |
-| 33000 | 0.5556 | 135.0 | 532.0 | 0.9620 | 12.6688 | 47.36 | 11.84 | 102.0 | 148.0 |
-| 34500 | 0.5808 | 127.5 | 564.0 | 0.9313 | 12.6938 | 47.267 | 11.817 | 108.5 | 200.0 |
-| 36000 | 0.6061 | 129.0 | 506.0 | 0.9064 | 12.5848 | 47.677 | 11.919 | 98.0 | 215.0 |
-| 37500 | 0.6313 | 115.5 | 464.0 | 0.8361 | 12.5738 | 47.718 | 11.93 | 91.0 | 164.0 |
-| 39000 | 0.6566 | 107.0 | 410.0 | 0.7831 | 12.5036 | 47.986 | 11.997 | 86.0 | 169.0 |
-| 40500 | 0.6818 | 102.5 | 402.0 | 0.7615 | 12.5289 | 47.889 | 11.972 | 82.0 | 127.0 |
-| 42000 | 0.7071 | 100.5 | 394.0 | 0.7506 | 12.5392 | 47.85 | 11.963 | 80.5 | 128.0 |
-| 43500 | 0.7323 | 100.5 | 386.0 | 0.7381 | 12.6177 | 47.552 | 11.888 | 78.5 | 129.0 |
-| 45000 | 0.7576 | 100.5 | 376.0 | 0.7307 | 12.6297 | 47.507 | 11.877 | 80.0 | 141.0 |
-| 46500 | 0.7828 | 100.0 | 364.0 | 0.7279 | 12.6428 | 47.458 | 11.864 | 78.5 | 122.0 |
-| 48000 | 0.8081 | 100.0 | 384.0 | 0.7256 | 12.5268 | 47.897 | 11.974 | 78.5 | 130.0 |
-| 49500 | 0.8333 | 96.5 | 366.0 | 0.7072 | 12.6353 | 47.486 | 11.871 | 76.0 | 129.0 |
-| 51000 | 0.8586 | 94.0 | 358.0 | 0.6995 | 12.5977 | 47.628 | 11.907 | 75.0 | 125.5 |
-| 52500 | 0.8838 | 94.5 | 354.0 | 0.6960 | 12.6273 | 47.516 | 11.879 | 75.5 | 119.0 |
-| 54000 | 0.9091 | 94.0 | 354.0 | 0.6935 | 12.6419 | 47.461 | 11.865 | 75.5 | 118.0 |
-| 55500 | 0.9343 | 94.5 | 354.0 | 0.6914 | 12.5538 | 47.794 | 11.949 | 75.5 | 117.5 |
-| 57000 | 0.9596 | 94.0 | 352.0 | 0.6900 | 12.6293 | 47.509 | 11.877 | 75.0 | 117.5 |
-| 58500 | 0.9848 | 94.0 | 352.0 | 0.6898 | 12.622 | 47.536 | 11.884 | 75.0 | 117.5 |
-| 59400 | 1.0 | 94.0 | 352.0 | 0.6898 | 12.6709 | 47.353 | 11.838 | 75.0 | 117.5 |
+| 0 | 0 | 828928688128.0 | 52226802319360.0 | 21.0583 | 12.4569 | 48.166 | 12.042 | 5167382528.0 | 20753281974272.0 |
+| 1500 | 0.2020 | 512.0 | 3472.0 | 1.8762 | 12.4942 | 48.022 | 12.006 | 344.0 | 868.0 |
+| 3000 | 0.4040 | 237.0 | 944.0 | 1.4192 | 12.543 | 47.835 | 11.959 | 207.0 | 223.0 |
+| 4500 | 0.6061 | 148.0 | 532.0 | 1.1068 | 12.5192 | 47.926 | 11.982 | 135.0 | 158.0 |
+| 6000 | 0.8081 | 118.0 | 430.0 | 0.9155 | 12.5398 | 47.848 | 11.962 | 98.0 | 122.0 |
+| 7425 | 1.0 | 111.0 | 400.0 | 0.8789 | 12.6655 | 47.373 | 11.843 | 91.0 | 122.5 |

 ### Framework versions
 - Distily 0.2.0
logs/learning_rate=0.0001, per_device_train_batch_size=8, warmup_ratio=0.5/events.out.tfevents.1724116219.5f530b1cf724
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:84991b98cdd115b5865b3f3d72395e916728091ed9bcbb99d6552b6e35a2dd33
+size 3512274
logs/learning_rate=0.0001, per_device_train_batch_size=8, warmup_ratio=0.5/events.out.tfevents.1724118620.5f530b1cf724
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a969bb9ff141beb753d2e02ab5b00470ce28bb2ec9b8ef575e5447e44de4ff4e
+size 578
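The two files above are TensorBoard event logs for the learning_rate=0.0001, per_device_train_batch_size=8, warmup_ratio=0.5 run. A minimal sketch for inspecting them locally with TensorBoard's EventAccumulator; the scalar tag names depend on what the Trainer actually logged:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point at the run directory containing the events.out.tfevents.* files.
run_dir = "logs/learning_rate=0.0001, per_device_train_batch_size=8, warmup_ratio=0.5"
acc = EventAccumulator(run_dir)
acc.Reload()  # parse the event files on disk

for tag in acc.Tags()["scalars"]:       # tag names are run-dependent
    points = [(e.step, e.value) for e in acc.Scalars(tag)]
    print(tag, points[:3])              # first few (step, value) pairs
```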
logs/learning_rate=4e-05, per_device_train_batch_size=1, warmup_ratio=0.5/completed.flag
ADDED
File without changes
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:44c225267def37ca71584d3beff29b20501933b79fbecef253613f1b35f4a73d
 size 248894656
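The entries above and below are Git LFS pointer files: the oid field records the SHA-256 of the actual file contents. A small sketch (assuming the repo is cloned with the LFS objects pulled) to check the downloaded model.safetensors against the pointer's oid:

```python
import hashlib

def lfs_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, which is what an LFS pointer's oid records."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "44c225267def37ca71584d3beff29b20501933b79fbecef253613f1b35f4a73d"
assert lfs_sha256("model.safetensors") == expected
```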
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:77bf0e293d306f9ced0e580e530e228cdb6b58e30b5f6999d1d162bfa633f029
 size 1017899144
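training_args.bin is the Trainer's pickled arguments object. A sketch for inspecting it, assuming it unpickles to a TrainingArguments-style object with the standard attribute names (recent PyTorch versions need weights_only=False for pickled objects):

```python
import torch

# Saved by the HF Trainer via torch.save(self.args, "training_args.bin").
args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate)                # 0.0001 per the README diff above
print(args.per_device_train_batch_size)  # 8
```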