---
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_optim_extended2
results: []
---
# distily_bench_gpt2_optim_extended2
This student model was distilled from the teacher model [gpt2](https://huggingface.co/gpt2) on an unspecified dataset, using the [Distily](https://github.com/lapp0/distily) library.
It achieves the following results on the evaluation set:
- eval_enwikippl: 1466.9598
- eval_frwikippl: 6589.9976
- eval_zhwikippl: 19049.6328
- eval_loss: 8530.3359
- eval_runtime: 64.7254
- eval_samples_per_second: 46.35
- eval_steps_per_second: 11.587
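The `eval_enwikippl`, `eval_frwikippl`, and `eval_zhwikippl` figures are perplexities on English, French, and Chinese Wikipedia evaluation text. As a reminder (not part of the generated card), perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better; a minimal sketch with hypothetical per-token losses:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token losses; a uniform loss of 2.0 nats
# corresponds to a perplexity of exp(2.0) ~= 7.39.
print(perplexity([2.0, 2.0, 2.0]))
```

By this measure the student's 1466.96 English-Wikipedia perplexity at the best checkpoint is still far from the teacher's 30.24 (see the eval table below).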
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
-->
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: 'legacy'
- loss_fn: kl
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
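The `loss_fn: kl` setting indicates the student is trained to match the teacher's output distribution with a KL-divergence objective. As an illustration only (this is not Distily's actual implementation), a forward KL between teacher and student softmax distributions over one token's vocabulary logits can be sketched as:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_logits, student_logits):
    """Forward KL(teacher || student) for a single token position."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give (near-)zero loss; the divergence grows
# as the student's distribution drifts from the teacher's.
loss = kl_divergence([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(loss)
```

In practice this term is averaged over all token positions in the batch; note also that `total_train_batch_size` (16) is `train_batch_size` (8) times `gradient_accumulation_steps` (2).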
### Resource Usage
Peak GPU Memory: 8.3354 GB
### Eval-Phase Metrics
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 55332.9297 | 57511.9648 | 333834.9375 | 64.4894 | 46.519 | 11.63 | 57797.4375 |
| 500 | 0.0269 | 3397.8057 | 14195.7314 | 11200.1709 | 64.3161 | 46.645 | 11.661 | 46176.3906 |
| 1000 | 0.0539 | 2565.4185 | 11100.7803 | 10401.7070 | 64.9732 | 46.173 | 11.543 | 40786.25 |
| 1500 | 0.0808 | 2280.1555 | 9752.9180 | 10029.2695 | 65.1147 | 46.073 | 11.518 | 34300.0664 |
| 2000 | 0.1077 | 2111.7202 | 8617.1777 | 9861.6855 | 65.0861 | 46.093 | 11.523 | 27128.5918 |
| 2500 | 0.1347 | 1990.7386 | 8209.1553 | 9601.2373 | 64.8934 | 46.23 | 11.557 | 25209.2168 |
| 3000 | 0.1616 | 1918.3867 | 7799.5220 | 9467.9785 | 64.886 | 46.235 | 11.559 | 22736.8027 |
| 3500 | 0.1886 | 1818.1265 | 7551.1548 | 9349.7920 | 64.7154 | 46.357 | 11.589 | 22582.4883 |
| 4000 | 0.2155 | 1769.4467 | 7458.5562 | 9246.7197 | 64.7466 | 46.334 | 11.584 | 21114.0508 |
| 4500 | 0.2424 | 1728.6010 | 7363.9741 | 9099.1787 | 65.1202 | 46.069 | 11.517 | 20729.8926 |
| 5000 | 0.2694 | 1704.3433 | 7453.2944 | 9068.9062 | 64.69 | 46.375 | 11.594 | 21740.6367 |
| 5500 | 0.2963 | 1664.6129 | 7184.9824 | 8969.5039 | 64.2668 | 46.68 | 11.67 | 20534.2910 |
| 6000 | 0.3232 | 1631.8164 | 7198.6724 | 8898.6348 | 65.558 | 45.761 | 11.44 | 22204.2188 |
| 6500 | 0.3502 | 1589.2347 | 6884.9448 | 8812.0322 | 64.8035 | 46.294 | 11.573 | 19131.2129 |
| 7000 | 0.3771 | 1553.9370 | 6727.0781 | 8747.2002 | 65.3644 | 45.897 | 11.474 | 18709.2949 |
| 7500 | 0.4040 | 1540.8395 | 6779.4512 | 8707.7334 | 64.9958 | 46.157 | 11.539 | 18515.4297 |
| 8000 | 0.4310 | 1519.5702 | 6720.9155 | 8684.7471 | 65.1941 | 46.016 | 11.504 | 19323.7656 |
| 8500 | 0.4579 | 1499.4967 | 6702.9292 | 8618.3145 | 64.6164 | 46.428 | 11.607 | 20303.8691 |
| 9000 | 0.4848 | 1468.8694 | 6597.9023 | 8579.7764 | 65.1809 | 46.026 | 11.506 | 19187.4902 |
| 9500 | 0.5118 | 1466.9598 | 6589.9976 | 8530.3359 | 64.7254 | 46.35 | 11.587 | 19049.6328 |
| 10000 | 0.5387 | 1450.3381 | 6594.1782 | 8527.4131 | 65.1904 | 46.019 | 11.505 | 20619.4590 |
| 10500 | 0.5657 | 1422.2881 | 6539.0815 | 8491.7549 | 64.9945 | 46.158 | 11.539 | 20106.9180 |
| 11000 | 0.5926 | 1413.1234 | 6447.0659 | 8481.6855 | 65.107 | 46.078 | 11.52 | 18302.7910 |
| 11500 | 0.6195 | 1399.7990 | 6463.4536 | 8433.2803 | 64.732 | 46.345 | 11.586 | 18501.8398 |
| 12000 | 0.6465 | 1386.2769 | 6439.3423 | 8387.9043 | 64.7399 | 46.339 | 11.585 | 18306.4570 |
| 12500 | 0.6734 | 1381.0126 | 6380.1401 | 8346.6777 | 64.7944 | 46.3 | 11.575 | 19072.5371 |
| 13000 | 0.7003 | 1360.2582 | 6364.1938 | 8351.8828 | 64.608 | 46.434 | 11.608 | 18941.8262 |
| 13500 | 0.7273 | 1355.2496 | 6337.5508 | 8364.6289 | 64.4743 | 46.53 | 11.633 | 18354.1797 |
| 14000 | 0.7542 | 1342.7577 | 6132.9243 | 8351.3281 | 64.4281 | 46.564 | 11.641 | 18108.3027 |
| 14500 | 0.7811 | 1324.4287 | 6172.4019 | 8299.2109 | 64.0768 | 46.819 | 11.705 | 17864.5078 |
| 15000 | 0.8081 | 1311.8136 | 6250.3555 | 8288.9170 | 63.9884 | 46.883 | 11.721 | 18093.8008 |
| 15500 | 0.8350 | 1300.1758 | 6161.9678 | 8240.8105 | 65.0003 | 46.154 | 11.538 | 18435.2441 |
| 16000 | 0.8620 | 1294.5092 | 6087.9023 | 8225.1836 | 65.3075 | 45.937 | 11.484 | 18195.5664 |
| 16500 | 0.8889 | 1272.7550 | 6124.9282 | 8187.4561 | 64.7644 | 46.322 | 11.58 | 18905.1719 |
| 17000 | 0.9158 | 1271.9396 | 6117.1646 | 8179.8828 | 66.1093 | 45.379 | 11.345 | 17912.2910 |
| 17500 | 0.9428 | 1263.8173 | 5966.3726 | 8165.7280 | 64.1579 | 46.76 | 11.69 | 16779.9922 |
| 18000 | 0.9697 | 1245.9607 | 6065.6255 | 8219.2422 | 64.3092 | 46.65 | 11.662 | 17666.4180 |
| 18500 | 0.9966 | 1240.7706 | 6013.2476 | 8146.3145 | 64.5002 | 46.511 | 11.628 | 16597.2520 |
| 18562 | 1.0000 | 1242.8444 | 5899.8604 | 8136.0962 | 64.3726 | 46.604 | 11.651 | 16160.9238 |
### Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0