Last commit by lapp0: "End of training" (74537bd, verified)
metadata
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_attn
    results: []

distily_bench_gpt2_attn

This student model was distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.

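It achieves the following results on the evaluation set (these figures match the step-9000 row of the Eval-Phase Metrics table below; the last evaluation, at step 12375, reported lower perplexities and a lower loss):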

  • eval_enwikippl: 207.9922
  • eval_frwikippl: 1314.4666
  • eval_zhwikippl: 759.8159
  • eval_loss: 1.3326
  • eval_runtime: 17.3702 s
  • eval_samples_per_second: 57.57
  • eval_steps_per_second: 7.196
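The card does not define the `*ppl` metrics. Their names suggest token-level perplexity on English, French and Chinese Wikipedia text, but that is an assumption. The minimal sketch below shows the standard definition (perplexity is the exponential of the mean per-token negative log-likelihood) and applies it to the reported numbers. The function name `perplexity` and the sample NLL values are hypothetical. Exponentiating `eval_loss` gives about 3.79, which is nowhere near any reported perplexity, so `eval_loss` presumably measures the distillation objective rather than language-modeling cross-entropy on these corpora.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Hypothetical per-token NLLs, for illustration only.
print(round(perplexity([2.0, 3.0, 4.0]), 4))  # 20.0855, i.e. exp(3.0)

# Mean NLL implied by the card's eval_enwikippl (if it is a standard perplexity).
print(round(math.log(207.9922), 2))  # 5.34 nats per token

# Exponentiating the card's eval_loss gives a value far from every reported perplexity.
print(round(math.exp(1.3326), 2))  # 3.79
```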

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective with three loss components:
    • logits_loss_component: LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None)
    • hs_loss_component: LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None)
    • attn_loss_component: LossComponent(label=attn, weight=2.0, loss_fn=reverse_kl, layer_mapper=None, projector=None)
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
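The distillation objective above enables two components: KL on the output logits (weight 1) and reverse KL on the attention maps (weight 2.0). The hidden-state component has weight 0. The pure-Python sketch below shows one way such components could combine. It assumes a plain weighted sum of KL divergences averaged over rows, with `kl` meaning forward KL(teacher || student) and `reverse_kl` meaning KL(student || teacher). These directions, the row averaging, and every helper name (`softmax`, `kl`, `mean_row_kl`, `distillation_loss`) are illustrative assumptions, not Distily's implementation.

```python
import math

# Loss weights copied from the card's distillation_objective.
LOGITS_WEIGHT = 1.0  # logits_loss_component: loss_fn=kl
ATTN_WEIGHT = 2.0    # attn_loss_component: loss_fn=reverse_kl
# hs_loss_component has weight 0 and loss_fn=None, so it is omitted here.


def softmax(logits):
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_row_kl(rows_p, rows_q):
    """Average KL(p || q) over paired rows (token positions or attention queries)."""
    return sum(kl(p, q) for p, q in zip(rows_p, rows_q)) / len(rows_p)


def distillation_loss(teacher_logits, student_logits, teacher_attn, student_attn):
    """Hypothetical weighted sum of the two active loss components.

    teacher_logits / student_logits: one logit vector per token position.
    teacher_attn / student_attn: one attention-probability row per query.
    """
    # Assumed: "kl" is forward KL(teacher || student) on next-token distributions.
    logits_loss = mean_row_kl(
        [softmax(t) for t in teacher_logits],
        [softmax(s) for s in student_logits],
    )
    # Assumed: "reverse_kl" is KL(student || teacher) on attention rows.
    attn_loss = mean_row_kl(student_attn, teacher_attn)
    return LOGITS_WEIGHT * logits_loss + ATTN_WEIGHT * attn_loss


# Identical teacher and student outputs give zero loss; any mismatch is positive.
print(distillation_loss([[2.0, 1.0, 0.1]], [[2.0, 1.0, 0.1]], [[0.7, 0.3]], [[0.7, 0.3]]))  # 0.0
```

Because KL is asymmetric, the choice between `kl` and `reverse_kl` matters. Reverse KL penalizes the student for placing attention mass where the teacher places little. Forward KL penalizes the student for missing mass the teacher assigns.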

Resource Usage

Peak GPU Memory: 8.2195 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | zhwikippl |
|---|---|---|---|---|---|---|---|---|
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 55429.6875 | 57698.8047 | 6.1518 | 17.3225 | 57.728 | 7.216 | 56988.9141 |
| 1000 | 0.0808 | 693.0135 | 4581.8110 | 2.0460 | 17.3292 | 57.706 | 7.213 | 22366.3984 |
| 2000 | 0.1616 | 504.2434 | 3241.0867 | 1.8627 | 17.42 | 57.405 | 7.176 | 1925.1605 |
| 3000 | 0.2424 | 416.2050 | 2635.7954 | 1.7568 | 17.2717 | 57.898 | 7.237 | 924.6143 |
| 4000 | 0.3232 | 367.7481 | 2426.7476 | 1.6637 | 17.2866 | 57.848 | 7.231 | 843.0013 |
| 5000 | 0.4040 | 314.2136 | 2124.5867 | 1.5737 | 17.3864 | 57.516 | 7.19 | 970.9272 |
| 6000 | 0.4848 | 274.5013 | 1727.5643 | 1.5012 | 17.3269 | 57.714 | 7.214 | 815.5406 |
| 7000 | 0.5657 | 250.4276 | 1508.2014 | 1.4380 | 17.3171 | 57.747 | 7.218 | 763.2737 |
| 8000 | 0.6465 | 227.7920 | 1387.4103 | 1.3836 | 17.3674 | 57.579 | 7.197 | 706.1053 |
| 9000 | 0.7273 | 207.9922 | 1314.4666 | 1.3326 | 17.3702 | 57.57 | 7.196 | 759.8159 |
| 10000 | 0.8081 | 190.8745 | 1171.5941 | 1.2857 | 17.3634 | 57.592 | 7.199 | 598.8307 |
| 11000 | 0.8889 | 175.8197 | 1119.1125 | 1.2359 | 17.3533 | 57.626 | 7.203 | 493.6122 |
| 12000 | 0.9697 | 159.0854 | 1000.5724 | 1.1915 | 17.3916 | 57.499 | 7.187 | 562.3265 |
| 12375 | 1.0 | 157.0114 | 957.9113 | 1.1794 | 17.3573 | 57.613 | 7.202 | 671.0787 |
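Several columns in the table are arithmetically linked, which makes a quick consistency check possible. The sketch below uses only figures from the card. The epoch column equals `step / 12375`. Throughput times runtime gives about 1000 evaluation samples in 125 steps, which is consistent with `eval_batch_size: 8`. The training-set estimate of 12375 × 8 samples assumes a single device and no gradient accumulation; the card does not state either. The helper names `epoch_at` and `eval_count` are hypothetical.

```python
# Figures copied from the card.
TOTAL_STEPS = 12375  # last logged step, where epoch == 1.0
TRAIN_BATCH_SIZE = 8
EVAL_BATCH_SIZE = 8


def epoch_at(step, total_steps=TOTAL_STEPS):
    """Fraction of the single training epoch completed at `step`."""
    return step / total_steps


def eval_count(rate_per_second, runtime_s):
    """Items processed during one evaluation: throughput times wall-clock time."""
    return rate_per_second * runtime_s


print(round(epoch_at(7000), 4))  # 0.5657, as in the table

# Step-9000 row: samples and steps processed in one evaluation pass.
eval_samples = round(eval_count(57.57, 17.3702))
eval_steps = round(eval_count(7.196, 17.3702))
print(eval_samples, eval_steps, eval_samples // eval_steps)  # 1000 125 8

# Rough training-set size, assuming one device and no gradient accumulation.
print(TOTAL_STEPS * TRAIN_BATCH_SIZE)  # 99000
```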

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.20.0