---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_attn
    results: []
---

distily_bench_gpt2_attn

This student model is distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.
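
Since the student keeps the gpt2 architecture, it loads with the standard transformers API. Below is a minimal usage sketch; the repository id lapp0/distily_bench_gpt2_attn is inferred from the model name above and may need adjusting.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the model name above; adjust to the actual repository.
repo_id = "lapp0/distily_bench_gpt2_attn"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Distillation compresses a teacher model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```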

It achieves the following results on the evaluation set:

  • eval_enwikippl: 212.2672
  • eval_frwikippl: 1352.8285
  • eval_zhwikippl: 811.8465
  • eval_loss: 1.2429
  • eval_runtime: 17.2351
  • eval_samples_per_second: 58.021
  • eval_steps_per_second: 7.253
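
As the metric names suggest, eval_enwikippl, eval_frwikippl, and eval_zhwikippl are evaluation-set perplexities on English, French, and Chinese Wikipedia text. Perplexity is the exponential of the mean token-level cross-entropy; the sketch below shows one way to compute it with transformers, and is not Distily's own evaluation code.

```python
import torch

def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity = exp(mean token-level cross-entropy).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the causal LM return its mean CE loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```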

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None)) (sketched in code after this list)
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
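
The distillation_objective above combines a KL-divergence loss on the output logits (weight 1) with an MSE loss on the attention maps (weight 2.0); the hidden-state component has weight 0 and contributes nothing. Below is a minimal PyTorch sketch of that weighting, written from the repr above rather than from Distily's source; the 1:1 layer pairing and per-layer averaging of the attention loss are assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out):
    # Weight-1 KL divergence on logits (hs component has weight 0, omitted).
    logits_loss = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # Both models must be run with output_attentions=True so .attentions
    # holds one tensor per layer; layer_mapper=None is read here as a
    # 1:1 layer pairing (an assumption).
    attn_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.attentions, teacher_out.attentions)
    ) / len(student_out.attentions)
    return 1 * logits_loss + 2.0 * attn_loss
```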

Resource Usage

Peak GPU Memory: 8.2202 GB
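
Peak usage of this kind can be read from PyTorch's CUDA allocator statistics. A minimal sketch follows; whether Distily reports max_memory_allocated or a different counter is not stated here.

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... training loop runs here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```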

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 58954.875 | 57690.6602 | 5.9433 | 17.2113 | 58.101 | 7.263 | 54707.1133 |
| 1000 | 0.0808 | 710.2846 | 4396.9824 | 1.9296 | 17.117 | 58.421 | 7.303 | 18078.0879 |
| 2000 | 0.1616 | 503.9694 | 3086.3142 | 1.7454 | 17.2903 | 57.836 | 7.229 | 2703.2686 |
| 3000 | 0.2424 | 420.6563 | 2955.9736 | 1.6378 | 17.2224 | 58.064 | 7.258 | 1477.8748 |
| 4000 | 0.3232 | 367.0064 | 2704.3167 | 1.5544 | 17.2208 | 58.069 | 7.259 | 851.8279 |
| 5000 | 0.4040 | 317.6482 | 2113.5315 | 1.4722 | 17.1418 | 58.337 | 7.292 | 1214.1425 |
| 6000 | 0.4848 | 276.7272 | 1629.8280 | 1.3995 | 17.1258 | 58.392 | 7.299 | 813.5826 |
| 7000 | 0.5657 | 250.8947 | 1553.0933 | 1.3412 | 17.2475 | 57.979 | 7.247 | 773.1216 |
| 8000 | 0.6465 | 228.6603 | 1347.1174 | 1.2915 | 17.2115 | 58.101 | 7.263 | 716.5538 |
| 9000 | 0.7273 | 212.2672 | 1352.8285 | 1.2429 | 17.2351 | 58.021 | 7.253 | 811.8465 |
| 10000 | 0.8081 | 193.2158 | 1189.5732 | 1.1981 | 17.1888 | 58.177 | 7.272 | 670.6308 |
| 11000 | 0.8889 | 178.6132 | 1058.1842 | 1.1502 | 17.2336 | 58.026 | 7.253 | 653.9169 |
| 12000 | 0.9697 | 165.5636 | 977.5611 | 1.1143 | 17.2114 | 58.101 | 7.263 | 509.6881 |
| 12375 | 1.0 | 160.1887 | 948.9035 | 1.0983 | 17.1765 | 58.219 | 7.277 | 518.8907 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.20.0