---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - bitnet
  - 1.58b
  - generated_from_trainer
model-index:
  - name: distily_bitnet_gpt2
    results: []
---

# distily_bitnet_gpt2

This student model was distilled from the teacher model [gpt2](https://huggingface.co/gpt2) on an unspecified dataset.

The Distily library was used for this distillation.
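
As a quick orientation, the sketch below shows how the checkpoint could be loaded and sampled with the standard `transformers` API. The repo id `lapp0/distily_bitnet_gpt2` is inferred from this card and may differ from the actual published path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub path, inferred from this card; adjust if the checkpoint
# is published under a different repo id.
repo_id = "lapp0/distily_bitnet_gpt2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```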

It achieves the following results on the evaluation set:

- eval_enwikippl: 87.5
- eval_frwikippl: 358.0
- eval_zhwikippl: 139.0
- eval_tinystoriesppl: 72.5
- eval_loss: 0.6931
- eval_runtime: 29.8206
- eval_samples_per_second: 83.835
- eval_steps_per_second: 10.496
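
The `*ppl` metrics are perplexities on held-out slices of the named corpora. As a rough illustration (not necessarily Distily's exact evaluation code), a perplexity of this kind can be computed as the exponential of the model's mean next-token cross-entropy:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub path, as in the loading sketch above.
repo_id = "lapp0/distily_bitnet_gpt2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()

def perplexity(text: str) -> float:
    # exp of the mean next-token cross-entropy; the LM head
    # shifts the labels internally.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```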

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))` (see the loss sketch after this list)
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 1.0
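
In this objective, only the logits component is active (weight 1, `loss_fn=kl`); the hidden-state and attention components have weight 0. Below is a minimal PyTorch sketch of such a logits KL loss; the reduction and temperature handling are assumptions, not Distily's verbatim implementation.

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between teacher and student token distributions.

    A sketch of the weight-1 `logits` component above; Distily's actual
    DistillationObjective may differ in details.
    """
    vocab = student_logits.size(-1)
    # Flatten (batch, seq, vocab) to (tokens, vocab) so 'batchmean'
    # yields the mean per-token KL.
    s_logp = F.log_softmax(student_logits.reshape(-1, vocab) / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits.reshape(-1, vocab) / temperature, dim=-1)
    # Scaling by T^2 is the usual convention when distilling with temperature.
    return F.kl_div(s_logp, t_logp, log_target=True,
                    reduction="batchmean") * temperature ** 2
```

During training, `teacher_logits` would come from a frozen gpt2 forward pass on the same batch as the student.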

## Resource Usage

Peak GPU Memory: 7.5008 GB
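
For reference, a peak figure like this can be captured with PyTorch's CUDA memory statistics (Distily's own accounting may differ):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the training or eval steps to be measured here ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.4f} GB")
```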

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 43.25 | 61.25 | | | | | 11.6875 | 19.125 |
| 0 | 0 | 820338753536.0 | 43705587204096.0 | 18.6434 | 30.0294 | 83.252 | 10.423 | 4731174912.0 | 17729624997888.0 |
| 1000 | 0.0162 | 324.0 | 1576.0 | 1.4569 | 29.8145 | 83.852 | 10.498 | 258.0 | 386.0 |
| 2000 | 0.0323 | 220.0 | 844.0 | 1.2562 | 29.8871 | 83.648 | 10.473 | 184.0 | 203.0 |
| 3000 | 0.0485 | 182.0 | 628.0 | 1.1014 | 29.8663 | 83.706 | 10.48 | 141.0 | 178.0 |
| 4000 | 0.0646 | 148.0 | 520.0 | 0.9878 | 29.8318 | 83.803 | 10.492 | 121.0 | 162.0 |
| 5000 | 0.0808 | 130.0 | 456.0 | 0.9061 | 29.8914 | 83.636 | 10.471 | 103.5 | 150.0 |
| 6000 | 0.0970 | 117.0 | 426.0 | 0.8448 | 29.8301 | 83.808 | 10.493 | 95.5 | 165.0 |
| 7000 | 0.1131 | 105.0 | 460.0 | 0.7878 | 29.8233 | 83.827 | 10.495 | 86.0 | 150.0 |
| 8000 | 0.1293 | 98.5 | 396.0 | 0.7433 | 29.8713 | 83.692 | 10.478 | 78.0 | 143.0 |
| 9000 | 0.1455 | 87.5 | 358.0 | 0.6931 | 29.8206 | 83.835 | 10.496 | 72.5 | 139.0 |
| 10000 | 0.1616 | 82.0 | 340.0 | 0.6355 | 29.8348 | 83.795 | 10.491 | 67.5 | 132.0 |
| 11000 | 0.1778 | 77.5 | 330.0 | 0.5981 | 29.8369 | 83.789 | 10.49 | 60.75 | 113.0 |
| 12000 | 0.1939 | 75.0 | 286.0 | 0.5715 | 29.8463 | 83.762 | 10.487 | 62.0 | 152.0 |
| 13000 | 0.2101 | 73.0 | 249.0 | 0.5484 | 29.8498 | 83.753 | 10.486 | 55.5 | 141.0 |
| 14000 | 0.2263 | 72.5 | 245.0 | 0.5344 | 29.8153 | 83.85 | 10.498 | 54.75 | 85.5 |
| 15000 | 0.2424 | 73.0 | 246.0 | 0.5171 | 29.8338 | 83.798 | 10.491 | 55.5 | 87.0 |
| 16000 | 0.2586 | 70.5 | 237.0 | 0.5125 | 29.8543 | 83.74 | 10.484 | 52.5 | 92.0 |
| 17000 | 0.2747 | 70.0 | 219.0 | 0.4954 | 29.8236 | 83.826 | 10.495 | 56.25 | 160.0 |
| 18000 | 0.2909 | 67.5 | 250.0 | 0.5031 | 29.8194 | 83.838 | 10.497 | 52.5 | 173.0 |
| 19000 | 0.3071 | 72.0 | 223.0 | 0.4795 | 29.8542 | 83.74 | 10.484 | 51.5 | 151.0 |
| 20000 | 0.3232 | 68.0 | 218.0 | 0.4735 | 29.8718 | 83.691 | 10.478 | 52.0 | 151.0 |
| 21000 | 0.3394 | 67.5 | 221.0 | 0.4795 | 29.8655 | 83.709 | 10.48 | 52.5 | 190.0 |
| 22000 | 0.3556 | 68.5 | 223.0 | 0.4733 | 29.8778 | 83.674 | 10.476 | 52.0 | 96.0 |
| 23000 | 0.3717 | 69.0 | 204.0 | 0.4633 | 29.8215 | 83.832 | 10.496 | 48.75 | 104.0 |
| 24000 | 0.3879 | 66.0 | 222.0 | 0.4587 | 29.843 | 83.772 | 10.488 | 50.0 | 122.0 |
| 25000 | 0.4040 | 67.0 | 216.0 | 0.4568 | 29.8561 | 83.735 | 10.484 | 48.75 | 92.0 |
| 26000 | 0.4202 | 70.0 | 214.0 | 0.4556 | 29.8665 | 83.706 | 10.48 | 49.0 | 103.5 |
| 27000 | 0.4364 | 66.0 | 220.0 | 0.4601 | 29.8646 | 83.711 | 10.481 | 48.5 | 95.5 |
| 28000 | 0.4525 | 65.0 | 205.0 | 0.4516 | 29.8541 | 83.741 | 10.484 | 46.5 | 150.0 |
| 29000 | 0.4687 | 66.5 | 223.0 | 0.4496 | 29.8307 | 83.806 | 10.493 | 46.5 | 102.5 |
| 30000 | 0.4848 | 66.5 | 237.0 | 0.4509 | 29.8678 | 83.702 | 10.48 | 46.25 | 137.0 |
| 31000 | 0.5010 | 64.5 | 219.0 | 0.4445 | 29.851 | 83.749 | 10.485 | 46.0 | 97.5 |
| 32000 | 0.5172 | 64.0 | 200.0 | 0.4380 | 29.8955 | 83.625 | 10.47 | 49.25 | 101.0 |
| 33000 | 0.5333 | 64.5 | 204.0 | 0.4379 | 29.838 | 83.786 | 10.49 | 49.0 | 85.5 |
| 34000 | 0.5495 | 64.0 | 217.0 | 0.4419 | 29.8427 | 83.773 | 10.488 | 46.25 | 76.0 |
| 35000 | 0.5657 | 72.5 | 229.0 | 0.4345 | 29.8803 | 83.667 | 10.475 | 50.0 | 128.0 |
| 36000 | 0.5818 | 67.5 | 203.0 | 0.4349 | 30.0752 | 83.125 | 10.407 | 45.0 | 147.0 |
| 37000 | 0.5980 | 65.5 | 205.0 | 0.4354 | 29.8558 | 83.736 | 10.484 | 47.75 | 129.0 |
| 38000 | 0.6141 | 63.75 | 208.0 | 0.4375 | 29.868 | 83.702 | 10.479 | 46.0 | 108.5 |
| 39000 | 0.6303 | 64.0 | 215.0 | 0.4395 | 30.2231 | 82.718 | 10.356 | 45.5 | 125.0 |
| 40000 | 0.6465 | 64.5 | 197.0 | 0.4278 | 29.9055 | 83.597 | 10.466 | 46.0 | 84.5 |
| 41000 | 0.6626 | 62.25 | 186.0 | 0.4285 | 29.951 | 83.47 | 10.45 | 44.75 | 80.0 |
| 42000 | 0.6788 | 62.75 | 225.0 | 0.4301 | 29.835 | 83.794 | 10.491 | 46.25 | 168.0 |
| 43000 | 0.6949 | 65.5 | 224.0 | 0.4222 | 29.874 | 83.685 | 10.477 | 46.5 | 139.0 |
| 44000 | 0.7111 | 63.5 | 197.0 | 0.4294 | 29.9084 | 83.589 | 10.465 | 45.75 | 125.5 |
| 45000 | 0.7273 | 63.0 | 192.0 | 0.4263 | 29.8797 | 83.669 | 10.475 | 46.25 | 95.0 |
| 46000 | 0.7434 | 63.25 | 198.0 | 0.4266 | 29.8479 | 83.758 | 10.487 | 44.75 | 120.5 |
| 47000 | 0.7596 | 64.5 | 213.0 | 0.4247 | 29.8769 | 83.677 | 10.476 | 44.5 | 120.5 |
| 48000 | 0.7758 | 62.25 | 202.0 | 0.4214 | 29.8514 | 83.748 | 10.485 | 42.75 | 83.5 |
| 49000 | 0.7919 | 63.75 | 204.0 | 0.4230 | 29.8895 | 83.641 | 10.472 | 46.25 | 94.5 |
| 50000 | 0.8081 | 63.75 | 209.0 | 0.4218 | 29.9008 | 83.61 | 10.468 | 45.25 | 131.0 |
| 51000 | 0.8242 | 65.5 | 223.0 | 0.4213 | 29.8534 | 83.743 | 10.485 | 45.0 | 233.0 |
| 52000 | 0.8404 | 64.5 | 195.0 | 0.4132 | 29.8416 | 83.776 | 10.489 | 44.0 | 99.0 |
| 53000 | 0.8566 | 64.0 | 216.0 | 0.4259 | 29.8576 | 83.731 | 10.483 | 45.5 | 95.0 |
| 54000 | 0.8727 | 65.0 | 207.0 | 0.4207 | 29.8695 | 83.698 | 10.479 | 45.5 | 126.0 |
| 55000 | 0.8889 | 66.5 | 198.0 | 0.4141 | 29.8307 | 83.806 | 10.493 | 42.75 | 118.0 |
| 56000 | 0.9051 | 60.0 | 186.0 | 0.4209 | 29.866 | 83.707 | 10.48 | 43.75 | 142.0 |
| 57000 | 0.9212 | 62.25 | 192.0 | 0.4143 | 29.9063 | 83.594 | 10.466 | 45.0 | 78.0 |
| 58000 | 0.9374 | 63.5 | 205.0 | 0.4192 | 29.859 | 83.727 | 10.483 | 44.75 | 117.5 |
| 59000 | 0.9535 | 62.75 | 191.0 | 0.4202 | 29.8691 | 83.699 | 10.479 | 44.0 | 100.0 |
| 60000 | 0.9697 | 66.0 | 219.0 | 0.4149 | 29.9387 | 83.504 | 10.455 | 43.75 | 130.0 |
| 61000 | 0.9859 | 64.5 | 207.0 | 0.4162 | 29.8366 | 83.79 | 10.49 | 43.5 | 161.0 |
| 61875 | 1.0 | 61.5 | 204.0 | 0.4125 | 29.9423 | 83.494 | 10.453 | 44.25 | 223.0 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0