---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_attn
    results: []
---

distily_bench_gpt2_attn

This student model is distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.
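
Since the student keeps the gpt2 architecture, it loads with the standard transformers API. Below is a minimal usage sketch; the repository id lapp0/distily_bench_gpt2_attn is inferred from the model name above and may need adjusting.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the model name above; adjust to the actual repository.
repo_id = "lapp0/distily_bench_gpt2_attn"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Distillation compresses a teacher model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```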

It achieves the following results on the evaluation set:

  • eval_enwikippl: 212.2672
  • eval_frwikippl: 1352.8285
  • eval_zhwikippl: 811.8465
  • eval_loss: 1.2429
  • eval_runtime: 17.2351
  • eval_samples_per_second: 58.021
  • eval_steps_per_second: 7.253
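
As the metric names suggest, eval_enwikippl, eval_frwikippl, and eval_zhwikippl are evaluation-set perplexities on English, French, and Chinese Wikipedia text. Perplexity is the exponential of the mean token-level cross-entropy; the sketch below shows one way to compute it with transformers, and is not Distily's own evaluation code.

```python
import torch

def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity = exp(mean token-level cross-entropy).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the causal LM return its mean CE loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```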

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None)) (sketched in code after this list)
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
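
The distillation_objective above combines a KL-divergence loss on the output logits (weight 1) with an MSE loss on the attention maps (weight 2.0); the hidden-state component has weight 0 and contributes nothing. Below is a minimal PyTorch sketch of that weighting, written from the repr above rather than from Distily's source; the 1:1 layer pairing and per-layer averaging of the attention loss are assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out):
    # Weight-1 KL divergence on logits (hs component has weight 0, omitted).
    logits_loss = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # Both models must be run with output_attentions=True so .attentions
    # holds one tensor per layer; layer_mapper=None is read here as a
    # 1:1 layer pairing (an assumption).
    attn_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.attentions, teacher_out.attentions)
    ) / len(student_out.attentions)
    return 1 * logits_loss + 2.0 * attn_loss
```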

Resource Usage

Peak GPU Memory: 8.2202 GB
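
Peak usage of this kind can be read from PyTorch's CUDA allocator statistics. A minimal sketch follows; whether Distily reports max_memory_allocated or a different counter is not stated here.

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... training loop runs here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```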

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 58954.875 | 57690.6602 | 5.9433 | 17.2113 | 58.101 | 7.263 | 54707.1133 |
| 1000 | 0.0808 | 710.2846 | 4396.9824 | 1.9296 | 17.117 | 58.421 | 7.303 | 18078.0879 |
| 2000 | 0.1616 | 503.9694 | 3086.3142 | 1.7454 | 17.2903 | 57.836 | 7.229 | 2703.2686 |
| 3000 | 0.2424 | 420.6563 | 2955.9736 | 1.6378 | 17.2224 | 58.064 | 7.258 | 1477.8748 |
| 4000 | 0.3232 | 367.0064 | 2704.3167 | 1.5544 | 17.2208 | 58.069 | 7.259 | 851.8279 |
| 5000 | 0.4040 | 317.6482 | 2113.5315 | 1.4722 | 17.1418 | 58.337 | 7.292 | 1214.1425 |
| 6000 | 0.4848 | 276.7272 | 1629.8280 | 1.3995 | 17.1258 | 58.392 | 7.299 | 813.5826 |
| 7000 | 0.5657 | 250.8947 | 1553.0933 | 1.3412 | 17.2475 | 57.979 | 7.247 | 773.1216 |
| 8000 | 0.6465 | 228.6603 | 1347.1174 | 1.2915 | 17.2115 | 58.101 | 7.263 | 716.5538 |
| 9000 | 0.7273 | 212.2672 | 1352.8285 | 1.2429 | 17.2351 | 58.021 | 7.253 | 811.8465 |
| 10000 | 0.8081 | 193.2158 | 1189.5732 | 1.1981 | 17.1888 | 58.177 | 7.272 | 670.6308 |
| 11000 | 0.8889 | 178.6132 | 1058.1842 | 1.1502 | 17.2336 | 58.026 | 7.253 | 653.9169 |
| 12000 | 0.9697 | 165.5636 | 977.5611 | 1.1143 | 17.2114 | 58.101 | 7.263 | 509.6881 |
| 12375 | 1.0 | 160.1887 | 948.9035 | 1.0983 | 17.1765 | 58.219 | 7.277 | 518.8907 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.20.0