metadata
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.15_gpt2
    results: []

distily_bench_obj_cross_v2.15_gpt2

This student model was distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.

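It achieves the following results on the evaluation set. These values match the step 9000 row (epoch 0.3636) of the eval-phase metrics table, not the final training step: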

  • eval_enwikippl: 84.0
  • eval_frwikippl: 342.0
  • eval_zhwikippl: 217.0
  • eval_tinystoriesppl: 69.5
  • eval_loss: 0.6877
  • eval_runtime: 16.9969
  • eval_samples_per_second: 58.834
  • eval_steps_per_second: 7.354
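The throughput figures above can be cross-checked against each other. The sketch below assumes the trainer reports `samples_per_second` and `steps_per_second` as counts divided by `eval_runtime` (the usual Hugging Face Trainer convention; the card itself does not say so). Under that assumption, the figures imply roughly 1000 evaluation samples processed in roughly 125 steps, which is consistent with the reported `eval_batch_size` of 8. The function name `implied_counts` is hypothetical.

```python
# Values copied from the evaluation results listed above.
eval_runtime = 16.9969             # seconds
eval_samples_per_second = 58.834
eval_steps_per_second = 7.354


def implied_counts(runtime, samples_per_s, steps_per_s):
    """Return the (samples, steps) implied by a runtime and two throughputs.

    Assumption: each throughput is simply count / runtime.
    """
    return runtime * samples_per_s, runtime * steps_per_s


samples, steps = implied_counts(
    eval_runtime, eval_samples_per_second, eval_steps_per_second
)

print(f"implied eval samples: {samples:.1f}")
print(f"implied eval steps:   {steps:.1f}")
print(f"samples per step:     {samples / steps:.2f}")
```

The samples-per-step ratio is the useful part: it should land close to the evaluation batch size if the three numbers come from the same run.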

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=mse, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 1.0
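The `distillation_objective` above combines a KL term on the logits (weight 1) with an MSE term on hidden states (weight 1.0) and disables the attention term (weight 0). The following is a minimal, framework-free sketch of that weighted sum on toy vectors, not Distily's implementation. It assumes the KL direction is KL(teacher || student) and that `layer_mapper=last` means only the final hidden layer is compared; neither detail is stated in this card. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) over the vocabulary (direction is an assumption)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_logits, student_logits,
                      teacher_hs, student_hs,
                      logits_weight=1.0, hs_weight=1.0):
    """Weighted sum of the logits (KL) and hidden-state (MSE) components.

    The attention component has weight 0 in this run, so it is omitted.
    """
    return (logits_weight * kl_divergence(teacher_logits, student_logits)
            + hs_weight * mse(student_hs, teacher_hs))


# Toy inputs for a single token position (illustrative values only).
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]
teacher_hs = [0.5, -0.2, 0.1, 0.0]   # stand-in for the last hidden state
student_hs = [0.4, -0.1, 0.2, 0.1]

loss = distillation_loss(teacher_logits, student_logits, teacher_hs, student_hs)
print(f"toy distillation loss: {loss:.6f}")
```

Because the reported `loss` values come from this kind of combined objective rather than plain cross-entropy, they are not expected to equal the logarithm of any of the perplexity columns.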

Resource Usage

Peak GPU Memory: 7.7252 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 2473901162496.0 | 170424302305280.0 | 20.7680 | 17.0409 | 58.682 | 7.335 | 4060086272.0 | 71468255805440.0 |
| 1000 | 0.0404 | 334.0 | 1464.0 | 1.5419 | 17.0178 | 58.762 | 7.345 | 243.0 | 596.0 |
| 2000 | 0.0808 | 232.0 | 756.0 | 1.3235 | 16.9755 | 58.909 | 7.364 | 189.0 | 250.0 |
| 3000 | 0.1212 | 180.0 | 628.0 | 1.1620 | 16.9923 | 58.85 | 7.356 | 149.0 | 171.0 |
| 4000 | 0.1616 | 150.0 | 576.0 | 1.0434 | 16.9803 | 58.892 | 7.361 | 121.5 | 172.0 |
| 5000 | 0.2020 | 130.0 | 504.0 | 0.9520 | 17.0128 | 58.779 | 7.347 | 100.5 | 144.0 |
| 6000 | 0.2424 | 113.5 | 420.0 | 0.8702 | 17.0074 | 58.798 | 7.35 | 91.0 | 137.0 |
| 7000 | 0.2828 | 106.0 | 408.0 | 0.8100 | 16.9821 | 58.885 | 7.361 | 80.5 | 160.0 |
| 8000 | 0.3232 | 96.5 | 396.0 | 0.7421 | 16.9749 | 58.911 | 7.364 | 70.5 | 127.0 |
| 9000 | 0.3636 | 84.0 | 342.0 | 0.6877 | 16.9969 | 58.834 | 7.354 | 69.5 | 217.0 |
| 10000 | 0.4040 | 78.0 | 300.0 | 0.6467 | 16.9846 | 58.877 | 7.36 | 65.0 | 139.0 |
| 11000 | 0.4444 | 77.0 | 278.0 | 0.5957 | 16.9903 | 58.857 | 7.357 | 60.0 | 127.5 |
| 12000 | 0.4848 | 75.0 | 272.0 | 0.5789 | 16.9858 | 58.873 | 7.359 | 56.5 | 140.0 |
| 13000 | 0.5253 | 71.5 | 266.0 | 0.5525 | 16.9418 | 59.026 | 7.378 | 56.5 | 116.0 |
| 14000 | 0.5657 | 71.0 | 252.0 | 0.5416 | 17.088 | 58.521 | 7.315 | 53.75 | 132.0 |
| 15000 | 0.6061 | 68.0 | 221.0 | 0.5283 | 16.9524 | 58.989 | 7.374 | 50.25 | 112.5 |
| 16000 | 0.6465 | 70.0 | 244.0 | 0.5200 | 17.0495 | 58.653 | 7.332 | 52.5 | 109.5 |
| 17000 | 0.6869 | 67.0 | 225.0 | 0.5097 | 17.0223 | 58.747 | 7.343 | 51.5 | 109.0 |
| 18000 | 0.7273 | 71.0 | 239.0 | 0.5016 | 17.0519 | 58.644 | 7.331 | 49.5 | 150.0 |
| 19000 | 0.7677 | 68.0 | 212.0 | 0.4887 | 17.0831 | 58.537 | 7.317 | 51.25 | 98.0 |
| 20000 | 0.8081 | 65.0 | 211.0 | 0.4865 | 17.0098 | 58.789 | 7.349 | 49.0 | 101.5 |
| 21000 | 0.8485 | 64.5 | 217.0 | 0.4791 | 17.0253 | 58.736 | 7.342 | 47.5 | 142.0 |
| 22000 | 0.8889 | 66.5 | 230.0 | 0.4798 | 16.9954 | 58.839 | 7.355 | 48.5 | 147.0 |
| 23000 | 0.9293 | 62.5 | 212.0 | 0.4675 | 16.9835 | 58.881 | 7.36 | 45.5 | 134.0 |
| 24000 | 0.9697 | 63.5 | 220.0 | 0.4712 | 16.9973 | 58.833 | 7.354 | 47.0 | 138.0 |
| 24750 | 1.0 | 63.75 | 247.0 | 0.4679 | 17.0597 | 58.618 | 7.327 | 45.75 | 205.0 |
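Two quantities can be derived from the table above: the epoch fraction at a given step, and how far the final student remains from the teacher on each perplexity metric. The sketch below copies the teacher row and the final (step 24750) row; it assumes the four teacher values map to the enwikippl, frwikippl, tinystoriesppl, and zhwikippl columns in that order. The helper name `epoch_at` is hypothetical.

```python
TOTAL_STEPS = 24750  # step count at epoch 1.0, from the last table row


def epoch_at(step, total_steps=TOTAL_STEPS):
    """Fraction of the single training epoch completed at `step`."""
    return step / total_steps


# Perplexities copied from the "teacher eval" row and the final student row.
teacher = {
    "enwikippl": 43.75,
    "frwikippl": 61.75,
    "tinystoriesppl": 11.8125,
    "zhwikippl": 19.125,
}
final_student = {
    "enwikippl": 63.75,
    "frwikippl": 247.0,
    "tinystoriesppl": 45.75,
    "zhwikippl": 205.0,
}

# Ratio > 1 means the student is worse (higher perplexity) than the teacher.
ratios = {name: final_student[name] / teacher[name] for name in teacher}

for name, ratio in ratios.items():
    print(f"{name:>15}: student/teacher = {ratio:.2f}x")

print(f"epoch at step 9000: {epoch_at(9000):.4f}")
```

On these numbers the student is closest to the teacher on English Wikipedia and furthest on Chinese Wikipedia. With `train_batch_size` 4, 24750 steps would correspond to 99,000 training samples, assuming a single device and no gradient accumulation, neither of which the card states.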

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0