---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.15_gpt2
  results: []
---

# distily_bench_obj_cross_v2.15_gpt2

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 560.0
- eval_frwikippl: 644.0
- eval_zhwikippl: 488.0
- eval_tinystoriesppl: 284.0
- eval_loss: 0.6086
- eval_runtime: 16.7587 (s)
- eval_samples_per_second: 59.67
- eval_steps_per_second: 7.459
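The `*ppl` metrics are per-dataset perplexities on held-out text. As a general reminder of what those numbers mean (this is an illustrative helper, not Distily's evaluation code), perplexity is the exponential of the average per-token negative log-likelihood:

```python
import math

def perplexity(token_log_probs):
    """exp(mean negative log-likelihood per token).

    `token_log_probs` holds the natural-log probabilities the model
    assigned to each ground-truth token (hypothetical helper, not a
    Distily API).
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token has perplexity ~4.
print(perplexity([math.log(0.25)] * 8))
```

Lower is better: a perplexity of 560 on enwiki means the student is, on average, about as uncertain as a uniform choice over 560 tokens at each position.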

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 1.0
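The objective above puts all of its weight on a KL-divergence loss between teacher and student logits (the hidden-state and attention components have weight 0). A minimal, framework-agnostic sketch of that logits component, reduced to a single vocabulary distribution at one position (Distily's actual implementation operates on batched tensors and may differ in reduction and temperature handling):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) over the vocabulary distribution at a
    single position -- a sketch of `loss_fn=kl` in the
    `logits_loss_component` above, not Distily's exact code.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero exactly when the student reproduces the teacher's distribution, and grows as the student places probability mass where the teacher does not.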

## Resource Usage

Peak GPU Memory: 7.4226 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1408749273088.0 | 96207267430400.0 | 20.4380 | 16.6447 | 60.079 | 7.51 | 7482638336.0 | 43430709297152.0 |
| 1000 | 0.0404 | 1408.0 | 1432.0 | 0.8546 | 16.7128 | 59.835 | 7.479 | 788.0 | 1056.0 |
| 2000 | 0.0808 | 988.0 | 928.0 | 0.7631 | 16.6827 | 59.942 | 7.493 | 520.0 | 302.0 |
| 3000 | 0.1212 | 836.0 | 760.0 | 0.7155 | 16.633 | 60.121 | 7.515 | 402.0 | 196.0 |
| 4000 | 0.1616 | 732.0 | 676.0 | 0.6800 | 16.6836 | 59.939 | 7.492 | 378.0 | 157.0 |
| 5000 | 0.2020 | 676.0 | 668.0 | 0.6574 | 16.6514 | 60.055 | 7.507 | 322.0 | 227.0 |
| 6000 | 0.2424 | 648.0 | 732.0 | 0.6383 | 16.6833 | 59.94 | 7.493 | 286.0 | 190.0 |
| 7000 | 0.2828 | 612.0 | 632.0 | 0.6373 | 16.8106 | 59.486 | 7.436 | 286.0 | 169.0 |
| 8000 | 0.3232 | 588.0 | 704.0 | 0.6243 | 16.6588 | 60.028 | 7.504 | 266.0 | 596.0 |
| 9000 | 0.3636 | 560.0 | 644.0 | 0.6086 | 16.7587 | 59.67 | 7.459 | 284.0 | 488.0 |
| 10000 | 0.4040 | 532.0 | 564.0 | 0.5994 | 16.6696 | 59.989 | 7.499 | 256.0 | 142.0 |
| 11000 | 0.4444 | 544.0 | 628.0 | 0.5916 | 16.7004 | 59.879 | 7.485 | 252.0 | 153.0 |
| 12000 | 0.4848 | 540.0 | 612.0 | 0.5828 | 16.7602 | 59.665 | 7.458 | 252.0 | 568.0 |
| 13000 | 0.5253 | 528.0 | 612.0 | 0.5735 | 16.6596 | 60.025 | 7.503 | 260.0 | 160.0 |
| 14000 | 0.5657 | 528.0 | 576.0 | 0.5628 | 16.7207 | 59.806 | 7.476 | 246.0 | 250.0 |
| 15000 | 0.6061 | 478.0 | 524.0 | 0.5511 | 16.736 | 59.752 | 7.469 | 232.0 | 170.0 |
| 16000 | 0.6465 | 442.0 | 552.0 | 0.5270 | 16.7225 | 59.8 | 7.475 | 228.0 | 214.0 |
| 17000 | 0.6869 | 420.0 | 524.0 | 0.4692 | 16.6506 | 60.058 | 7.507 | 212.0 | 174.0 |
| 18000 | 0.7273 | 384.0 | 478.0 | 0.4115 | 16.7225 | 59.8 | 7.475 | 208.0 | 144.0 |
| 19000 | 0.7677 | 362.0 | 400.0 | 0.3610 | 16.6691 | 59.991 | 7.499 | 195.0 | 128.0 |
| 20000 | 0.8081 | 344.0 | 346.0 | 0.3370 | 16.6695 | 59.99 | 7.499 | 184.0 | 107.5 |
| 21000 | 0.8485 | 306.0 | 302.0 | 0.3061 | 16.7054 | 59.861 | 7.483 | 161.0 | 110.5 |
| 22000 | 0.8889 | 300.0 | 318.0 | 0.2974 | 16.6709 | 59.985 | 7.498 | 160.0 | 84.0 |
| 23000 | 0.9293 | 290.0 | 298.0 | 0.2890 | 16.7049 | 59.863 | 7.483 | 162.0 | 103.0 |
| 24000 | 0.9697 | 300.0 | 290.0 | 0.2970 | 16.6771 | 59.963 | 7.495 | 164.0 | 85.5 |
| 24750 | 1.0 | 280.0 | 290.0 | 0.2782 | 16.74 | 59.737 | 7.467 | 162.0 | 91.5 |
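For context, the final checkpoint (step 24750) can be compared against the teacher row of the table. A quick sketch computing the student/teacher perplexity ratio directly from the table values:

```python
# Perplexities copied from the table: teacher eval row and the final
# student checkpoint (step 24750).
teacher = {"enwikippl": 43.75, "frwikippl": 61.75,
           "tinystoriesppl": 11.8125, "zhwikippl": 19.125}
final_student = {"enwikippl": 280.0, "frwikippl": 290.0,
                 "tinystoriesppl": 162.0, "zhwikippl": 91.5}

# Ratio > 1 means the student's perplexity is still above the teacher's.
ratios = {k: final_student[k] / teacher[k] for k in teacher}
print(ratios["enwikippl"])  # → 6.4
```

So after one epoch the student remains roughly 5x-14x higher in perplexity than the teacher, depending on the dataset, though it has closed an enormous gap from the random-initialization row at step 0.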

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0