---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.15_gpt2
    results: []
---

distily_bench_obj_cross_v2.15_gpt2

This student model is distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 84.5
  • eval_frwikippl: 356.0
  • eval_zhwikippl: 135.0
  • eval_tinystoriesppl: 72.0
  • eval_loss: 0.6795
  • eval_runtime: 16.7299
  • eval_samples_per_second: 59.773
  • eval_steps_per_second: 7.472
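Each `*ppl` metric above is a perplexity: the exponential of the mean per-token cross-entropy (in nats) on that evaluation corpus. A minimal sketch of the relationship (the helper name is illustrative, not part of Distily's API); note that `eval_loss` here is the distillation objective, not a language-modeling cross-entropy, so it does not exponentiate to the perplexities listed:

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)

# A mean cross-entropy of ln(2) nats/token corresponds to a perplexity of 2:
print(perplexity(math.log(2.0)))
```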

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 1.0
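The `distillation_objective` above places weight 1 on a KL-divergence loss over the logits and weight 0 on the hidden-state and attention components, so only the output distributions are matched. A minimal PyTorch sketch of such a logits-only KL objective (the function name is illustrative, not Distily's actual API):

```python
import torch
import torch.nn.functional as F

def kl_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary, averaged per example.

    Sketches a logits-only objective: weight 1 on logits, 0 on hidden
    states and attentions. Inputs are (num_tokens, vocab_size) logits.
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_p = F.softmax(teacher_logits, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(student_logp, teacher_p, reduction="batchmean")

# Toy usage: 6 token positions over a 5-word vocabulary.
student = torch.randn(6, 5)
teacher = torch.randn(6, 5)
loss = kl_logits_loss(student, teacher)
print(loss.item())
```

With identical student and teacher logits the loss is zero; it grows as the student's output distribution diverges from the teacher's.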

Resource Usage

Peak GPU Memory: 7.4226 GB
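A figure like this can be read from PyTorch's caching-allocator statistics; a sketch, assuming CUDA and noting that the allocator may undercount memory held outside PyTorch (the helper name is illustrative):

```python
import torch

def peak_gpu_memory_gb() -> float:
    """Peak GPU memory allocated by PyTorch tensors, in GB.

    Returns 0.0 when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.max_memory_allocated() / 1024 ** 3

print(f"Peak GPU Memory: {peak_gpu_memory_gb():.4f} GB")
```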

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 3126736191488.0 | 129742372077568.0 | 20.7540 | 16.7407 | 59.735 | 7.467 | 6677331968.0 | 80264348827648.0 |
| 1000 | 0.0404 | 320.0 | 1504.0 | 1.5025 | 16.7846 | 59.579 | 7.447 | 245.0 | 280.0 |
| 2000 | 0.0808 | 220.0 | 800.0 | 1.3040 | 16.7756 | 59.61 | 7.451 | 189.0 | 201.0 |
| 3000 | 0.1212 | 180.0 | 648.0 | 1.1450 | 16.7863 | 59.572 | 7.447 | 153.0 | 149.0 |
| 4000 | 0.1616 | 148.0 | 552.0 | 1.0301 | 16.7242 | 59.794 | 7.474 | 121.5 | 153.0 |
| 5000 | 0.2020 | 129.0 | 452.0 | 0.9348 | 16.7817 | 59.589 | 7.449 | 105.0 | 176.0 |
| 6000 | 0.2424 | 115.5 | 442.0 | 0.8587 | 16.8358 | 59.397 | 7.425 | 86.0 | 139.0 |
| 7000 | 0.2828 | 103.0 | 432.0 | 0.8002 | 16.7689 | 59.634 | 7.454 | 78.5 | 139.0 |
| 8000 | 0.3232 | 96.5 | 418.0 | 0.7424 | 16.7778 | 59.602 | 7.45 | 73.5 | 126.0 |
| 9000 | 0.3636 | 84.5 | 356.0 | 0.6795 | 16.7299 | 59.773 | 7.472 | 72.0 | 135.0 |
| 10000 | 0.4040 | 81.5 | 304.0 | 0.6324 | 16.7186 | 59.813 | 7.477 | 66.0 | 125.5 |
| 11000 | 0.4444 | 77.5 | 282.0 | 0.5972 | 16.777 | 59.605 | 7.451 | 59.25 | 121.5 |
| 12000 | 0.4848 | 72.5 | 288.0 | 0.5723 | 16.7347 | 59.756 | 7.47 | 56.75 | 118.0 |
| 13000 | 0.5253 | 69.5 | 256.0 | 0.5577 | 16.7525 | 59.693 | 7.462 | 55.5 | 141.0 |
| 14000 | 0.5657 | 68.5 | 237.0 | 0.5389 | 16.7317 | 59.767 | 7.471 | 54.75 | 286.0 |
| 15000 | 0.6061 | 67.5 | 252.0 | 0.5187 | 16.7326 | 59.764 | 7.47 | 52.25 | 98.5 |
| 16000 | 0.6465 | 69.0 | 235.0 | 0.5174 | 16.8095 | 59.49 | 7.436 | 54.75 | 125.5 |
| 17000 | 0.6869 | 67.0 | 231.0 | 0.5048 | 16.7326 | 59.764 | 7.47 | 50.5 | 116.0 |
| 18000 | 0.7273 | 66.0 | 225.0 | 0.4909 | 16.7575 | 59.675 | 7.459 | 49.75 | 132.0 |
| 19000 | 0.7677 | 66.5 | 247.0 | 0.4894 | 16.8313 | 59.413 | 7.427 | 49.75 | 112.0 |
| 20000 | 0.8081 | 66.5 | 233.0 | 0.4870 | 16.7365 | 59.75 | 7.469 | 51.5 | 103.5 |
| 21000 | 0.8485 | 65.0 | 221.0 | 0.4831 | 16.703 | 59.869 | 7.484 | 50.75 | 181.0 |
| 22000 | 0.8889 | 65.5 | 199.0 | 0.4740 | 16.7629 | 59.656 | 7.457 | 49.5 | 95.5 |
| 23000 | 0.9293 | 67.0 | 223.0 | 0.4752 | 16.7201 | 59.808 | 7.476 | 46.5 | 174.0 |
| 24000 | 0.9697 | 65.0 | 207.0 | 0.4700 | 16.8026 | 59.515 | 7.439 | 46.75 | 98.5 |
| 24750 | 1.0 | 67.0 | 207.0 | 0.4672 | 16.7876 | 59.568 | 7.446 | 47.0 | 185.0 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0