---
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_batch_size
  results: []
---

# distily_bench_gpt2_batch_size

This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using the dataset (unspecified).

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set (these values match the step 9500 row of the eval-phase table):
- eval_enwikippl: 579.5842
- eval_frwikippl: 3891.8010
- eval_zhwikippl: 6702.2964
- eval_loss: 7658.3999
- eval_runtime: 21.5573
- eval_samples_per_second: 46.388
- eval_steps_per_second: 11.597

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective:
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0

### Resource Usage

Peak GPU Memory: 4.0814 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 56994.4609 | 58386.3438 | 333144.0625 | 21.6098 | 46.275 | 11.569 | 60802.0039 |
| 500 | 0.0101 | 2099.8235 | 11678.0371 | 13260.6084 | 21.3836 | 46.765 | 11.691 | 69590.5391 |
| 1000 | 0.0202 | 1574.0366 | 8011.8809 | 10600.8320 | 21.2508 | 47.057 | 11.764 | 52850.8906 |
| 1500 | 0.0303 | 1301.3883 | 6674.1611 | 10162.4316 | 21.459 | 46.601 | 11.65 | 34488.375 |
| 2000 | 0.0404 | 1113.5813 | 5583.1753 | 9478.5283 | 21.4684 | 46.58 | 11.645 | 27443.7676 |
| 2500 | 0.0505 | 1004.2922 | 5359.0864 | 9228.7998 | 21.3125 | 46.921 | 11.73 | 26546.2461 |
| 3000 | 0.0606 | 914.3858 | 4987.7397 | 9218.7520 | 21.3671 | 46.801 | 11.7 | 13178.9082 |
| 3500 | 0.0707 | 860.5787 | 4993.3696 | 8780.2881 | 21.3231 | 46.898 | 11.724 | 20241.6133 |
| 4000 | 0.0808 | 810.8665 | 4433.4043 | 8697.4404 | 21.2626 | 47.031 | 11.758 | 18648.1777 |
| 4500 | 0.0909 | 769.4886 | 4542.2461 | 8522.4639 | 21.4471 | 46.626 | 11.657 | 14555.5088 |
| 5000 | 0.1010 | 741.9254 | 4665.9185 | 8346.4316 | 21.1682 | 47.241 | 11.81 | 10137.9199 |
| 5500 | 0.1111 | 714.7664 | 4329.6104 | 8166.9438 | 21.4303 | 46.663 | 11.666 | 13222.1006 |
| 6000 | 0.1212 | 692.0859 | 4471.0703 | 8177.6001 | 21.4078 | 46.712 | 11.678 | 10649.9824 |
| 6500 | 0.1313 | 659.9261 | 4580.1948 | 8073.7598 | 21.198 | 47.174 | 11.794 | 12113.9268 |
| 7000 | 0.1414 | 636.1021 | 4219.9077 | 7905.0562 | 21.2741 | 47.005 | 11.751 | 11793.8877 |
| 7500 | 0.1515 | 623.1702 | 4116.0293 | 7826.2402 | 21.2569 | 47.044 | 11.761 | 11638.9893 |
| 8000 | 0.1616 | 614.8783 | 4148.5176 | 7826.7520 | 21.2964 | 46.956 | 11.739 | 13476.1084 |
| 8500 | 0.1717 | 601.8520 | 4003.9678 | 7738.1118 | 21.4281 | 46.668 | 11.667 | 11412.7490 |
| 9000 | 0.1818 | 580.8234 | 3757.6580 | 7625.6001 | 21.6505 | 46.188 | 11.547 | 6242.6709 |
| 9500 | 0.1919 | 579.5842 | 3891.8010 | 7658.3999 | 21.5573 | 46.388 | 11.597 | 6702.2964 |
| 10000 | 0.2020 | 563.3217 | 3843.6697 | 7557.6641 | 21.4934 | 46.526 | 11.631 | 6892.9072 |
| 10500 | 0.2121 | 554.2101 | 3611.4167 | 7487.4878 | 21.5876 | 46.323 | 11.581 | 6533.5151 |
| 11000 | 0.2222 | 533.5391 | 3924.4539 | 7479.3599 | 21.5677 | 46.366 | 11.591 | 4041.2058 |
| 11500 | 0.2323 | 539.1932 | 3840.4197 | 7422.5601 | 21.3741 | 46.786 | 11.696 | 2984.6418 |
| 12000 | 0.2424 | 530.1937 | 3717.7319 | 7437.3760 | 21.5909 | 46.316 | 11.579 | 4198.5317 |
| 12500 | 0.2525 | 517.9953 | 3501.5972 | 7306.0801 | 21.5337 | 46.439 | 11.61 | 3271.1892 |
| 13000 | 0.2626 | 515.5474 | 3430.8489 | 7287.4878 | 21.6197 | 46.254 | 11.564 | 4228.9199 |
| 13500 | 0.2727 | 516.9103 | 3583.5176 | 7331.5840 | 21.6005 | 46.295 | 11.574 | 6539.6245 |
| 14000 | 0.2828 | 496.1355 | 3821.2432 | 7329.7920 | 21.4982 | 46.516 | 11.629 | 5327.2339 |
| 14500 | 0.2929 | 498.4330 | 3740.2107 | 7232.8960 | 21.5819 | 46.335 | 11.584 | 5059.5977 |
| 15000 | 0.3030 | 495.7023 | 3717.9944 | 7149.5361 | 21.4158 | 46.694 | 11.674 | 2332.2563 |
| 15500 | 0.3131 | 491.6768 | 3593.3838 | 7156.4482 | 21.2342 | 47.094 | 11.773 | 3195.2048 |
| 16000 | 0.3232 | 483.2642 | 3478.8335 | 7121.9521 | 21.2238 | 47.117 | 11.779 | 3729.5500 |
| 16500 | 0.3333 | 477.9181 | 3424.2036 | 7113.9839 | 21.3606 | 46.815 | 11.704 | 4778.8506 |
| 17000 | 0.3434 | 473.8991 | 3581.3721 | 7150.6240 | 21.2836 | 46.985 | 11.746 | 2268.9734 |
| 17500 | 0.3535 | 471.4035 | 3375.7810 | 7056.4482 | 21.4184 | 46.689 | 11.672 | 2958.4526 |
| 18000 | 0.3636 | 466.1978 | 3323.2354 | 7070.1118 | 21.3173 | 46.91 | 11.728 | 3852.8152 |
| 18500 | 0.3737 | 464.6797 | 3391.8843 | 6952.3521 | 21.5144 | 46.481 | 11.62 | 6839.7295 |
| 19000 | 0.3838 | 462.5197 | 3305.7080 | 6933.4399 | 21.3481 | 46.843 | 11.711 | 3396.2700 |
| 19500 | 0.3939 | 456.2503 | 3340.5020 | 6974.1440 | 21.3181 | 46.909 | 11.727 | 4338.4556 |
| 20000 | 0.4040 | 453.3807 | 3245.5469 | 6936.5439 | 21.3635 | 46.809 | 11.702 | 3513.4419 |
| 20500 | 0.4141 | 453.9622 | 3146.9612 | 6961.3442 | 21.3014 | 46.945 | 11.736 | 10044.2734 |
| 21000 | 0.4242 | 452.8354 | 2937.5862 | 6912.8638 | 21.428 | 46.668 | 11.667 | 4067.4631 |
| 21500 | 0.4343 | 441.9103 | 2893.3921 | 6879.7119 | 21.3113 | 46.923 | 11.731 | 5412.9268 |
| 22000 | 0.4444 | 445.0268 | 2878.3350 | 6833.9839 | 21.5124 | 46.485 | 11.621 | 3586.4441 |
| 22500 | 0.4545 | 433.9949 | 3140.9766 | 6801.0562 | 21.4889 | 46.536 | 11.634 | 4264.9297 |
| 23000 | 0.4646 | 432.1537 | 3241.2009 | 6835.2002 | 21.4958 | 46.521 | 11.63 | 7089.4131 |
| 23500 | 0.4747 | 438.6622 | 3099.2891 | 6846.0479 | 21.3978 | 46.734 | 11.683 | 2764.0474 |
| 24000 | 0.4848 | 434.6780 | 3037.6338 | 6746.4639 | 21.4299 | 46.664 | 11.666 | 6095.2222 |
| 24500 | 0.4949 | 433.0188 | 3190.7532 | 6871.6479 | 21.4752 | 46.565 | 11.641 | 6818.7515 |
| 25000 | 0.5051 | 424.1827 | 2884.4297 | 6806.0479 | 21.2002 | 47.169 | 11.792 | 5655.6611 |
| 25500 | 0.5152 | 427.9544 | 2899.9268 | 6739.4878 | 21.4326 | 46.658 | 11.664 | 10928.7627 |
| 26000 | 0.5253 | 418.4491 | 2792.2812 | 6741.0562 | 21.4399 | 46.642 | 11.661 | 4652.5972 |
| 26500 | 0.5354 | 420.5338 | 2771.0999 | 6723.6162 | 21.5377 | 46.43 | 11.608 | 5530.9321 |
| 27000 | 0.5455 | 414.0452 | 2715.1108 | 6704.3521 | 21.8117 | 45.847 | 11.462 | 4411.1870 |
| 27500 | 0.5556 | 405.4073 | 2623.3743 | 6684.0 | 21.6362 | 46.219 | 11.555 | 4443.4106 |
| 28000 | 0.5657 | 410.8664 | 2691.8567 | 6677.0562 | 21.5795 | 46.34 | 11.585 | 1948.9584 |
| 28500 | 0.5758 | 418.1162 | 2795.4333 | 6772.7041 | 21.5011 | 46.509 | 11.627 | 2152.1055 |
| 29000 | 0.5859 | 407.0003 | 2837.7319 | 6612.7358 | 21.6658 | 46.156 | 11.539 | 2232.7546 |
| 29500 | 0.5960 | 407.4271 | 2949.1045 | 6649.2158 | 21.6025 | 46.291 | 11.573 | 3101.2493 |
| 30000 | 0.6061 | 406.1163 | 2778.8286 | 6607.7759 | 21.5146 | 46.48 | 11.62 | 3840.7419 |
| 30500 | 0.6162 | 397.9757 | 2956.0779 | 6601.0562 | 21.4872 | 46.539 | 11.635 | 2564.0315 |
| 31000 | 0.6263 | 398.2077 | 2838.1323 | 6594.9121 | 22.1693 | 45.107 | 11.277 | 2501.1306 |
| 31500 | 0.6364 | 393.3900 | 2667.1082 | 6559.9360 | 21.4915 | 46.53 | 11.633 | 5743.9526 |
| 32000 | 0.6465 | 393.8561 | 2583.0869 | 6566.1758 | 21.5166 | 46.476 | 11.619 | 8028.9990 |
| 32500 | 0.6566 | 391.7058 | 2675.8672 | 6583.2002 | 21.6273 | 46.238 | 11.559 | 5334.7124 |
| 33000 | 0.6667 | 396.9419 | 2743.4949 | 6698.2402 | 21.5042 | 46.503 | 11.626 | 11934.8896 |
| 33500 | 0.6768 | 388.6004 | 2891.6582 | 6570.7520 | 21.2945 | 46.961 | 11.74 | 4139.7988 |
| 34000 | 0.6869 | 386.5763 | 2826.3506 | 6525.6318 | 21.3684 | 46.798 | 11.7 | 3156.8203 |
| 34500 | 0.6970 | 387.0721 | 2805.7012 | 6572.9600 | 21.2897 | 46.971 | 11.743 | 2896.1072 |
| 35000 | 0.7071 | 386.0813 | 2637.3757 | 6580.5439 | 21.2409 | 47.079 | 11.77 | 7566.7905 |
| 35500 | 0.7172 | 381.5364 | 3025.4507 | 6588.3198 | 21.5446 | 46.415 | 11.604 | 4902.9575 |
| 36000 | 0.7273 | 386.6814 | 2880.9741 | 6570.8481 | 21.3516 | 46.835 | 11.709 | 3154.9243 |
| 36500 | 0.7374 | 379.9471 | 2795.0400 | 6521.5679 | 21.4418 | 46.638 | 11.659 | 3810.8567 |
| 37000 | 0.7475 | 383.0058 | 2805.8992 | 6537.6641 | 21.3615 | 46.813 | 11.703 | 5655.2837 |
| 37500 | 0.7576 | 375.7296 | 2787.7578 | 6456.9922 | 21.3662 | 46.803 | 11.701 | 3055.8257 |
| 38000 | 0.7677 | 374.0701 | 2868.8132 | 6484.3198 | 21.3768 | 46.78 | 11.695 | 2952.7307 |
| 38500 | 0.7778 | 377.5502 | 2659.9729 | 6455.3921 | 21.3661 | 46.803 | 11.701 | 3218.3279 |
| 39000 | 0.7879 | 370.5863 | 2806.0972 | 6473.3120 | 21.2561 | 47.045 | 11.761 | 2280.2119 |
| 39500 | 0.7980 | 371.9195 | 2613.6814 | 6536.6719 | 21.3516 | 46.835 | 11.709 | 2672.7583 |
| 40000 | 0.8081 | 377.1619 | 2487.1150 | 6439.7441 | 21.4296 | 46.664 | 11.666 | 2315.8076 |
| 40500 | 0.8182 | 370.4856 | 2678.1318 | 6437.2798 | 21.3153 | 46.915 | 11.729 | 1819.0656 |
| 41000 | 0.8283 | 369.2075 | 2614.6948 | 6462.3999 | 21.4041 | 46.72 | 11.68 | 2854.2568 |
| 41500 | 0.8384 | 372.8739 | 2305.3298 | 6431.2002 | 21.4425 | 46.636 | 11.659 | 3267.0427 |
| 42000 | 0.8485 | 368.2697 | 2281.5596 | 6418.3042 | 21.2858 | 46.98 | 11.745 | 2240.3704 |
| 42500 | 0.8586 | 365.9109 | 2410.3772 | 6468.8638 | 21.4759 | 46.564 | 11.641 | 3584.7686 |
| 43000 | 0.8687 | 367.1704 | 2442.8845 | 6401.3760 | 21.5525 | 46.398 | 11.6 | 2345.6868 |
| 43500 | 0.8788 | 363.9908 | 2523.0574 | 6458.4961 | 21.7663 | 45.943 | 11.486 | 3812.3833 |
| 44000 | 0.8889 | 363.7012 | 2468.5098 | 6388.8638 | 21.7639 | 45.948 | 11.487 | 4788.1108 |
| 44500 | 0.8990 | 363.1368 | 2572.5454 | 6479.6479 | 21.67 | 46.147 | 11.537 | 3193.9253 |
| 45000 | 0.9091 | 356.2796 | 2622.3564 | 6405.2158 | 21.6556 | 46.177 | 11.544 | 1944.5388 |
| 45500 | 0.9192 | 360.0483 | 2560.6021 | 6401.0239 | 21.3614 | 46.813 | 11.703 | 6363.8784 |
| 46000 | 0.9293 | 358.6112 | 2230.1096 | 6385.6958 | 21.3445 | 46.85 | 11.713 | 2245.4624 |
| 46500 | 0.9394 | 359.0361 | 2364.5928 | 6378.6558 | 21.4319 | 46.659 | 11.665 | 2161.8982 |
| 47000 | 0.9495 | 356.5909 | 2449.0066 | 6407.8081 | 21.4857 | 46.543 | 11.636 | 3063.7917 |
| 47500 | 0.9596 | 359.0292 | 2401.2183 | 6344.3521 | 21.5028 | 46.505 | 11.626 | 3229.5225 |
| 48000 | 0.9697 | 359.6570 | 2497.3064 | 6563.9038 | 21.3228 | 46.898 | 11.725 | 3209.3140 |
| 48500 | 0.9798 | 353.2013 | 2481.0728 | 6333.3442 | 21.4465 | 46.628 | 11.657 | 2960.4282 |
| 49000 | 0.9899 | 355.4300 | 2554.2913 | 6356.8638 | 21.2635 | 47.029 | 11.757 | 3479.5901 |
| 49500 | 1.0 | 352.3520 | 2577.0833 | 6367.2959 | 21.3211 | 46.902 | 11.725 | 3190.5127 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0
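
The `enwikippl`, `frwikippl`, and `zhwikippl` columns in the eval-phase table are perplexities. The sketch below is a reading aid only: it assumes the usual definition of perplexity (the exponential of the mean negative log-likelihood per token), which this card does not confirm is what Distily computes, and the helper names `perplexity` and `ppl_ratio` are hypothetical. It compares the final-step (49500) student values with the **teacher eval** row, both copied from the table.

```python
import math


def perplexity(token_log_probs):
    """Perplexity under the standard definition: exp(-mean log-probability).

    `token_log_probs` holds natural-log probabilities, one per token.
    This is an assumed definition, not taken from the Distily source.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Values copied from the eval-phase table: teacher row and student step 49500.
teacher = {"enwikippl": 30.2385, "frwikippl": 57.2728, "zhwikippl": 18.1772}
student = {"enwikippl": 352.3520, "frwikippl": 2577.0833, "zhwikippl": 3190.5127}


def ppl_ratio(student_ppl, teacher_ppl):
    """Return student/teacher perplexity per metric (1.0 would mean parity)."""
    return {name: student_ppl[name] / teacher_ppl[name] for name in teacher_ppl}


ratios = ppl_ratio(student, teacher)
for name, ratio in ratios.items():
    print(f"{name}: student/teacher = {ratio:.1f}x")
```

A ratio well above 1 means the student is still far from the teacher on that metric; under the numbers in this card, the gap is smallest on the English set and much larger on the French and Chinese sets.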