End of training
Browse files
- README.md +25 -37
- logs/events.out.tfevents.1724126966.02dbb11e2dcc +3 -0
- logs/events.out.tfevents.1724131158.02dbb11e2dcc +3 -0
- logs/learning_rate=0.0001, per_device_train_batch_size=4/completed.flag +0 -0
- logs/learning_rate=0.0001, per_device_train_batch_size=4/events.out.tfevents.1724126717.02dbb11e2dcc +2 -2
- model.safetensors +1 -1
- training_args.bin +2 -2
README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl:
-- eval_loss:
-- eval_runtime: 16.
-- eval_samples_per_second: 59.
-- eval_steps_per_second: 7.
+- eval_enwikippl: 2192.0
+- eval_frwikippl: 11200.0
+- eval_zhwikippl: 93184.0
+- eval_tinystoriesppl: 1808.0
+- eval_loss: 2.6293
+- eval_runtime: 16.9228
+- eval_samples_per_second: 59.092
+- eval_steps_per_second: 7.386
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
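The `eval_*ppl` metrics above are perplexities on the named corpora (English, French, and Chinese Wikipedia, plus TinyStories), i.e. the exponential of the mean per-token cross-entropy. A minimal sketch of that computation, assuming a Hugging Face causal LM and tokenizer; the `perplexity` helper below is illustrative, not Distily's actual evaluation code:

```python
import math

import torch

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity = exp(mean per-token cross-entropy) on `text`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # HF causal LMs return the mean cross-entropy loss when labels are passed
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())
```

As a consistency check on the runtime columns, `eval_samples_per_second × eval_runtime` ≈ 59.092 × 16.9228 ≈ 1000, so the evaluation set holds roughly 1,000 samples.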
@@ -48,8 +48,8 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
-- learning_rate: 0.
-- train_batch_size:
+- learning_rate: 0.0004
+- train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
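In the `distillation_objective` above, only the logits component is active (`weight=1`, `loss_fn=kl`); the hidden-state (`hs`) and attention (`attn`) components have weight 0, so training reduces to a KL divergence between the teacher's and student's next-token distributions. A minimal PyTorch sketch of such a loss; the function name and the `temperature` parameter are illustrative assumptions, not Distily's API:

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary dimension."""
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    # kl_div takes log-probs as input; log_target=True means the target is
    # also log-probs. "batchmean" sums over vocab and divides by batch size.
    return F.kl_div(s_logp, t_logp, log_target=True,
                    reduction="batchmean") * temperature**2
```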
@@ -58,38 +58,26 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0
 
 ### Resource Usage
-Peak GPU Memory: 7.
+Peak GPU Memory: 7.9368 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
-| 0 | 0 |
-| 1000 | 0.
-| 2000 | 0.
-| 3000 | 0.
-| 4000 | 0.
-| 5000 | 0.
-| 6000 | 0.
-| 7000 | 0.
-| 8000 | 0.
-| 9000 | 0.
-| 10000 | 0.
-| 11000 | 0.
-| 12000 | 0.
-| 13000 | 0.
-| 14000 | 0.5657 | 68.5 | 237.0 | 0.5389 | 16.7317 | 59.767 | 7.471 | 54.75 | 286.0 |
-| 15000 | 0.6061 | 67.5 | 252.0 | 0.5187 | 16.7326 | 59.764 | 7.47 | 52.25 | 98.5 |
-| 16000 | 0.6465 | 69.0 | 235.0 | 0.5174 | 16.8095 | 59.49 | 7.436 | 54.75 | 125.5 |
-| 17000 | 0.6869 | 67.0 | 231.0 | 0.5048 | 16.7326 | 59.764 | 7.47 | 50.5 | 116.0 |
-| 18000 | 0.7273 | 66.0 | 225.0 | 0.4909 | 16.7575 | 59.675 | 7.459 | 49.75 | 132.0 |
-| 19000 | 0.7677 | 66.5 | 247.0 | 0.4894 | 16.8313 | 59.413 | 7.427 | 49.75 | 112.0 |
-| 20000 | 0.8081 | 66.5 | 233.0 | 0.4870 | 16.7365 | 59.75 | 7.469 | 51.5 | 103.5 |
-| 21000 | 0.8485 | 65.0 | 221.0 | 0.4831 | 16.703 | 59.869 | 7.484 | 50.75 | 181.0 |
-| 22000 | 0.8889 | 65.5 | 199.0 | 0.4740 | 16.7629 | 59.656 | 7.457 | 49.5 | 95.5 |
-| 23000 | 0.9293 | 67.0 | 223.0 | 0.4752 | 16.7201 | 59.808 | 7.476 | 46.5 | 174.0 |
-| 24000 | 0.9697 | 65.0 | 207.0 | 0.4700 | 16.8026 | 59.515 | 7.439 | 46.75 | 98.5 |
-| 24750 | 1.0 | 67.0 | 207.0 | 0.4672 | 16.7876 | 59.568 | 7.446 | 47.0 | 185.0 |
+| 0 | 0 | 2473901162496.0 | 170424302305280.0 | 20.7680 | 16.794 | 59.545 | 7.443 | 4060086272.0 | 71468255805440.0 |
+| 1000 | 0.0808 | 688.0 | 3728.0 | 1.9530 | 16.821 | 59.449 | 7.431 | 652.0 | 2784.0 |
+| 2000 | 0.1616 | 1728.0 | 8256.0 | 2.4948 | 16.7878 | 59.567 | 7.446 | 1384.0 | 35584.0 |
+| 3000 | 0.2424 | 2040.0 | 10112.0 | 2.6087 | 16.7522 | 59.694 | 7.462 | 1720.0 | 64256.0 |
+| 4000 | 0.3232 | 2160.0 | 9280.0 | 2.6353 | 16.796 | 59.538 | 7.442 | 1816.0 | 57088.0 |
+| 5000 | 0.4040 | 1904.0 | 9088.0 | 2.5782 | 16.8206 | 59.451 | 7.431 | 1848.0 | 61440.0 |
+| 6000 | 0.4848 | 1840.0 | 8960.0 | 2.5344 | 16.7618 | 59.659 | 7.457 | 1592.0 | 69120.0 |
+| 7000 | 0.5657 | 1808.0 | 8512.0 | 2.5269 | 16.7913 | 59.555 | 7.444 | 1648.0 | 60672.0 |
+| 8000 | 0.6465 | 2096.0 | 8960.0 | 2.6404 | 16.8233 | 59.442 | 7.43 | 1928.0 | 137216.0 |
+| 9000 | 0.7273 | 2192.0 | 11200.0 | 2.6293 | 16.9228 | 59.092 | 7.386 | 1808.0 | 93184.0 |
+| 10000 | 0.8081 | 1944.0 | 9984.0 | 2.5759 | 16.857 | 59.323 | 7.415 | 1568.0 | 80896.0 |
+| 11000 | 0.8889 | 1736.0 | 9344.0 | 2.5147 | 16.8438 | 59.369 | 7.421 | 1488.0 | 48640.0 |
+| 12000 | 0.9697 | 2224.0 | 11840.0 | 2.6633 | 16.7839 | 59.581 | 7.448 | 1968.0 | 98816.0 |
+| 12375 | 1.0 | 2432.0 | 11072.0 | 2.7197 | 16.7952 | 59.541 | 7.443 | 2176.0 | 109568.0 |
 
 ### Framework versions
 - Distily 0.2.0
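The peak-memory figure under `### Resource Usage` is conventionally read from PyTorch's CUDA allocator statistics; a minimal sketch of that measurement (how Distily actually records it is an assumption here):

```python
import torch

torch.cuda.reset_peak_memory_stats()  # clear the allocator's high-water mark
# ... run training / evaluation here ...
peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU Memory: {peak_gb:.4f} GB")  # e.g. 7.9368 GB for this run
```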
logs/events.out.tfevents.1724126966.02dbb11e2dcc ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a250b37c7d93dabf10df915a05213ea7ecac8c305e6e3e657b5bea22a7f6668
+size 5852906
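These `ADDED` and `CHANGED` blocks are Git LFS pointer files rather than the artifacts themselves: `version` names the pointer spec, `oid` is the SHA-256 digest of the real file, and `size` is its length in bytes. A sketch of verifying a downloaded artifact against its pointer; the local path here is hypothetical:

```python
import hashlib
from pathlib import Path

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256; this digest is what LFS stores as `oid`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical local copy of the artifact above
path = "logs/events.out.tfevents.1724126966.02dbb11e2dcc"
assert Path(path).stat().st_size == 5852906
assert sha256_of(path) == "1a250b37c7d93dabf10df915a05213ea7ecac8c305e6e3e657b5bea22a7f6668"
```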
logs/events.out.tfevents.1724131158.02dbb11e2dcc ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:12278b9d145fd2c6847dff7dea295578db7bd43aee4be002f23b4f56cd9ce1a0
+size 307
logs/learning_rate=0.0001, per_device_train_batch_size=4/completed.flag ADDED
File without changes
logs/learning_rate=0.0001, per_device_train_batch_size=4/events.out.tfevents.1724126717.02dbb11e2dcc CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7192f9885af4cdb073e64f21a7c6b3df8ff55f2bf1f86e0b27a6ced595b7111e
+size 588
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:776f3531eec7b3712662c7d587fe16cf37bc93e8816939f74bf1498055406a03
 size 248894656
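Note that the `size` line is unchanged (248894656 bytes), so this commit swapped in new weights without altering tensor shapes. A sketch of loading the checkpoint with the `safetensors` library:

```python
from safetensors.torch import load_file

# Loads the tensors from the 248,894,656-byte checkpoint into a state dict.
state_dict = load_file("model.safetensors")
print(sum(t.numel() for t in state_dict.values()), "parameters")
```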
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7777a3b236d5a2940cd4ae7de66e1e80e17576a70be7777d54114b4ecf4ff248
+size 1017899080
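`training_args.bin` is the torch-pickled `TrainingArguments` object that the Hugging Face `Trainer` saves next to each checkpoint. A sketch of inspecting it; since unpickling executes arbitrary code, only load it from repositories you trust:

```python
import torch

# TrainingArguments is a pickled object, so weights_only=False is required;
# unpickling can execute code, so only load files from trusted sources.
args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size)  # expect 0.0004 and 8, per the card above
```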