End of training
Browse files
- README.md +25 -37
- logs/events.out.tfevents.1724126966.02dbb11e2dcc +3 -0
- logs/events.out.tfevents.1724131158.02dbb11e2dcc +3 -0
- logs/learning_rate=0.0001, per_device_train_batch_size=4/completed.flag +0 -0
- logs/learning_rate=0.0001, per_device_train_batch_size=4/events.out.tfevents.1724126717.02dbb11e2dcc +2 -2
- model.safetensors +1 -1
- training_args.bin +2 -2
README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl:
-- eval_loss:
-- eval_runtime: 16.
-- eval_samples_per_second: 59.
-- eval_steps_per_second: 7.
+- eval_enwikippl: 2192.0
+- eval_frwikippl: 11200.0
+- eval_zhwikippl: 93184.0
+- eval_tinystoriesppl: 1808.0
+- eval_loss: 2.6293
+- eval_runtime: 16.9228
+- eval_samples_per_second: 59.092
+- eval_steps_per_second: 7.386
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
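The `eval_*ppl` metrics above are perplexities on the named corpora (English, French, and Chinese Wikipedia, plus TinyStories), i.e. the exponential of the mean per-token cross-entropy. A minimal sketch of that computation, assuming a Hugging Face causal LM and tokenizer; the `perplexity` helper below is illustrative, not Distily's actual evaluation code:

```python
import math

import torch

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity = exp(mean per-token cross-entropy) on `text`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # HF causal LMs return the mean cross-entropy loss when labels are passed
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())
```

As a consistency check on the runtime columns, `eval_samples_per_second × eval_runtime` ≈ 59.092 × 16.9228 ≈ 1000, so the evaluation set holds roughly 1,000 samples.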
@@ -48,8 +48,8 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
-- learning_rate: 0.
-- train_batch_size:
+- learning_rate: 0.0004
+- train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
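In the `distillation_objective` above, only the logits component is active (`weight=1`, `loss_fn=kl`); the hidden-state (`hs`) and attention (`attn`) components have weight 0, so training reduces to a KL divergence between the teacher's and student's next-token distributions. A minimal PyTorch sketch of such a loss; the function name and the `temperature` parameter are illustrative assumptions, not Distily's API:

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary dimension."""
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    # kl_div takes log-probs as input; log_target=True means the target is
    # also log-probs. "batchmean" sums over vocab and divides by batch size.
    return F.kl_div(s_logp, t_logp, log_target=True,
                    reduction="batchmean") * temperature**2
```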
@@ -58,38 +58,26 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0
 
 ### Resource Usage
-Peak GPU Memory: 7.
+Peak GPU Memory: 7.9368 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
-| 0 | 0 |
-| 1000 | 0.
-| 2000 | 0.
-| 3000 | 0.
-| 4000 | 0.
-| 5000 | 0.
-| 6000 | 0.
-| 7000 | 0.
-| 8000 | 0.
-| 9000 | 0.
-| 10000 | 0.
-| 11000 | 0.
-| 12000 | 0.
-| 13000 | 0.
-| 14000 | 0.5657 | 68.5 | 237.0 | 0.5389 | 16.7317 | 59.767 | 7.471 | 54.75 | 286.0 |
-| 15000 | 0.6061 | 67.5 | 252.0 | 0.5187 | 16.7326 | 59.764 | 7.47 | 52.25 | 98.5 |
-| 16000 | 0.6465 | 69.0 | 235.0 | 0.5174 | 16.8095 | 59.49 | 7.436 | 54.75 | 125.5 |
-| 17000 | 0.6869 | 67.0 | 231.0 | 0.5048 | 16.7326 | 59.764 | 7.47 | 50.5 | 116.0 |
-| 18000 | 0.7273 | 66.0 | 225.0 | 0.4909 | 16.7575 | 59.675 | 7.459 | 49.75 | 132.0 |
-| 19000 | 0.7677 | 66.5 | 247.0 | 0.4894 | 16.8313 | 59.413 | 7.427 | 49.75 | 112.0 |
-| 20000 | 0.8081 | 66.5 | 233.0 | 0.4870 | 16.7365 | 59.75 | 7.469 | 51.5 | 103.5 |
-| 21000 | 0.8485 | 65.0 | 221.0 | 0.4831 | 16.703 | 59.869 | 7.484 | 50.75 | 181.0 |
-| 22000 | 0.8889 | 65.5 | 199.0 | 0.4740 | 16.7629 | 59.656 | 7.457 | 49.5 | 95.5 |
-| 23000 | 0.9293 | 67.0 | 223.0 | 0.4752 | 16.7201 | 59.808 | 7.476 | 46.5 | 174.0 |
-| 24000 | 0.9697 | 65.0 | 207.0 | 0.4700 | 16.8026 | 59.515 | 7.439 | 46.75 | 98.5 |
-| 24750 | 1.0 | 67.0 | 207.0 | 0.4672 | 16.7876 | 59.568 | 7.446 | 47.0 | 185.0 |
+| 0 | 0 | 2473901162496.0 | 170424302305280.0 | 20.7680 | 16.794 | 59.545 | 7.443 | 4060086272.0 | 71468255805440.0 |
+| 1000 | 0.0808 | 688.0 | 3728.0 | 1.9530 | 16.821 | 59.449 | 7.431 | 652.0 | 2784.0 |
+| 2000 | 0.1616 | 1728.0 | 8256.0 | 2.4948 | 16.7878 | 59.567 | 7.446 | 1384.0 | 35584.0 |
+| 3000 | 0.2424 | 2040.0 | 10112.0 | 2.6087 | 16.7522 | 59.694 | 7.462 | 1720.0 | 64256.0 |
+| 4000 | 0.3232 | 2160.0 | 9280.0 | 2.6353 | 16.796 | 59.538 | 7.442 | 1816.0 | 57088.0 |
+| 5000 | 0.4040 | 1904.0 | 9088.0 | 2.5782 | 16.8206 | 59.451 | 7.431 | 1848.0 | 61440.0 |
+| 6000 | 0.4848 | 1840.0 | 8960.0 | 2.5344 | 16.7618 | 59.659 | 7.457 | 1592.0 | 69120.0 |
+| 7000 | 0.5657 | 1808.0 | 8512.0 | 2.5269 | 16.7913 | 59.555 | 7.444 | 1648.0 | 60672.0 |
+| 8000 | 0.6465 | 2096.0 | 8960.0 | 2.6404 | 16.8233 | 59.442 | 7.43 | 1928.0 | 137216.0 |
+| 9000 | 0.7273 | 2192.0 | 11200.0 | 2.6293 | 16.9228 | 59.092 | 7.386 | 1808.0 | 93184.0 |
+| 10000 | 0.8081 | 1944.0 | 9984.0 | 2.5759 | 16.857 | 59.323 | 7.415 | 1568.0 | 80896.0 |
+| 11000 | 0.8889 | 1736.0 | 9344.0 | 2.5147 | 16.8438 | 59.369 | 7.421 | 1488.0 | 48640.0 |
+| 12000 | 0.9697 | 2224.0 | 11840.0 | 2.6633 | 16.7839 | 59.581 | 7.448 | 1968.0 | 98816.0 |
+| 12375 | 1.0 | 2432.0 | 11072.0 | 2.7197 | 16.7952 | 59.541 | 7.443 | 2176.0 | 109568.0 |
 
 ### Framework versions
 - Distily 0.2.0
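The peak-memory figure under `### Resource Usage` is conventionally read from PyTorch's CUDA allocator statistics; a minimal sketch of that measurement (how Distily actually records it is an assumption here):

```python
import torch

torch.cuda.reset_peak_memory_stats()  # clear the allocator's high-water mark
# ... run training / evaluation here ...
peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU Memory: {peak_gb:.4f} GB")  # e.g. 7.9368 GB for this run
```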
logs/events.out.tfevents.1724126966.02dbb11e2dcc ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a250b37c7d93dabf10df915a05213ea7ecac8c305e6e3e657b5bea22a7f6668
+size 5852906
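These `ADDED` and `CHANGED` blocks are Git LFS pointer files rather than the artifacts themselves: `version` names the pointer spec, `oid` is the SHA-256 digest of the real file, and `size` is its length in bytes. A sketch of verifying a downloaded artifact against its pointer; the local path here is hypothetical:

```python
import hashlib
from pathlib import Path

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256; this digest is what LFS stores as `oid`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical local copy of the artifact above
path = "logs/events.out.tfevents.1724126966.02dbb11e2dcc"
assert Path(path).stat().st_size == 5852906
assert sha256_of(path) == "1a250b37c7d93dabf10df915a05213ea7ecac8c305e6e3e657b5bea22a7f6668"
```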
logs/events.out.tfevents.1724131158.02dbb11e2dcc ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:12278b9d145fd2c6847dff7dea295578db7bd43aee4be002f23b4f56cd9ce1a0
+size 307
logs/learning_rate=0.0001, per_device_train_batch_size=4/completed.flag ADDED
File without changes
logs/learning_rate=0.0001, per_device_train_batch_size=4/events.out.tfevents.1724126717.02dbb11e2dcc CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7192f9885af4cdb073e64f21a7c6b3df8ff55f2bf1f86e0b27a6ced595b7111e
+size 588
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:776f3531eec7b3712662c7d587fe16cf37bc93e8816939f74bf1498055406a03
 size 248894656
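Note that the `size` line is unchanged (248894656 bytes), so this commit swapped in new weights without altering tensor shapes. A sketch of loading the checkpoint with the `safetensors` library:

```python
from safetensors.torch import load_file

# Loads the tensors from the 248,894,656-byte checkpoint into a state dict.
state_dict = load_file("model.safetensors")
print(sum(t.numel() for t in state_dict.values()), "parameters")
```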
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7777a3b236d5a2940cd4ae7de66e1e80e17576a70be7777d54114b4ecf4ff248
+size 1017899080
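`training_args.bin` is the torch-pickled `TrainingArguments` object that the Hugging Face `Trainer` saves next to each checkpoint. A sketch of inspecting it; since unpickling executes arbitrary code, only load it from repositories you trust:

```python
import torch

# TrainingArguments is a pickled object, so weights_only=False is required;
# unpickling can execute code, so only load files from trusted sources.
args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size)  # expect 0.0004 and 8, per the card above
```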