End of training
README.md CHANGED
@@ -14,11 +14,9 @@ model-index:
 
 # Summary
 
-**`distily_modelcard_try`**
-
 Distilled with [Distily](https://github.com/lapp0/distily) library
-
-
+using teacher model [gpt2](https://huggingface.co/gpt2)
+on dataset [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia).
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.

@@ -44,8 +42,8 @@ More information needed
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
 | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
-| 0 | 0 |
-| 20 | 1.0 |
+| 0 | 0 | 949187772416.0 | 76416058130432.0 | 21.75 | 0.1221 | 16.381 | 8.191 | 3556769792.0 | 13950053777408.0 |
+| 20 | 1.0 | 13248.0 | 64000.0 | 5.6562 | 0.0646 | 30.969 | 15.485 | 7712.0 | 181248.0 |
 
 # Resource Usage Comparison
 

@@ -91,7 +89,7 @@ More information needed
 <br/>
 
 # Train Dataset
-Trained on
+Trained on 149,632 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
 
 - Num Samples: `158`
 - Subset: `20231101.en`

@@ -120,7 +118,7 @@ The following hyperparameters were used during training:
 - num_epochs: `1.0`
 - distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl))`
 - train_embeddings: `True`
-- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at`
+- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7f80845a7190>`
 - student_model_name_or_path: `None`
 - student_config_name_or_path: `None`
 - student_model_config: `None`
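The `distillation_objective` recorded in the hyperparameters above is a single KL-divergence term over the logits (`label=logits, weight=1, loss_fn=kl`). Distily's actual implementation is not shown in this diff; the following is only a minimal PyTorch sketch of such an objective, with the function name and tensor shapes assumed for illustration:

```python
import torch
import torch.nn.functional as F

def kl_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    # Flatten (batch, seq_len, vocab) -> (batch * seq_len, vocab) so each
    # token position contributes one categorical distribution.
    vocab = student_logits.size(-1)
    s = student_logits.reshape(-1, vocab)
    t = teacher_logits.reshape(-1, vocab)
    # F.kl_div takes log-probabilities as input; with log_target=True the
    # target is also log-probabilities. This yields KL(teacher || student),
    # the usual direction for distillation, averaged over the flattened
    # token positions by reduction="batchmean".
    return F.kl_div(
        F.log_softmax(s, dim=-1),
        F.log_softmax(t, dim=-1),
        reduction="batchmean",
        log_target=True,
    )
```

With `weight=1` and no other loss components configured, a term like this would be the sole training signal; `train_embeddings: True` indicates the student's embedding weights are updated along with the rest of the model.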
logs/per_device_train_batch_size=8/events.out.tfevents.1724261529.f383272e719b ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:858391dfb337f6b823a7991ca2c3dc43b81d7edf59d31deb67e3e95eb4a2793c
+size 302
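The added log file is tracked with Git LFS, so the repository stores only this three-line pointer: `oid` is the SHA-256 digest of the actual TensorBoard event file and `size` is its length in bytes (302 here). A small sketch of verifying a downloaded artifact against such a pointer (the helper name `matches_lfs_pointer` is hypothetical, not part of any library):

```python
import hashlib

def matches_lfs_pointer(path: str, oid_hex: str, size: int) -> bool:
    # Stream the file in 1 MiB chunks, accumulating both the byte count
    # and the SHA-256 digest, then compare against the pointer fields.
    digest = hashlib.sha256()
    n = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            n += len(chunk)
    return n == size and digest.hexdigest() == oid_hex
```

For example, the event file above should satisfy `matches_lfs_pointer(path, "858391dfb337f6b823a7991ca2c3dc43b81d7edf59d31deb67e3e95eb4a2793c", 302)` once downloaded.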