End of training
README.md CHANGED
@@ -14,11 +14,9 @@ model-index:
 
 # Summary
 
-**`distily_modelcard_try`**
-
 Distilled with [Distily](https://github.com/lapp0/distily) library
-
-
+using teacher model [gpt2](https://huggingface.co/gpt2)
+on dataset [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia).
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.

@@ -44,8 +42,8 @@ More information needed
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
 | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
-| 0 | 0 |
-| 20 | 1.0 |
+| 0 | 0 | 949187772416.0 | 76416058130432.0 | 21.75 | 0.1221 | 16.381 | 8.191 | 3556769792.0 | 13950053777408.0 |
+| 20 | 1.0 | 13248.0 | 64000.0 | 5.6562 | 0.0646 | 30.969 | 15.485 | 7712.0 | 181248.0 |
 
 # Resource Usage Comparison
 

@@ -91,7 +89,7 @@ More information needed
 <br/>
 
 # Train Dataset
-Trained on
+Trained on 149,632 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
 
 - Num Samples: `158`
 - Subset: `20231101.en`

@@ -120,7 +118,7 @@ The following hyperparameters were used during training:
 - num_epochs: `1.0`
 - distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl))`
 - train_embeddings: `True`
-- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at`
+- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7f80845a7190>`
 - student_model_name_or_path: `None`
 - student_config_name_or_path: `None`
 - student_model_config: `None`
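The `distillation_objective` recorded in the hyperparameters above is a single KL-divergence term over the logits (`label=logits, weight=1, loss_fn=kl`). Distily's actual implementation is not shown in this diff; the following is only a minimal PyTorch sketch of such an objective, with the function name and tensor shapes assumed for illustration:

```python
import torch
import torch.nn.functional as F

def kl_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    # Flatten (batch, seq_len, vocab) -> (batch * seq_len, vocab) so each
    # token position contributes one categorical distribution.
    vocab = student_logits.size(-1)
    s = student_logits.reshape(-1, vocab)
    t = teacher_logits.reshape(-1, vocab)
    # F.kl_div takes log-probabilities as input; with log_target=True the
    # target is also log-probabilities. This yields KL(teacher || student),
    # the usual direction for distillation, averaged over the flattened
    # token positions by reduction="batchmean".
    return F.kl_div(
        F.log_softmax(s, dim=-1),
        F.log_softmax(t, dim=-1),
        reduction="batchmean",
        log_target=True,
    )
```

With `weight=1` and no other loss components configured, a term like this would be the sole training signal; `train_embeddings: True` indicates the student's embedding weights are updated along with the rest of the model.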
logs/per_device_train_batch_size=8/events.out.tfevents.1724261529.f383272e719b ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:858391dfb337f6b823a7991ca2c3dc43b81d7edf59d31deb67e3e95eb4a2793c
+size 302
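The added log file is tracked with Git LFS, so the repository stores only this three-line pointer: `oid` is the SHA-256 digest of the actual TensorBoard event file and `size` is its length in bytes (302 here). A small sketch of verifying a downloaded artifact against such a pointer (the helper name `matches_lfs_pointer` is hypothetical, not part of any library):

```python
import hashlib

def matches_lfs_pointer(path: str, oid_hex: str, size: int) -> bool:
    # Stream the file in 1 MiB chunks, accumulating both the byte count
    # and the SHA-256 digest, then compare against the pointer fields.
    digest = hashlib.sha256()
    n = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            n += len(chunk)
    return n == size and digest.hexdigest() == oid_hex
```

For example, the event file above should satisfy `matches_lfs_pointer(path, "858391dfb337f6b823a7991ca2c3dc43b81d7edf59d31deb67e3e95eb4a2793c", 302)` once downloaded.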