lapp0 committed on
Commit 5a00a05 · 1 Parent(s): 8597334

End of training

README.md CHANGED
@@ -14,11 +14,9 @@ model-index:
 
 # Summary
 
-**`distily_modelcard_try`**
-
 Distilled with [Distily](https://github.com/lapp0/distily) library
-from teacher model [gpt2](https://huggingface.co/gpt2)
-using dataset [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia).
+using teacher model [gpt2](https://huggingface.co/gpt2)
+on dataset [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia).
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -44,8 +42,8 @@ More information needed
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
 | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
-| 0 | 0 | 1614907703296.0 | 77515569758208.0 | 20.125 | 0.1511 | 13.234 | 6.617 | 8053063680.0 | 50027779063808.0 |
-| 20 | 1.0 | 15744.0 | 128000.0 | 5.2188 | 0.0813 | 24.592 | 12.296 | 4896.0 | 13238272.0 |
+| 0 | 0 | 949187772416.0 | 76416058130432.0 | 21.75 | 0.1221 | 16.381 | 8.191 | 3556769792.0 | 13950053777408.0 |
+| 20 | 1.0 | 13248.0 | 64000.0 | 5.6562 | 0.0646 | 30.969 | 15.485 | 7712.0 | 181248.0 |
 
 # Resource Usage Comparison
 
@@ -91,7 +89,7 @@ More information needed
 <br/>
 
 # Train Dataset
-Trained on 150,108 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
+Trained on 149,632 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
 
 - Num Samples: `158`
 - Subset: `20231101.en`
@@ -120,7 +118,7 @@ The following hyperparameters were used during training:
 - num_epochs: `1.0`
 - distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl))`
 - train_embeddings: `True`
-- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7f4b48232590>`
+- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7f80845a7190>`
 - student_model_name_or_path: `None`
 - student_config_name_or_path: `None`
 - student_model_config: `None`
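
For context on the `distillation_objective` in the hunk above: a minimal sketch of what a logits-level KL component (`loss_fn=kl`) typically computes is shown below. This is an assumption about the objective's behavior, not Distily's actual implementation, and the function name `kl_logits_loss` is hypothetical.

```python
import torch
import torch.nn.functional as F

def kl_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch of a `loss_fn=kl` logits component:
    # KL(teacher || student) over the vocabulary, averaged per token.
    # Expected shapes: (batch, seq_len, vocab_size).
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    # Merge batch and sequence dims so `batchmean` averages over tokens.
    return F.kl_div(student_log_probs.flatten(0, 1),
                    teacher_probs.flatten(0, 1),
                    reduction="batchmean")

# Usage sketch:
# loss = kl_logits_loss(student(input_ids).logits, teacher(input_ids).logits)
```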
logs/per_device_train_batch_size=8/events.out.tfevents.1724261529.f383272e719b ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:858391dfb337f6b823a7991ca2c3dc43b81d7edf59d31deb67e3e95eb4a2793c
+size 302
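
The three `+` lines are a standard Git LFS pointer rather than the TensorBoard event file itself: `version` names the pointer spec, `oid` is the SHA-256 of the stored blob, and `size` is its length in bytes (302 here); the actual binary lives in LFS storage.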