pszemraj committed
Commit
ad46a6b
1 Parent(s): 2bd108e
Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -82,11 +82,11 @@ A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/goo
 
  - between different checkpoints, about 20 epochs in total
  - all training was done at 16384 token input / 1024 max output
- - early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of 1024 **characters**. This was subsequently caught and adjusted to **1024** tokens, and then trained further for at least five epochs.
+ - early checkpoints of this model were trained on a "smaller" subset of the dataset, because it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens**, and the model was then trained for at least five more epochs.
 
  ## Intended uses & limitations
 
- - At time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
+ - At the time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
  - I plan to update this page with newer checkpoints and post some metrics over time.
  - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset.
 
@@ -98,7 +98,7 @@ A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/goo
 
  ### Training hyperparameters
 
- The following hyperparameters were used during the final training round:
+ The following hyperparameters were used during the **final** training round\*:
  - learning_rate: 0.0004
  - train_batch_size: 2
  - eval_batch_size: 1
@@ -111,6 +111,8 @@ The following hyperparameters were used during the final training round:
  - lr_scheduler_warmup_ratio: 0.02
  - num_epochs: 2
 
+ \*_Prior training sessions used roughly similar parameters; multiple sessions were required, as this model takes eons to train._
+
  ### Training results
 
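As a quick orientation for the hyperparameter list in the hunks above, here is a minimal sketch of how those values might be expressed as 🤗 Transformers `Seq2SeqTrainingArguments`, assuming the checkpoints were trained with the `transformers` `Seq2SeqTrainer`. The output directory and any field not listed in the README (e.g. `predict_with_generate`) are illustrative placeholders, and hyperparameters that fall outside the hunks shown above are omitted.

```python
# Hypothetical sketch: maps the hyperparameters listed in the README onto
# Seq2SeqTrainingArguments. Fields not listed there (output_dir,
# predict_with_generate) are illustrative placeholders.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-book-summary-final-round",  # placeholder
    learning_rate=4e-4,               # learning_rate: 0.0004
    per_device_train_batch_size=2,    # train_batch_size: 2
    per_device_eval_batch_size=1,     # eval_batch_size: 1
    warmup_ratio=0.02,                # lr_scheduler_warmup_ratio: 0.02
    num_train_epochs=2,               # num_epochs: 2 (final round only)
    predict_with_generate=True,       # assumption: generate summaries during eval
)
```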
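And, since the "Intended uses & limitations" section above concerns summarization of long inputs (the model was trained at 16384 token input / 1024 max output), here is a minimal usage sketch with the `transformers` summarization `pipeline`. The checkpoint id, input file, and generation settings are assumptions for illustration, not values taken from this commit.

```python
# Hypothetical usage sketch; the repo id and generation settings are assumptions,
# not values from this commit.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-book-summary",  # assumed checkpoint id
)

with open("chapter.txt", "r", encoding="utf-8") as f:
    long_text = f.read()  # the model was trained on inputs up to 16384 tokens

result = summarizer(
    long_text,
    max_length=512,            # illustrative; training capped outputs at 1024 tokens
    min_length=32,
    no_repeat_ngram_size=3,
    truncation=True,
)
print(result[0]["summary_text"])
```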