pszemraj committed on
Commit 742d2d9
1 Parent(s): fa508cb

Update README.md

Files changed (1)
  1. README.md +6 -3
README.md CHANGED
@@ -82,18 +82,21 @@ A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/goo

  - 20+ epochs of fine-tuning from the base model on V100/A100 GPUs
  - all training used 16384 token input / 1024 max output
- - early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for at least five epochs.
+

  ## Intended uses & limitations

  - At the time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
- - I plan to update this page with newer checkpoints and post some metrics over time.
- - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
+ - I plan to update this page with newer checkpoints and post some metrics over time.
+ - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
+ - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.

  ## Training and evaluation data

  `kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out with the intent of preventing the model from learning to generate "partial" summaries.

+ > - early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for at least five epochs.
+
  ## Training procedure

  ### Training hyperparameters
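
The training-data note in the diff above describes dropping summaries longer than 1024 LongT5 tokens (the token-based filter this commit documents, replacing the earlier character-based one). A minimal sketch of such a filter is shown below; it is illustrative only, not the author's preprocessing script, and the `summary_text` column name and `train` split are assumptions about `kmfoda/booksum` rather than details taken from this commit.

```python
# Hypothetical sketch: filter BookSum summaries by LongT5 *token* count
# (the corrected criterion), not by character count.
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_SUMMARY_TOKENS = 1024  # matches the 1024-token max output used in training

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
booksum = load_dataset("kmfoda/booksum", split="train")  # split name assumed

def summary_fits_budget(example):
    # Count LongT5 tokens in the reference summary; "summary_text" is an
    # assumed column name used here for illustration.
    n_tokens = len(tokenizer(example["summary_text"]).input_ids)
    return n_tokens <= MAX_SUMMARY_TOKENS

filtered = booksum.filter(summary_fits_budget)
print(f"kept {len(filtered)} of {len(booksum)} examples")
```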