Commit 7e2d3d3 by pszemraj
1 Parent(s): d5e0992

Update README.md

Files changed (1):
  1. README.md (+6 -6)
README.md CHANGED
@@ -227,7 +227,7 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/

  A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

- - 20+ epochs of fine-tuning from the base model on V100/A100 GPUs
+ - 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
  - all training used 16384 token input / 1024 max output

  Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)
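
For context, a minimal usage sketch of the resulting checkpoint (not part of this commit): it summarizes a long document with beam search within the 16384-token input / 1024-token output limits described above. The repository id and input file path below are assumptions.

```python
import torch
from transformers import pipeline

# Assumed checkpoint id for the fine-tuned model this README describes.
summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = open("chapter.txt", encoding="utf-8").read()  # hypothetical input document

result = summarizer(
    long_text,
    max_length=1024,          # cap generation near the 1024-token training target
    min_length=8,
    num_beams=4,              # beam search, per the textgen parameters mentioned below
    no_repeat_ngram_size=3,
    truncation=True,          # truncate inputs that exceed the model's input limit
)
print(result[0]["summary_text"])
```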
@@ -262,9 +262,9 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl

  ## Training and evaluation data

- `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out with the intent of preventing the model from learning to generate "partial" summaries.
+ `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.

- _NOTE: early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for at least five epochs._
+ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for 10+ epochs._

  ## Training procedure

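
A sketch of the token-length filter described in the hunk above, assuming the `datasets` library and the base model's tokenizer; the column name `summary_text` is an assumption, and the actual preprocessing script is not part of this commit.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_TARGET_TOKENS = 1024  # summaries longer than this are dropped, not truncated

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
dataset = load_dataset("kmfoda/booksum")

def summary_fits(example):
    # Keep a row only if its reference summary encodes to <= 1024 LongT5 tokens,
    # so the model never trains against a truncated "partial" target.
    return len(tokenizer(example["summary_text"]).input_ids) <= MAX_TARGET_TOKENS

dataset = dataset.filter(summary_fits)
```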
@@ -274,9 +274,9 @@ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of

  ### Training hyperparameters

- The following hyperparameters were used during the **final** training round\*:
+ The following hyperparameters were used during the **most recent** training round\*:

- - learning_rate: 0.001
+ - learning_rate: 0.0006
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
@@ -289,7 +289,7 @@ The following hyperparameters were used during the **final** training round\*:
  - num_epochs: 2


- \*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train_
+ \*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes aeons to train_

  ### Training results

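For reference, the hyperparameters listed across the last two hunks map onto a `Seq2SeqTrainingArguments` object roughly as below. This is a sketch, not the actual training script: values hidden between the shown diff lines (e.g. optimizer and scheduler settings) are omitted, and `output_dir` is hypothetical.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-base-booksum",  # hypothetical output path
    learning_rate=6e-4,              # 0.0006 in the most recent round (0.001 previously)
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    num_train_epochs=2,
)
```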