Commit fa508cb
pszemraj committed
1 Parent(s): 768a7be
Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -67,7 +67,7 @@ inference:
  # long-t5-tglobal-base-16384-booksum

  - summarize long text and get a SparkNotes-esque summary of arbitrary topics!
- - generalizes fairly well to academic & narrative text.
+ - generalizes reasonably well to academic & narrative text.

  ## Cheeky Proof-of-Concept

@@ -80,9 +80,9 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/

  A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

- - between different checkpoints, about 20 epochs in total
- - all training was done at 16384 token input / 1024 max output
- - early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens**, and then trained further for at least five epochs.
+ - 20+ epochs of fine-tuning from the base model on V100/A100 GPUs
+ - all training used 16384 token input / 1024 max output
+ - early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens**, and then trained further for at least five epochs.

  ## Intended uses & limitations

@@ -92,7 +92,7 @@ A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/goo

  ## Training and evaluation data

- `kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.
+ `kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out with the intent of preventing the model from learning to generate "partial" summaries.

  ## Training procedure

@@ -111,7 +111,7 @@ The following hyperparameters were used during the **final** training round\*:
  - lr_scheduler_warmup_ratio: 0.02
  - num_epochs: 2

- \*_Prior training sessions used roughly similar parameters, multiple sessions were required as this takes eons to train_
+ \*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train_

  ### Training results
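For context on the "Training and evaluation data" wording touched above, here is a minimal sketch of the kind of token-length filter the card describes: dropping examples whose reference summary exceeds 1024 LongT5 tokens. The `summary_text` column name is an assumption about `kmfoda/booksum`, not something stated in the commit.

```python
# Sketch only: filter out examples whose reference summary exceeds 1024 LongT5 tokens,
# as described in the "Training and evaluation data" section of the card.
# The "summary_text" column name is an assumed kmfoda/booksum field.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
ds = load_dataset("kmfoda/booksum", split="train")

MAX_SUMMARY_TOKENS = 1024

def summary_fits(example):
    # Token count of the reference summary under the LongT5 tokenizer.
    return len(tokenizer(example["summary_text"]).input_ids) <= MAX_SUMMARY_TOKENS

ds = ds.filter(summary_fits)
print(f"{len(ds)} examples kept after filtering")
```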
 
 
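A minimal usage sketch for the "summarize long text" bullet in the diff above. The repository id below is an assumption pieced together from the commit author and the README title, and the generation settings are illustrative rather than the card's recommended values.

```python
# Sketch only: summarize a long document with the fine-tuned checkpoint.
# The repo id is assumed from the commit author and README title; adjust if the actual id differs.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-booksum",  # assumed repo id
)

with open("long_document.txt", encoding="utf-8") as f:
    text = f.read()

# The card states training used 16384-token inputs and up to 1024-token outputs.
result = summarizer(text, max_length=1024, no_repeat_ngram_size=3, truncation=True)
print(result[0]["summary_text"])
```

Long inputs are slow on CPU; the 16384-token window is the point of the tglobal-base variant, so a GPU is the practical choice for full-length documents.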