checkpoints

This model is a fine-tuned version of google/pegasus-large on the booksum dataset.

Model description

More information needed

Intended uses & limitations

  • Standard PEGASUS has a maximum input length of 1024 tokens, so during training the model saw only the first 1024 tokens of each chapter and learned to produce the chapter's summary from that prefix. Keep this in mind when using the model: for inputs longer than 1024 tokens, information near the end may be missing from the summary, and the model will be biased towards information presented first.
  • The model was trained on the dataset for only one epoch but still provides reasonable results.
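
The effect of the 1024-token limit can be illustrated with a short sketch. The helper `split_seen_dropped` is hypothetical and works on a stand-in list of token ids, so it runs without downloading anything. `summarize` shows one plausible way to call the model through the Transformers `pipeline` API, assuming the repository id `pszemraj/pegasus-large-book-summary` and an environment with `transformers` installed; it is defined here but not called.

```python
MAX_INPUT_TOKENS = 1024  # input limit stated in this card


def split_seen_dropped(token_ids, max_len=MAX_INPUT_TOKENS):
    """Split token ids into the prefix the model sees and the tail that
    right-side truncation would drop (hypothetical helper)."""
    return token_ids[:max_len], token_ids[max_len:]


def summarize(text, model_id="pszemraj/pegasus-large-book-summary"):
    """Summarize `text`, truncating the input to the model's maximum length.

    Assumes `transformers` is installed and the checkpoint can be downloaded.
    The import is kept local so the rest of the sketch has no dependencies.
    """
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_id)
    return summarizer(text, truncation=True)[0]["summary_text"]


# Stand-in for a tokenized chapter of 1500 tokens (not real tokenizer output).
chapter_ids = list(range(1500))
seen, dropped = split_seen_dropped(chapter_ids)
print(f"tokens seen: {len(seen)}, tokens dropped: {len(dropped)}")
```

In this stand-in case roughly a third of the chapter never reaches the model, which is why content near the end of a long input may not appear in the summary.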

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
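
As a rough sketch, the values above can be checked for internal consistency and mapped onto the keyword names used by the Transformers `TrainingArguments` class. The mapping assumes the card was generated by the Transformers `Trainer`, which the list format suggests but the card does not state. The device count and the total number of training steps are also not stated, so `warmup_steps` takes a hypothetical step count.

```python
import math

# Values copied from the hyperparameter list in this card.
card = {
    "learning_rate": 0.001,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 16,
    "total_train_batch_size": 256,
    "lr_scheduler_type": "cosine_with_restarts",
    "lr_scheduler_warmup_ratio": 0.03,
    "num_epochs": 1,
}

# The listed total matches batch size x accumulation steps (16 x 16 = 256).
effective_batch = card["train_batch_size"] * card["gradient_accumulation_steps"]
assert effective_batch == card["total_train_batch_size"]

# Assumed mapping onto TrainingArguments keyword names.
trainer_kwargs = {
    "learning_rate": card["learning_rate"],
    "per_device_train_batch_size": card["train_batch_size"],
    "per_device_eval_batch_size": card["eval_batch_size"],
    "seed": card["seed"],
    "gradient_accumulation_steps": card["gradient_accumulation_steps"],
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": card["lr_scheduler_type"],
    "warmup_ratio": card["lr_scheduler_warmup_ratio"],
    "num_train_epochs": card["num_epochs"],
}


def warmup_steps(total_steps, ratio=card["lr_scheduler_warmup_ratio"]):
    """Warmup length implied by a warmup ratio, rounded up to whole steps."""
    return math.ceil(total_steps * ratio)


# 1000 is a hypothetical step count; the card does not report the real one.
print(effective_batch, warmup_steps(1000))
```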

Training results

Framework versions

  • Transformers 4.16.1
  • PyTorch 1.10.0+cu111
  • Datasets 1.18.2
  • Tokenizers 0.10.3
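
To compare a local environment against the versions listed above, a minimal sketch using the standard-library `importlib.metadata` module is shown below. The helper `compare_versions` is hypothetical, and the distribution names (`torch` for PyTorch, for example) are assumed to be the usual PyPI names. A mismatch does not necessarily mean the model will fail to load; the check only reports differences.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.16.1",
    "torch": "1.10.0+cu111",
    "datasets": "1.18.2",
    "tokenizers": "0.10.3",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {package: (card_version, installed_version_or_None, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, matches) in compare_versions().items():
    status = "ok" if matches else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```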

Model size: 569M params (Safetensors, tensor type F32)

Dataset used to train pszemraj/pegasus-large-book-summary