t5-base-mse-summarization
This model is a fine-tuned version of t5-base on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.8743
- Rouge1: 45.9597
- Rouge2: 26.8086
- Rougel: 39.935
- Rougelsum: 43.8897
- Bleurt: -0.7132
- Gen Len: 18.464
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleurt | Gen Len |
---|---|---|---|---|---|---|---|---|---|
1.2568 | 1.0 | 267 | 1.0472 | 41.6829 | 21.9654 | 35.4264 | 39.5556 | -0.8231 | 18.522 |
1.1085 | 2.0 | 534 | 0.9840 | 43.1479 | 23.3351 | 36.9244 | 40.886 | -0.7843 | 18.534 |
1.0548 | 3.0 | 801 | 0.9515 | 44.1511 | 24.4912 | 37.9549 | 41.9984 | -0.7702 | 18.528 |
1.0251 | 4.0 | 1068 | 0.9331 | 44.426 | 24.9439 | 38.2978 | 42.1731 | -0.7633 | 18.619 |
0.9888 | 5.0 | 1335 | 0.9201 | 45.0385 | 25.524 | 38.8681 | 42.8998 | -0.7497 | 18.523 |
0.9623 | 6.0 | 1602 | 0.9119 | 44.8648 | 25.469 | 38.9281 | 42.7798 | -0.7496 | 18.537 |
0.9502 | 7.0 | 1869 | 0.9015 | 44.9668 | 25.5041 | 38.9463 | 42.9368 | -0.7412 | 18.48 |
0.9316 | 8.0 | 2136 | 0.8973 | 45.3028 | 25.7232 | 39.1533 | 43.277 | -0.7318 | 18.523 |
0.9191 | 9.0 | 2403 | 0.8921 | 45.2901 | 25.916 | 39.2909 | 43.3022 | -0.7296 | 18.529 |
0.9122 | 10.0 | 2670 | 0.8889 | 45.3535 | 26.1369 | 39.4861 | 43.28 | -0.7271 | 18.545 |
0.8993 | 11.0 | 2937 | 0.8857 | 45.5345 | 26.1669 | 39.5656 | 43.4664 | -0.7269 | 18.474 |
0.8905 | 12.0 | 3204 | 0.8816 | 45.7796 | 26.4145 | 39.8117 | 43.734 | -0.7185 | 18.503 |
0.8821 | 13.0 | 3471 | 0.8794 | 45.7163 | 26.4314 | 39.719 | 43.6407 | -0.7211 | 18.496 |
0.8789 | 14.0 | 3738 | 0.8784 | 45.9097 | 26.7281 | 39.9071 | 43.8105 | -0.7127 | 18.452 |
0.8665 | 15.0 | 4005 | 0.8765 | 46.1148 | 26.8882 | 40.1006 | 43.988 | -0.711 | 18.443 |
0.8676 | 16.0 | 4272 | 0.8766 | 45.9119 | 26.7674 | 39.9001 | 43.8237 | -0.718 | 18.491 |
0.8637 | 17.0 | 4539 | 0.8758 | 45.9158 | 26.7153 | 39.9463 | 43.8323 | -0.7183 | 18.492 |
0.8622 | 18.0 | 4806 | 0.8752 | 45.9508 | 26.75 | 39.9533 | 43.8795 | -0.7144 | 18.465 |
0.8588 | 19.0 | 5073 | 0.8744 | 45.9192 | 26.7352 | 39.8921 | 43.8204 | -0.7148 | 18.462 |
0.8554 | 20.0 | 5340 | 0.8743 | 45.9597 | 26.8086 | 39.935 | 43.8897 | -0.7132 | 18.464 |
Framework versions
- Transformers 4.21.2
- Pytorch 1.12.1+cu113
- Datasets 2.4.0
- Tokenizers 0.12.1
- Downloads last month
- 12
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.