---
license: apache-2.0
base_model: google/flan-t5-base
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: t5-summarization-zero-shot-headers-and-better-prompt-base-enriched
  results: []
---


# t5-summarization-zero-shot-headers-and-better-prompt-base-enriched

This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on an unknown dataset.
It achieves the following results on the evaluation set after the final training epoch (epoch 20):
- Loss: 3.3582
- Rouge1: 0.3973
- Rouge2: 0.1803
- RougeL: 0.1995
- RougeLsum: 0.1995
- Bert Score: 0.8772
- Bleurt 20: -0.7678
- Gen Len: 13.355

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
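
The scheduler settings above can be illustrated with a minimal sketch of a linear warmup-then-decay schedule. It uses the learning rate and warmup ratio listed here and the total step count (12020) from the training results table. The function name `linear_lr` and the rounding used to derive the warmup step count are assumptions made for illustration, not code taken from the training run.

```python
# Sketch of a linear warmup/decay schedule using values reported in this card.
BASE_LR = 1e-4        # learning_rate
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 12020   # last step in the training results table (20 epochs x 601 steps)

# Assumption: warmup covers WARMUP_RATIO of all optimizer steps.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR during warmup.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Linear decay from BASE_LR down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 601, WARMUP_STEPS, 6010, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the warmup spans the first two epochs (1202 steps), so the peak learning rate of 0.0001 is reached at the end of epoch 2 and decays linearly to zero by step 12020.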

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bert Score | Bleurt 20 | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|:------:|:---------:|:----------:|:---------:|:-------:|
| 2.188         | 1.0   | 601   | 2.1003          | 0.4472 | 0.1969 | 0.1958 | 0.1958    | 0.8766     | -0.805    | 14.265  |
| 1.8197        | 2.0   | 1202  | 1.9668          | 0.4259 | 0.1977 | 0.2091 | 0.2091    | 0.8803     | -0.7854   | 13.395  |
| 1.616         | 3.0   | 1803  | 1.9279          | 0.4209 | 0.1984 | 0.2069 | 0.2069    | 0.8788     | -0.7915   | 13.385  |
| 1.4174        | 4.0   | 2404  | 1.9601          | 0.4294 | 0.2009 | 0.2098 | 0.2098    | 0.8796     | -0.7453   | 13.745  |
| 1.2073        | 5.0   | 3005  | 1.9690          | 0.3801 | 0.1813 | 0.2045 | 0.2045    | 0.8793     | -0.8024   | 12.63   |
| 0.978         | 6.0   | 3606  | 2.1024          | 0.4035 | 0.1887 | 0.2067 | 0.2067    | 0.8802     | -0.7427   | 13.08   |
| 0.8994        | 7.0   | 4207  | 2.1300          | 0.4209 | 0.1962 | 0.2063 | 0.2063    | 0.8821     | -0.7351   | 13.315  |
| 0.8133        | 8.0   | 4808  | 2.2183          | 0.4053 | 0.1857 | 0.2083 | 0.2083    | 0.8822     | -0.7597   | 13.105  |
| 0.6993        | 9.0   | 5409  | 2.3794          | 0.4158 | 0.1926 | 0.2056 | 0.2056    | 0.8789     | -0.762    | 13.73   |
| 0.7033        | 10.0  | 6010  | 2.4450          | 0.4119 | 0.1928 | 0.2059 | 0.2059    | 0.8804     | -0.7611   | 13.165  |
| 0.5367        | 11.0  | 6611  | 2.6166          | 0.3886 | 0.1776 | 0.1961 | 0.1961    | 0.8795     | -0.8055   | 12.925  |
| 0.538         | 12.0  | 7212  | 2.6617          | 0.3971 | 0.1762 | 0.1942 | 0.1942    | 0.878      | -0.7797   | 13.135  |
| 0.5359        | 13.0  | 7813  | 2.8059          | 0.4188 | 0.2008 | 0.209  | 0.209     | 0.8808     | -0.7481   | 13.445  |
| 0.4019        | 14.0  | 8414  | 3.0293          | 0.3901 | 0.1723 | 0.1972 | 0.1972    | 0.8765     | -0.7554   | 13.135  |
| 0.3585        | 15.0  | 9015  | 3.0459          | 0.405  | 0.1843 | 0.2023 | 0.2023    | 0.8789     | -0.7381   | 13.38   |
| 0.3966        | 16.0  | 9616  | 3.0934          | 0.392  | 0.176  | 0.1879 | 0.1879    | 0.8763     | -0.7684   | 13.18   |
| 0.331         | 17.0  | 10217 | 3.1878          | 0.406  | 0.1828 | 0.1975 | 0.1975    | 0.8771     | -0.7609   | 13.47   |
| 0.3703        | 18.0  | 10818 | 3.2429          | 0.4032 | 0.1798 | 0.197  | 0.197     | 0.8773     | -0.7613   | 13.465  |
| 0.2751        | 19.0  | 11419 | 3.3337          | 0.3983 | 0.1772 | 0.2009 | 0.2009    | 0.8778     | -0.7595   | 13.38   |
| 0.2926        | 20.0  | 12020 | 3.3582          | 0.3973 | 0.1803 | 0.1995 | 0.1995    | 0.8772     | -0.7678   | 13.355  |
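
Training loss falls steadily across the 20 epochs while validation loss reaches its minimum early and then climbs, which suggests overfitting in the later epochs. The sketch below finds the epoch with the lowest validation loss, using the values copied from the table. Using validation loss as the selection criterion is an assumption for illustration, since this card does not state how a checkpoint was chosen. The variable names and the 601 steps-per-epoch constant (read off the Step column) are likewise illustrative.

```python
# Validation loss per epoch (epochs 1..20), copied from the training results table.
VAL_LOSS = [
    2.1003, 1.9668, 1.9279, 1.9601, 1.9690,
    2.1024, 2.1300, 2.2183, 2.3794, 2.4450,
    2.6166, 2.6617, 2.8059, 3.0293, 3.0459,
    3.0934, 3.1878, 3.2429, 3.3337, 3.3582,
]
STEPS_PER_EPOCH = 601  # read off the Step column (601, 1202, ...)

# Index of the smallest validation loss; epochs are 1-based.
best_index = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__)
best_epoch = best_index + 1
best_loss = VAL_LOSS[best_index]
best_step = best_epoch * STEPS_PER_EPOCH

# How much worse the final epoch is than the best one.
final_gap = VAL_LOSS[-1] - best_loss

if __name__ == "__main__":
    print(f"best epoch: {best_epoch} (step {best_step}), val loss {best_loss}")
    print(f"final-epoch val loss is higher by {final_gap:.4f}")
```

By this criterion the checkpoint from epoch 3 (step 1803) would be preferred over the final one. The ROUGE, Bert Score, and Bleurt 20 columns vary much less across epochs, so a different selection metric could point to a different epoch.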


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0