metadata

license: apache-2.0
base_model: google/flan-t5-base
tags:
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: t5-summarization-zero-shot-headers-and-better-prompt-base-enriched
    results: []

t5-summarization-zero-shot-headers-and-better-prompt-base-enriched

This model is a fine-tuned version of google/flan-t5-base on an unknown dataset. It achieves the following results on the evaluation set, which are the values of the final checkpoint (epoch 20, step 12020):

Validation loss: 3.3582
ROUGE-1: 0.3973
ROUGE-2: 0.1803
ROUGE-L: 0.1995
ROUGE-Lsum: 0.1995
BERTScore: 0.8772
BLEURT-20: -0.7678
Gen Len (mean length of generated summaries): 13.355
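ROUGE-1 counts the unigrams a generated summary shares with its reference, while ROUGE-L is based on their longest common subsequence; both are usually reported as F-measures. The sketch below is a simplified, single-example illustration of those two definitions, not the code that produced the numbers above. It assumes lowercase whitespace tokenization, whereas the `rouge_score` package commonly used for these metrics also strips punctuation and can apply stemming, and the card's figures are aggregated over the whole evaluation set. The function names are hypothetical.

```python
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Simplified tokenization (assumption): lowercase, then split on whitespace.
    return text.lower().split()


def _f1(overlap: int, n_pred: int, n_ref: int) -> float:
    # F-measure from an overlap count; zero overlap also covers empty inputs.
    if overlap == 0:
        return 0.0
    precision = overlap / n_pred
    recall = overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(prediction: str, reference: str) -> float:
    pred, ref = _tokens(prediction), _tokens(reference)
    # Clipped unigram overlap: a token matches at most as often as it occurs in both texts.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return _f1(overlap, len(pred), len(ref))


def _lcs_length(a: list[str], b: list[str]) -> int:
    # Longest common subsequence by dynamic programming, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f1(prediction: str, reference: str) -> float:
    pred, ref = _tokens(prediction), _tokens(reference)
    return _f1(_lcs_length(pred, ref), len(pred), len(ref))


pred = "on the mat the cat sat"
ref = "the cat sat on the mat"
print(rouge1_f1(pred, ref), rougeL_f1(pred, ref))  # 1.0 0.5
```

Reordering the same words keeps ROUGE-1 at 1.0 but halves ROUGE-L, which is one reason the two metrics can diverge as they do in this card (ROUGE-1 around 0.40, ROUGE-L around 0.20).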

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
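The `linear` scheduler with a warmup ratio of 0.1 ramps the learning rate up from 0 to 0.0001 and then decays it linearly back to 0. The sketch below assumes the formula used by transformers' `get_linear_schedule_with_warmup` and that the warmup ratio is converted to a step count by rounding up; the helper names `warmup_steps` and `linear_lr` are hypothetical. The step counts come from the results table (601 steps per epoch, 12020 in total), so warmup would last 1202 steps, i.e. the first two epochs. With train_batch_size 2, 601 steps per epoch would correspond to 1,201 or 1,202 training examples, assuming a single device and no gradient accumulation (neither is stated in this card).

```python
import math


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    # Assumption: the warmup ratio is turned into a step count by rounding up.
    return math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int, base_lr: float, total_steps: int, n_warmup: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - n_warmup)


# Taken from this card: 601 optimizer steps per epoch (Step column) and 20 epochs.
STEPS_PER_EPOCH = 601
NUM_EPOCHS = 20
TOTAL = STEPS_PER_EPOCH * NUM_EPOCHS  # equals the last Step value in the results table
WARMUP = warmup_steps(TOTAL, 0.1)     # warmup length under the rounding assumption above

for step in (0, 601, 1202, 6611, 12020):
    print(step, linear_lr(step, 1e-4, TOTAL, WARMUP))
```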

Training results

Training Loss	Epoch	Step	Validation Loss	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum	BERTScore	BLEURT-20	Gen Len
2.188	1.0	601	2.1003	0.4472	0.1969	0.1958	0.1958	0.8766	-0.805	14.265
1.8197	2.0	1202	1.9668	0.4259	0.1977	0.2091	0.2091	0.8803	-0.7854	13.395
1.616	3.0	1803	1.9279	0.4209	0.1984	0.2069	0.2069	0.8788	-0.7915	13.385
1.4174	4.0	2404	1.9601	0.4294	0.2009	0.2098	0.2098	0.8796	-0.7453	13.745
1.2073	5.0	3005	1.9690	0.3801	0.1813	0.2045	0.2045	0.8793	-0.8024	12.63
0.978	6.0	3606	2.1024	0.4035	0.1887	0.2067	0.2067	0.8802	-0.7427	13.08
0.8994	7.0	4207	2.1300	0.4209	0.1962	0.2063	0.2063	0.8821	-0.7351	13.315
0.8133	8.0	4808	2.2183	0.4053	0.1857	0.2083	0.2083	0.8822	-0.7597	13.105
0.6993	9.0	5409	2.3794	0.4158	0.1926	0.2056	0.2056	0.8789	-0.762	13.73
0.7033	10.0	6010	2.4450	0.4119	0.1928	0.2059	0.2059	0.8804	-0.7611	13.165
0.5367	11.0	6611	2.6166	0.3886	0.1776	0.1961	0.1961	0.8795	-0.8055	12.925
0.538	12.0	7212	2.6617	0.3971	0.1762	0.1942	0.1942	0.878	-0.7797	13.135
0.5359	13.0	7813	2.8059	0.4188	0.2008	0.209	0.209	0.8808	-0.7481	13.445
0.4019	14.0	8414	3.0293	0.3901	0.1723	0.1972	0.1972	0.8765	-0.7554	13.135
0.3585	15.0	9015	3.0459	0.405	0.1843	0.2023	0.2023	0.8789	-0.7381	13.38
0.3966	16.0	9616	3.0934	0.392	0.176	0.1879	0.1879	0.8763	-0.7684	13.18
0.331	17.0	10217	3.1878	0.406	0.1828	0.1975	0.1975	0.8771	-0.7609	13.47
0.3703	18.0	10818	3.2429	0.4032	0.1798	0.197	0.197	0.8773	-0.7613	13.465
0.2751	19.0	11419	3.3337	0.3983	0.1772	0.2009	0.2009	0.8778	-0.7595	13.38
0.2926	20.0	12020	3.3582	0.3973	0.1803	0.1995	0.1995	0.8772	-0.7678	13.355
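Reading the table, validation loss is lowest at epoch 3 (1.9279) and climbs to 3.3582 by epoch 20 while training loss keeps falling, which usually indicates overfitting; the evaluation results reported at the top of this card are the epoch-20 values, not the lowest-loss ones. The small check below uses a hypothetical `best_epoch` helper, with values copied from the table, and shows that the "best" epoch also depends on the selection metric: ROUGE-L peaks at epoch 4 instead.

```python
# Per-epoch values copied from the table above (epochs 1 to 20).
VAL_LOSS = [2.1003, 1.9668, 1.9279, 1.9601, 1.9690, 2.1024, 2.1300, 2.2183, 2.3794, 2.4450,
            2.6166, 2.6617, 2.8059, 3.0293, 3.0459, 3.0934, 3.1878, 3.2429, 3.3337, 3.3582]
ROUGE_L = [0.1958, 0.2091, 0.2069, 0.2098, 0.2045, 0.2067, 0.2063, 0.2083, 0.2056, 0.2059,
           0.1961, 0.1942, 0.2090, 0.1972, 0.2023, 0.1879, 0.1975, 0.1970, 0.2009, 0.1995]


def best_epoch(values: list[float], higher_is_better: bool) -> int:
    """Return the 1-based epoch with the best value (min for losses, max for scores)."""
    pick = max if higher_is_better else min
    best_index = pick(range(len(values)), key=values.__getitem__)
    return best_index + 1


print("lowest validation loss at epoch", best_epoch(VAL_LOSS, higher_is_better=False))  # 3
print("highest ROUGE-L at epoch", best_epoch(ROUGE_L, higher_is_better=True))  # 4
```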

Framework versions

Transformers 4.35.2
PyTorch 2.1.0+cu121
Datasets 2.16.1
Tokenizers 0.15.0