---
license: apache-2.0
base_model: google/mt5-base
tags:
- generated_from_trainer
metrics:
- rouge
- sacrebleu
model-index:
- name: mT5-TextSimp-LT-BatchSize2-lr1e-4
  results: []
---


# mT5-TextSimp-LT-BatchSize2-lr1e-4

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co./google/mt5-base) for Lithuanian text simplification; the training dataset is not documented in this card.
It achieves the following results on the evaluation set (a sketch of how such metrics can be computed follows the list):
- Loss: 0.0672
- Rouge1: 0.7548
- Rouge2: 0.5989
- RougeL: 0.7509
- Sacrebleu: 49.0373
- Gen Len: 38.0501
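
For reference, a minimal sketch (not the authors' actual evaluation script) of how ROUGE scores on a 0-1 scale and SacreBLEU on a 0-100 scale, as reported above, can be obtained with the `evaluate` library; the prediction and reference strings below are placeholders, not examples from the evaluation set:

```python
import evaluate

rouge = evaluate.load("rouge")          # requires the `rouge_score` package
sacrebleu = evaluate.load("sacrebleu")

predictions = ["Tai paprastas sakinys."]        # decoded model outputs (placeholder)
references = [["Tai yra paprastas sakinys."]]   # gold simplifications (placeholder)

rouge_scores = rouge.compute(predictions=predictions,
                             references=[refs[0] for refs in references])
bleu = sacrebleu.compute(predictions=predictions, references=references)

print(rouge_scores["rouge1"], rouge_scores["rouge2"], rouge_scores["rougeL"])
print(bleu["score"])  # SacreBLEU is on a 0-100 scale
```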

## Model description

[google/mt5-base](https://huggingface.co./google/mt5-base) is the multilingual variant of T5, an encoder-decoder Transformer pre-trained on the mC4 corpus covering 101 languages. This checkpoint fine-tunes it for Lithuanian text simplification (`TextSimp-LT`), with the batch size of 2 and learning rate of 1e-4 encoded in the model name.

## Intended uses & limitations

The model is intended for sequence-to-sequence simplification of Lithuanian text. Its limitations are not documented; in particular, because the training data is unspecified, behaviour outside the training domain is unknown. A usage sketch follows.
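
A minimal inference sketch using the Transformers API; the repo id below is assumed from this card's title and may not match the actual published path, and the input sentence is only an illustrative placeholder:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "mT5-TextSimp-LT-BatchSize2-lr1e-4"  # hypothetical model path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Placeholder Lithuanian sentence to simplify.
text = "Nepaisant nepalankių oro sąlygų, renginys įvyko laiku."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```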

## Training and evaluation data

The training and evaluation datasets are not documented; the Trainer did not record a dataset identifier.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 8
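
As a sketch, these settings correspond to roughly the following `Seq2SeqTrainingArguments`; dataset loading, tokenization, and the `Seq2SeqTrainer` call are omitted, and `output_dir` and the 200-step evaluation interval are inferred from this card rather than taken from the original script:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mT5-TextSimp-LT-BatchSize2-lr1e-4",
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=8,
    evaluation_strategy="steps",   # matches the 200-step cadence in the table below
    eval_steps=200,
    predict_with_generate=True,    # needed for ROUGE/SacreBLEU during evaluation
)
```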

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | Sacrebleu | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
| 25.6783       | 0.24  | 200  | 16.0497         | 0.0109 | 0.0005 | 0.0107 | 0.0029    | 512.0   |
| 1.9593        | 0.48  | 400  | 0.7780          | 0.014  | 0.0005 | 0.0136 | 0.0146    | 42.685  |
| 0.2778        | 0.72  | 600  | 0.1429          | 0.4924 | 0.3128 | 0.4803 | 20.3057   | 38.0382 |
| 0.1325        | 0.96  | 800  | 0.1039          | 0.6193 | 0.4369 | 0.6098 | 33.687    | 38.0501 |
| 0.1702        | 1.2   | 1000 | 0.0958          | 0.6697 | 0.5016 | 0.6613 | 38.0391   | 38.0501 |
| 0.13          | 1.44  | 1200 | 0.0880          | 0.6737 | 0.5051 | 0.6644 | 38.62     | 38.0501 |
| 0.1086        | 1.67  | 1400 | 0.0839          | 0.6964 | 0.5326 | 0.6884 | 40.9056   | 38.0501 |
| 0.0716        | 1.91  | 1600 | 0.0859          | 0.6933 | 0.5298 | 0.6862 | 40.7158   | 38.0501 |
| 0.1135        | 2.15  | 1800 | 0.0820          | 0.7017 | 0.5366 | 0.6936 | 40.7484   | 38.0501 |
| 0.0997        | 2.39  | 2000 | 0.0814          | 0.7011 | 0.5351 | 0.6945 | 41.1948   | 38.0501 |
| 0.0996        | 2.63  | 2200 | 0.0774          | 0.7103 | 0.5522 | 0.7049 | 42.5756   | 38.0501 |
| 1.1379        | 2.87  | 2400 | 0.0763          | 0.7211 | 0.5556 | 0.7152 | 43.2411   | 38.0501 |
| 0.0594        | 3.11  | 2600 | 0.0776          | 0.7261 | 0.5647 | 0.7201 | 44.2205   | 38.0501 |
| 0.0763        | 3.35  | 2800 | 0.0736          | 0.7309 | 0.5709 | 0.7251 | 45.2825   | 38.0501 |
| 0.1641        | 3.59  | 3000 | 0.0722          | 0.7297 | 0.5685 | 0.7242 | 44.9001   | 38.0501 |
| 0.1085        | 3.83  | 3200 | 0.0703          | 0.7377 | 0.5793 | 0.7319 | 45.7504   | 38.0501 |
| 0.0573        | 4.07  | 3400 | 0.0719          | 0.7393 | 0.5796 | 0.7335 | 45.86     | 38.0501 |
| 0.1149        | 4.31  | 3600 | 0.0705          | 0.7415 | 0.5787 | 0.7365 | 46.2652   | 38.0501 |
| 0.0843        | 4.55  | 3800 | 0.0703          | 0.7385 | 0.5754 | 0.7326 | 46.5292   | 38.0501 |
| 0.0658        | 4.78  | 4000 | 0.0705          | 0.7437 | 0.5855 | 0.7384 | 46.864    | 38.0501 |
| 0.0676        | 5.02  | 4200 | 0.0694          | 0.7437 | 0.584  | 0.7384 | 47.1268   | 38.0501 |
| 0.0657        | 5.26  | 4400 | 0.0711          | 0.7473 | 0.5913 | 0.7432 | 47.4413   | 38.0501 |
| 0.0679        | 5.5   | 4600 | 0.0702          | 0.7496 | 0.5908 | 0.7446 | 47.8281   | 38.0501 |
| 0.0664        | 5.74  | 4800 | 0.0671          | 0.7511 | 0.5929 | 0.7463 | 47.7693   | 38.0501 |
| 0.0446        | 5.98  | 5000 | 0.0685          | 0.7533 | 0.5932 | 0.7478 | 48.032    | 38.0501 |
| 0.0732        | 6.22  | 5200 | 0.0678          | 0.7523 | 0.5948 | 0.7472 | 48.3467   | 38.0501 |
| 0.0706        | 6.46  | 5400 | 0.0672          | 0.755  | 0.5983 | 0.7507 | 48.6158   | 38.0501 |
| 0.051         | 6.7   | 5600 | 0.0674          | 0.7523 | 0.5961 | 0.7478 | 48.4828   | 38.0501 |
| 0.067         | 6.94  | 5800 | 0.0681          | 0.7532 | 0.5978 | 0.7492 | 48.7253   | 38.0501 |
| 0.075         | 7.18  | 6000 | 0.0684          | 0.7534 | 0.5969 | 0.7492 | 48.7053   | 38.0501 |
| 0.1323        | 7.42  | 6200 | 0.0671          | 0.755  | 0.5991 | 0.7511 | 48.9922   | 38.0501 |
| 0.0383        | 7.66  | 6400 | 0.0671          | 0.7551 | 0.5994 | 0.7511 | 49.0028   | 38.0501 |
| 0.0599        | 7.89  | 6600 | 0.0672          | 0.7548 | 0.5989 | 0.7509 | 49.0373   | 38.0501 |


### Framework versions

- Transformers 4.33.0
- Pytorch 2.1.2+cu121
- Datasets 2.14.4
- Tokenizers 0.13.3