---
license: mit
base_model: facebook/bart-large-cnn
tags:
  - generated_from_trainer
model-index:
  - name: bart_samsum_v2
    results: []
---

# bart_samsum_v2

This model is a fine-tuned version of facebook/bart-large-cnn. The training dataset is not recorded in this card (the model name suggests the SAMSum dialogue-summarization dataset). It achieves the following result on the evaluation set:

- Loss: 0.0236
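
As a quick usage check, the checkpoint can be loaded with the transformers summarization pipeline. A minimal sketch, assuming the model is published on the Hugging Face Hub as mixtralyanis/bart_samsum_v2 (the repo id is inferred from this card and may differ):

```python
from transformers import pipeline

# Assumed repo id inferred from this card; replace with the actual checkpoint path.
summarizer = pipeline("summarization", model="mixtralyanis/bart_samsum_v2")

dialogue = (
    "Amanda: I baked cookies. Do you want some?\n"
    "Jerry: Sure!\n"
    "Amanda: I'll bring you some tomorrow :-)"
)

# Generation lengths here are illustrative; bart-large-cnn's defaults also work.
summary = summarizer(dialogue, max_length=60, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```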

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding training arguments follows the list):

- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 64 (train_batch_size 4 × gradient_accumulation_steps 16)
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 8
- num_epochs: 15
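
For reproduction, these values map directly onto transformers training arguments. A minimal sketch assuming the standard Seq2SeqTrainer setup; output_dir and the evaluation cadence are assumptions, not recorded on this card (though the results table below does log validation loss at every optimizer step):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="bart_samsum_v2",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=16,  # effective train batch size: 4 * 16 = 64
    lr_scheduler_type="linear",
    warmup_steps=8,
    num_train_epochs=15,
    # Assumption: the results table logs validation loss at every optimizer step.
    evaluation_strategy="steps",
    eval_steps=1,
)
```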

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 9.4233 | 0.17 | 1 | 9.1990 |
| 9.5213 | 0.34 | 2 | 8.5394 |
| 8.7467 | 0.52 | 3 | 8.1115 |
| 8.4697 | 0.69 | 4 | 7.5747 |
| 7.752 | 0.86 | 5 | 6.8712 |
| 7.0515 | 1.03 | 6 | 5.8670 |
| 6.0874 | 1.2 | 7 | 4.6814 |
| 5.0408 | 1.38 | 8 | 3.8055 |
| 4.14 | 1.55 | 9 | 2.6678 |
| 2.9893 | 1.72 | 10 | 1.9701 |
| 2.4337 | 1.89 | 11 | 1.5191 |
| 1.9451 | 2.06 | 12 | 1.2105 |
| 1.53 | 2.24 | 13 | 0.9714 |
| 1.2369 | 2.41 | 14 | 0.7905 |
| 1.0014 | 2.58 | 15 | 0.6478 |
| 0.8419 | 2.75 | 16 | 0.5493 |
| 0.7338 | 2.92 | 17 | 0.4770 |
| 0.6393 | 3.1 | 18 | 0.4151 |
| 0.5747 | 3.27 | 19 | 0.3691 |
| 0.4962 | 3.44 | 20 | 0.3293 |
| 0.4516 | 3.61 | 21 | 0.2935 |
| 0.3995 | 3.78 | 22 | 0.2614 |
| 0.3618 | 3.96 | 23 | 0.2346 |
| 0.3246 | 4.13 | 24 | 0.2129 |
| 0.2929 | 4.3 | 25 | 0.1938 |
| 0.278 | 4.47 | 26 | 0.1770 |
| 0.2493 | 4.65 | 27 | 0.1627 |
| 0.2273 | 4.82 | 28 | 0.1500 |
| 0.2067 | 4.99 | 29 | 0.1381 |
| 0.1917 | 5.16 | 30 | 0.1274 |
| 0.1805 | 5.33 | 31 | 0.1174 |
| 0.1557 | 5.51 | 32 | 0.1081 |
| 0.1495 | 5.68 | 33 | 0.1002 |
| 0.1394 | 5.85 | 34 | 0.0933 |
| 0.1261 | 6.02 | 35 | 0.0868 |
| 0.1155 | 6.19 | 36 | 0.0809 |
| 0.1114 | 6.37 | 37 | 0.0755 |
| 0.1041 | 6.54 | 38 | 0.0705 |
| 0.0952 | 6.71 | 39 | 0.0657 |
| 0.0881 | 6.88 | 40 | 0.0615 |
| 0.0823 | 7.05 | 41 | 0.0577 |
| 0.0778 | 7.23 | 42 | 0.0545 |
| 0.071 | 7.4 | 43 | 0.0515 |
| 0.07 | 7.57 | 44 | 0.0487 |
| 0.0625 | 7.74 | 45 | 0.0463 |
| 0.0589 | 7.91 | 46 | 0.0440 |
| 0.0567 | 8.09 | 47 | 0.0422 |
| 0.0537 | 8.26 | 48 | 0.0411 |
| 0.05 | 8.43 | 49 | 0.0398 |
| 0.0472 | 8.6 | 50 | 0.0384 |
| 0.0458 | 8.77 | 51 | 0.0363 |
| 0.0455 | 8.95 | 52 | 0.0347 |
| 0.0412 | 9.12 | 53 | 0.0340 |
| 0.0414 | 9.29 | 54 | 0.0326 |
| 0.0403 | 9.46 | 55 | 0.0333 |
| 0.0384 | 9.63 | 56 | 0.0303 |
| 0.0353 | 9.81 | 57 | 0.0298 |
| 0.0348 | 9.98 | 58 | 0.0293 |
| 0.0342 | 10.15 | 59 | 0.0275 |
| 0.0311 | 10.32 | 60 | 0.0272 |
| 0.0317 | 10.49 | 61 | 0.0270 |
| 0.0315 | 10.67 | 62 | 0.0261 |
| 0.0289 | 10.84 | 63 | 0.0253 |
| 0.0285 | 11.01 | 64 | 0.0247 |
| 0.0273 | 11.18 | 65 | 0.0244 |
| 0.0277 | 11.35 | 66 | 0.0240 |
| 0.0267 | 11.53 | 67 | 0.0237 |
| 0.0263 | 11.7 | 68 | 0.0237 |
| 0.0258 | 11.87 | 69 | 0.0237 |
| 0.0254 | 12.04 | 70 | 0.0238 |
| 0.0248 | 12.22 | 71 | 0.0239 |
| 0.0246 | 12.39 | 72 | 0.0239 |
| 0.0249 | 12.56 | 73 | 0.0237 |
| 0.0239 | 12.73 | 74 | 0.0236 |
| 0.0247 | 12.9 | 75 | 0.0236 |
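
Validation loss plateaus around 0.0236 from roughly epoch 11.5 onward. Note that the table tracks only cross-entropy loss; for summarization quality, ROUGE is the usual metric. A minimal evaluation sketch using the evaluate library (the repo id and example data are assumptions, not taken from this card):

```python
import evaluate
from transformers import pipeline

rouge = evaluate.load("rouge")
# Assumed repo id; replace with the actual checkpoint path.
summarizer = pipeline("summarization", model="mixtralyanis/bart_samsum_v2")

# Hypothetical example pair; substitute a real evaluation split.
dialogues = ["Amanda: I baked cookies. Do you want some?\nJerry: Sure!"]
references = ["Amanda baked cookies and offered some to Jerry."]

predictions = [
    out["summary_text"]
    for out in summarizer(dialogues, max_length=60, min_length=10)
]
print(rouge.compute(predictions=predictions, references=references))
```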

### Framework versions

- Transformers 4.38.1
- PyTorch 2.1.0+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2