flan-xl-gen5

This model is a fine-tuned version of ybelkada/flan-t5-xl-sharded-bf16. The training dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 0.6594
  • Rouge1: 34.2696
  • Rouge2: 25.7973
  • Rougel: 30.5609
  • Rougelsum: 30.9651
  • Gen Len: 10.5326
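
As a reading aid for the ROUGE figures above: ROUGE-1 is the F-measure of unigram overlap between a generated text and its reference. The sketch below is a simplified illustration, not the scorer used to produce these numbers. The function name `rouge1_f1`, the lowercased whitespace tokenization, and the 0-100 scaling are assumptions; the actual evaluation code is not described in this card and probably tokenizes differently.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure on a 0-100 scale.

    Assumption: lowercased whitespace tokens and no stemming. A real
    ROUGE scorer may normalize and tokenize text differently.
    """
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped unigram overlap: a token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100.0 * 2 * precision * recall / (precision + recall)


# A short prediction that is fully contained in a longer reference has
# perfect precision but low recall, so the F-measure lands in between.
print(rouge1_f1("the cat", "the cat sat on the mat"))
```

Short outputs can therefore score well on precision while missing much of the reference, which is worth keeping in mind when reading the ROUGE values next to the average generation length.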

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 12
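
To illustrate how the learning rate, warmup steps, and linear scheduler interact, here is a minimal sketch of a linear schedule with warmup: the rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the last step. The function name `linear_lr` is hypothetical, and the total of 3936 steps is an assumption taken from the final step reported under Training results; the training script itself is not part of this card.

```python
PEAK_LR = 2e-05       # learning_rate from the list above
WARMUP_STEPS = 200    # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 3936    # assumption: last step reported under Training results


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, linear decay)."""
    if step < WARMUP_STEPS:
        # Warmup phase: scale up linearly from 0 to the peak rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay phase: scale down linearly so the rate reaches 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Inspect the rate at the start, the end of warmup, and each epoch
# boundary (328 steps per epoch according to the training results).
for step in [0, 100, 200] + list(range(328, TOTAL_STEPS + 1, 328)):
    print(step, linear_lr(step))
```

Under these assumptions, the peak rate is only reached at step 200, partway through the first epoch, and the rate is close to zero during the final epoch.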

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| No log        | 1.0   | 328  | 8.0966          | 16.5418 | 10.3523 | 13.972  | 14.1918   | 15.5773 |
| 18.3143       | 2.0   | 656  | 0.9260          | 31.5806 | 27.0287 | 29.6916 | 30.0327   | 8.8969  |
| 18.3143       | 3.0   | 984  | 0.7708          | 22.6847 | 15.805  | 19.6336 | 19.8945   | 13.8076 |
| 1.0739        | 4.0   | 1312 | 0.7308          | 35.1675 | 27.3998 | 31.8527 | 32.0356   | 9.6186  |
| 0.8085        | 5.0   | 1640 | 0.7084          | 34.4346 | 26.202  | 30.8999 | 31.212    | 10.1168 |
| 0.8085        | 6.0   | 1968 | 0.6924          | 34.3345 | 26.0144 | 30.692  | 31.0384   | 10.2680 |
| 0.7597        | 7.0   | 2296 | 0.6813          | 34.3854 | 26.0495 | 30.8335 | 31.1696   | 10.3196 |
| 0.7442        | 8.0   | 2624 | 0.6729          | 34.3758 | 26.0079 | 30.7863 | 31.1239   | 10.3608 |
| 0.7442        | 9.0   | 2952 | 0.6670          | 34.2115 | 25.7443 | 30.5369 | 30.9282   | 10.4983 |
| 0.7252        | 10.0  | 3280 | 0.6625          | 34.2518 | 25.7147 | 30.5433 | 30.9116   | 10.5292 |
| 0.7168        | 11.0  | 3608 | 0.6601          | 34.0539 | 25.5073 | 30.329  | 30.6828   | 10.6186 |
| 0.7168        | 12.0  | 3936 | 0.6594          | 34.2696 | 25.7973 | 30.5609 | 30.9651   | 10.5326 |
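
The per-epoch results can be compared programmatically. The sketch below copies the epoch, validation loss, and Rouge1 columns from the training results and picks the best epoch under each criterion. The variable names are illustrative, and nothing here is taken from the original training script.

```python
# (epoch, validation_loss, rouge1) copied from the training results above.
ROWS = [
    (1, 8.0966, 16.5418),
    (2, 0.9260, 31.5806),
    (3, 0.7708, 22.6847),
    (4, 0.7308, 35.1675),
    (5, 0.7084, 34.4346),
    (6, 0.6924, 34.3345),
    (7, 0.6813, 34.3854),
    (8, 0.6729, 34.3758),
    (9, 0.6670, 34.2115),
    (10, 0.6625, 34.2518),
    (11, 0.6601, 34.0539),
    (12, 0.6594, 34.2696),
]

# Lower is better for validation loss; higher is better for Rouge1.
best_loss = min(ROWS, key=lambda row: row[1])
best_rouge1 = max(ROWS, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_loss[0]} ({best_loss[1]})")
print(f"highest Rouge1: epoch {best_rouge1[0]} ({best_rouge1[2]})")
```

The two criteria disagree: validation loss is lowest at epoch 12, whose row matches the headline evaluation results, while Rouge1 peaks at epoch 4. Which criterion matters more depends on the downstream use, which this card does not specify.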

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.1.0+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0

Safetensors: model size 2.85B params; tensor types F32, FP16, I8