mitre-gpt2-base

This model is a fine-tuned version of gpt2 on an unknown dataset. On the evaluation set, the final checkpoint (step 36000, epoch 98.09) achieves the following result:

  • Loss: 3.3404
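
The reported loss can be read as a perplexity. The sketch below assumes the value is a mean per-token cross-entropy in nats, which is what the Transformers Trainer usually reports for causal language models; the card itself does not confirm this.

```python
import math

# Evaluation loss copied from the card.
# Assumption: mean per-token cross-entropy in nats.
eval_loss = 3.3404

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at roughly 28.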

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
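
As a rough illustration, the values above could be expressed as keyword arguments for `transformers.TrainingArguments`, and the linear schedule can be sketched in plain Python. Several things here are assumptions rather than facts from the card: `output_dir` is a hypothetical path, the batch size of 32 is treated as a per-device value on a single device, warmup is taken to be zero because the card lists none, and the total step count is estimated from the last logged row (step 36000 at epoch 98.09).

```python
# Hyperparameters from the card, keyed by TrainingArguments parameter names.
training_kwargs = {
    "output_dir": "mitre-gpt2-base",      # hypothetical path
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 32,    # assumes a single device
    "per_device_eval_batch_size": 32,     # assumes a single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 100,
}
# With transformers installed, these could be passed as
# transformers.TrainingArguments(**training_kwargs).

BASE_LR = training_kwargs["learning_rate"]

# Estimated from the results table: step 36000 was logged at epoch 98.09.
STEPS_PER_EPOCH = round(36000 / 98.09)
TOTAL_STEPS = STEPS_PER_EPOCH * training_kwargs["num_train_epochs"]


def linear_lr(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linearly decay the learning rate from BASE_LR to 0 (no warmup assumed)."""
    remaining = max(0.0, (total_steps - step) / total_steps)
    return BASE_LR * remaining


for step in (0, 4000, 18000, 36000):
    print(step, linear_lr(step))
```

With these assumptions the learning rate has already decayed to a small fraction of its initial value by the last logged step, which is consistent with the training loss flattening out late in the run.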

Training results

Training Loss   Epoch   Step    Validation Loss
3.2933          2.72    1000    2.6028
2.2619          5.45    2000    2.4654
1.9152          8.17    3000    2.3952
1.6434          10.9    4000    2.3729
1.4289          13.62   5000    2.4208
1.2627          16.35   6000    2.4845
1.1301          19.07   7000    2.5619
1.0169          21.8    8000    2.6058
0.93            24.52   9000    2.6773
0.8587          27.25   10000   2.7389
0.8032          29.97   11000   2.7639
0.7506          32.7    12000   2.8329
0.7079          35.42   13000   2.8934
0.6781          38.15   14000   2.9175
0.6461          40.87   15000   2.9532
0.6205          43.6    16000   3.0008
0.5987          46.32   17000   3.0539
0.5811          49.05   18000   3.0738
0.564           51.77   19000   3.0972
0.5491          54.5    20000   3.1341
0.5377          57.22   21000   3.1558
0.5255          59.95   22000   3.1723
0.516           62.67   23000   3.1984
0.5077          65.4    24000   3.2163
0.5021          68.12   25000   3.2396
0.4946          70.84   26000   3.2413
0.4871          73.57   27000   3.2708
0.4845          76.29   28000   3.2833
0.4791          79.02   29000   3.2847
0.4739          81.74   30000   3.2950
0.4704          84.47   31000   3.3124
0.4678          87.19   32000   3.3122
0.4642          89.92   33000   3.3260
0.4617          92.64   34000   3.3326
0.4605          95.37   35000   3.3325
0.4576          98.09   36000   3.3404
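
The table shows training loss falling throughout the run while validation loss reaches its minimum early and then rises. A small sketch, using a subset of rows copied from the table, locates the checkpoint with the lowest validation loss and measures how far the final checkpoint has drifted from it. The variable and function names are illustrative only.

```python
# (training_loss, epoch, step, validation_loss): a subset of rows from the table.
rows = [
    (3.2933, 2.72, 1000, 2.6028),
    (2.2619, 5.45, 2000, 2.4654),
    (1.9152, 8.17, 3000, 2.3952),
    (1.6434, 10.9, 4000, 2.3729),
    (1.4289, 13.62, 5000, 2.4208),
    (1.2627, 16.35, 6000, 2.4845),
    (0.5811, 49.05, 18000, 3.0738),
    (0.4576, 98.09, 36000, 3.3404),
]


def best_checkpoint(log_rows):
    """Return the logged row with the lowest validation loss."""
    return min(log_rows, key=lambda row: row[3])


best = best_checkpoint(rows)
final = rows[-1]

# Gap between the final and the best validation loss.
validation_gap = final[3] - best[3]

print(f"best step: {best[2]} (epoch {best[1]}), validation loss {best[3]}")
print(f"final step: {final[2]}, validation loss {final[3]}")
print(f"gap: {validation_gap:.4f}")
```

On these numbers the lowest validation loss is logged at step 4000, so a checkpoint from around epoch 11 may generalize better than the final one reported at the top of the card.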

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
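
To compare a local environment against the versions listed above, a short check using only the standard library might look like the following. The package names are the usual PyPI distribution names (PyTorch is distributed as `torch`); the helper name `compare_versions` is illustrative.

```python
from importlib import metadata

# Versions listed on the card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.38.2",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected, installed or None, versions match)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in compare_versions().items():
    status = "ok" if ok else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the check only makes differences from the training environment visible.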

Model size: 125M params (Safetensors, tensor type F32)

Model tree for bencyc1129/mitre-gpt2-base: fine-tuned from gpt2.
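
A minimal sketch of loading the checkpoint for text generation, assuming the repository id bencyc1129/mitre-gpt2-base is publicly accessible and that `transformers` and `torch` are installed. The prompt and the generation settings are placeholders, since the card does not document intended inputs.

```python
MODEL_ID = "bencyc1129/mitre-gpt2-base"


def generate_sample(prompt: str, max_new_tokens: int = 40) -> str:
    """Greedy-decode a continuation of `prompt` with the fine-tuned model."""
    # Imported lazily so the function can be defined without the libraries present.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,                      # greedy decoding
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Hypothetical prompt; downloads the model weights on first use.
    print(generate_sample("The adversary"))
```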