---
tags:
  - generated_from_trainer
model-index:
  - name: baseline
    results: []
---

# baseline

This model was fine-tuned from an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how the exact-match metric is typically computed follows this list):

- Loss: 1.8173
- Exact Match: 0.024
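
The card does not define the Exact Match metric. Below is a minimal sketch of how it is commonly computed for a seq2seq model, assuming it is the fraction of generated sequences that decode to exactly the reference text; the function name, the explicit `tokenizer` argument, and the `-100` label-padding convention are assumptions, not details from this card.

```python
import numpy as np

# Hypothetical exact-match metric for a Seq2SeqTrainer run; the tokenizer argument
# and the -100 label-padding convention are assumptions, not details from this card.
def exact_match_metrics(eval_preds, tokenizer):
    preds, labels = eval_preds
    # Restore pad tokens where label positions were masked with -100 before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    pred_texts = tokenizer.batch_decode(preds, skip_special_tokens=True)
    label_texts = tokenizer.batch_decode(labels, skip_special_tokens=True)
    matches = sum(p.strip() == t.strip() for p, t in zip(pred_texts, label_texts))
    return {"exact_match": matches / len(label_texts)}
```

In practice such a function would be handed to the trainer as `compute_metrics`, e.g. via `functools.partial(exact_match_metrics, tokenizer=tokenizer)`.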

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `Seq2SeqTrainingArguments` follows this list):

- learning_rate: 0.001
- train_batch_size: 100
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 400
- optimizer: Adam with betas=(0.98,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 4000
- num_epochs: 100
- label_smoothing_factor: 0.1
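
As a rough illustration, these settings could be expressed as `Seq2SeqTrainingArguments` in Transformers 4.35 as sketched below; the output directory, the single-device assumption behind the 100 × 4 = 400 total batch size, and the per-epoch evaluation strategy are assumptions, not taken from the card.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: output_dir, the single-device assumption, and evaluation_strategy
# are not stated in the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="baseline",              # hypothetical
    learning_rate=1e-3,
    per_device_train_batch_size=100,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,      # 100 * 4 = 400 total train batch size
    adam_beta1=0.98,                    # betas=(0.98, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=4000,
    num_train_epochs=100,
    label_smoothing_factor=0.1,
    evaluation_strategy="epoch",        # inferred from the per-epoch results below
)
```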

### Training results

| Training Loss | Epoch | Step | Validation Loss | Exact Match |
|:-------------:|:-----:|:----:|:---------------:|:-----------:|
| 6.2658 | 1.0 | 25 | 5.8355 | 0.0 |
| 5.8196 | 2.0 | 50 | 4.9580 | 0.0 |
| 5.1965 | 3.0 | 75 | 4.4378 | 0.0 |
| 4.7235 | 4.0 | 100 | 4.0961 | 0.0 |
| 4.4214 | 5.0 | 125 | 3.9076 | 0.0 |
| 4.2363 | 6.0 | 150 | 3.7785 | 0.0 |
| 4.1005 | 7.0 | 175 | 3.6651 | 0.0 |
| 3.9843 | 8.0 | 200 | 3.5463 | 0.0 |
| 3.8646 | 9.0 | 225 | 3.4428 | 0.0 |
| 3.741 | 10.0 | 250 | 3.3358 | 0.0 |
| 3.6147 | 11.0 | 275 | 3.2427 | 0.0 |
| 3.495 | 12.0 | 300 | 3.1311 | 0.0 |
| 3.3903 | 13.0 | 325 | 3.0348 | 0.0 |
| 3.2919 | 14.0 | 350 | 2.9669 | 0.0 |
| 3.2062 | 15.0 | 375 | 2.9071 | 0.0 |
| 3.137 | 16.0 | 400 | 2.8355 | 0.0 |
| 3.0572 | 17.0 | 425 | 2.7804 | 0.0 |
| 2.9925 | 18.0 | 450 | 2.7326 | 0.0 |
| 2.9289 | 19.0 | 475 | 2.6681 | 0.0 |
| 2.8768 | 20.0 | 500 | 2.6168 | 0.0 |
| 2.8239 | 21.0 | 525 | 2.5649 | 0.0 |
| 2.7689 | 22.0 | 550 | 2.5171 | 0.0 |
| 2.7125 | 23.0 | 575 | 2.4704 | 0.0 |
| 2.6607 | 24.0 | 600 | 2.4268 | 0.0 |
| 2.6149 | 25.0 | 625 | 2.4030 | 0.0 |
| 2.56 | 26.0 | 650 | 2.3599 | 0.001 |
| 2.5085 | 27.0 | 675 | 2.3096 | 0.006 |
| 2.4572 | 28.0 | 700 | 2.2606 | 0.008 |
| 2.4133 | 29.0 | 725 | 2.2313 | 0.009 |
| 2.3613 | 30.0 | 750 | 2.2011 | 0.011 |
| 2.3141 | 31.0 | 775 | 2.1757 | 0.011 |
| 2.2714 | 32.0 | 800 | 2.1362 | 0.013 |
| 2.2226 | 33.0 | 825 | 2.1176 | 0.019 |
| 2.1821 | 34.0 | 850 | 2.0885 | 0.022 |
| 2.1445 | 35.0 | 875 | 2.0810 | 0.022 |
| 2.1082 | 36.0 | 900 | 2.0501 | 0.034 |
| 2.0683 | 37.0 | 925 | 2.0571 | 0.029 |
| 2.0365 | 38.0 | 950 | 2.0318 | 0.028 |
| 2.0003 | 39.0 | 975 | 2.0227 | 0.033 |
| 1.9654 | 40.0 | 1000 | 2.0141 | 0.042 |
| 1.9358 | 41.0 | 1025 | 2.0042 | 0.045 |
| 1.905 | 42.0 | 1050 | 1.9903 | 0.055 |
| 1.878 | 43.0 | 1075 | 2.0076 | 0.057 |
| 1.8517 | 44.0 | 1100 | 1.9761 | 0.057 |
| 1.8233 | 45.0 | 1125 | 1.9952 | 0.063 |
| 1.7934 | 46.0 | 1150 | 1.9562 | 0.062 |
| 1.7636 | 47.0 | 1175 | 1.9776 | 0.068 |
| 1.7445 | 48.0 | 1200 | 1.9503 | 0.066 |
| 1.7226 | 49.0 | 1225 | 1.9616 | 0.065 |
| 1.7013 | 50.0 | 1250 | 1.9516 | 0.067 |
| 1.68 | 51.0 | 1275 | 1.9408 | 0.065 |
| 1.6557 | 52.0 | 1300 | 1.9493 | 0.073 |
| 1.6396 | 53.0 | 1325 | 1.9012 | 0.099 |
| 1.6183 | 54.0 | 1350 | 1.9144 | 0.083 |
| 1.5918 | 55.0 | 1375 | 1.9150 | 0.085 |
| 1.5788 | 56.0 | 1400 | 1.9278 | 0.098 |
| 1.5601 | 57.0 | 1425 | 1.9072 | 0.088 |
| 1.5464 | 58.0 | 1450 | 1.8896 | 0.084 |
| 1.5333 | 59.0 | 1475 | 1.9001 | 0.111 |
| 1.5153 | 60.0 | 1500 | 1.8746 | 0.089 |
| 1.5034 | 61.0 | 1525 | 1.8869 | 0.089 |
| 1.4876 | 62.0 | 1550 | 1.8744 | 0.105 |
| 1.4779 | 63.0 | 1575 | 1.8866 | 0.089 |
| 1.459 | 64.0 | 1600 | 1.8615 | 0.128 |
| 1.4447 | 65.0 | 1625 | 1.8565 | 0.111 |
| 1.4321 | 66.0 | 1650 | 1.8659 | 0.129 |
| 1.4175 | 67.0 | 1675 | 1.8571 | 0.121 |
| 1.4071 | 68.0 | 1700 | 1.8831 | 0.076 |
| 1.3987 | 69.0 | 1725 | 1.8492 | 0.077 |
| 1.3837 | 70.0 | 1750 | 1.8430 | 0.101 |
| 1.374 | 71.0 | 1775 | 1.8455 | 0.082 |
| 1.3645 | 72.0 | 1800 | 1.8506 | 0.064 |
| 1.3537 | 73.0 | 1825 | 1.8345 | 0.07 |
| 1.3441 | 74.0 | 1850 | 1.8267 | 0.115 |
| 1.3348 | 75.0 | 1875 | 1.8504 | 0.084 |
| 1.3205 | 76.0 | 1900 | 1.8470 | 0.08 |
| 1.3108 | 77.0 | 1925 | 1.8397 | 0.089 |
| 1.3028 | 78.0 | 1950 | 1.8657 | 0.073 |
| 1.2978 | 79.0 | 1975 | 1.8595 | 0.067 |
| 1.2875 | 80.0 | 2000 | 1.8322 | 0.073 |
| 1.2753 | 81.0 | 2025 | 1.8697 | 0.04 |
| 1.2707 | 82.0 | 2050 | 1.8426 | 0.085 |
| 1.262 | 83.0 | 2075 | 1.8229 | 0.093 |
| 1.2532 | 84.0 | 2100 | 1.8420 | 0.054 |
| 1.2488 | 85.0 | 2125 | 1.8465 | 0.057 |
| 1.2373 | 86.0 | 2150 | 1.8701 | 0.051 |
| 1.229 | 87.0 | 2175 | 1.8474 | 0.054 |
| 1.2242 | 88.0 | 2200 | 1.8287 | 0.072 |
| 1.2195 | 89.0 | 2225 | 1.8424 | 0.057 |
| 1.212 | 90.0 | 2250 | 1.8663 | 0.062 |
| 1.2033 | 91.0 | 2275 | 1.8447 | 0.044 |
| 1.1994 | 92.0 | 2300 | 1.8287 | 0.058 |
| 1.1907 | 93.0 | 2325 | 1.8425 | 0.05 |
| 1.1847 | 94.0 | 2350 | 1.8993 | 0.004 |
| 1.1805 | 95.0 | 2375 | 1.9007 | 0.014 |
| 1.1739 | 96.0 | 2400 | 1.8792 | 0.015 |
| 1.1697 | 97.0 | 2425 | 1.8315 | 0.02 |
| 1.162 | 98.0 | 2450 | 1.8024 | 0.043 |
| 1.1585 | 99.0 | 2475 | 1.8437 | 0.017 |
| 1.1505 | 100.0 | 2500 | 1.8173 | 0.024 |
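
As a consistency check inferred from the table (not stated elsewhere in the card): each epoch covers 25 optimizer steps, so with a total train batch size of 400 the training split appears to contain on the order of 25 × 400 = 10,000 examples.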

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0