---
tags:
  - generated_from_trainer
model-index:
  - name: baseline
    results: []
---

# baseline

This model was fine-tuned from an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how the exact-match metric is typically computed follows this list):

- Loss: 1.8173
- Exact Match: 0.024
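
The card does not define the Exact Match metric. Below is a minimal sketch of how it is commonly computed for a seq2seq model, assuming it is the fraction of generated sequences that decode to exactly the reference text; the function name, the explicit `tokenizer` argument, and the `-100` label-padding convention are assumptions, not details from this card.

```python
import numpy as np

# Hypothetical exact-match metric for a Seq2SeqTrainer run; the tokenizer argument
# and the -100 label-padding convention are assumptions, not details from this card.
def exact_match_metrics(eval_preds, tokenizer):
    preds, labels = eval_preds
    # Restore pad tokens where label positions were masked with -100 before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    pred_texts = tokenizer.batch_decode(preds, skip_special_tokens=True)
    label_texts = tokenizer.batch_decode(labels, skip_special_tokens=True)
    matches = sum(p.strip() == t.strip() for p, t in zip(pred_texts, label_texts))
    return {"exact_match": matches / len(label_texts)}
```

In practice such a function would be handed to the trainer as `compute_metrics`, e.g. via `functools.partial(exact_match_metrics, tokenizer=tokenizer)`.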

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `Seq2SeqTrainingArguments` follows this list):

- learning_rate: 0.001
- train_batch_size: 100
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 400
- optimizer: Adam with betas=(0.98,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 4000
- num_epochs: 100
- label_smoothing_factor: 0.1
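
As a rough illustration, these settings could be expressed as `Seq2SeqTrainingArguments` in Transformers 4.35 as sketched below; the output directory, the single-device assumption behind the 100 × 4 = 400 total batch size, and the per-epoch evaluation strategy are assumptions, not taken from the card.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: output_dir, the single-device assumption, and evaluation_strategy
# are not stated in the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="baseline",              # hypothetical
    learning_rate=1e-3,
    per_device_train_batch_size=100,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,      # 100 * 4 = 400 total train batch size
    adam_beta1=0.98,                    # betas=(0.98, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=4000,
    num_train_epochs=100,
    label_smoothing_factor=0.1,
    evaluation_strategy="epoch",        # inferred from the per-epoch results below
)
```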

### Training results

| Training Loss | Epoch | Step | Validation Loss | Exact Match |
|:-------------:|:-----:|:----:|:---------------:|:-----------:|
| 6.2658 | 1.0 | 25 | 5.8355 | 0.0 |
| 5.8196 | 2.0 | 50 | 4.9580 | 0.0 |
| 5.1965 | 3.0 | 75 | 4.4378 | 0.0 |
| 4.7235 | 4.0 | 100 | 4.0961 | 0.0 |
| 4.4214 | 5.0 | 125 | 3.9076 | 0.0 |
| 4.2363 | 6.0 | 150 | 3.7785 | 0.0 |
| 4.1005 | 7.0 | 175 | 3.6651 | 0.0 |
| 3.9843 | 8.0 | 200 | 3.5463 | 0.0 |
| 3.8646 | 9.0 | 225 | 3.4428 | 0.0 |
| 3.741 | 10.0 | 250 | 3.3358 | 0.0 |
| 3.6147 | 11.0 | 275 | 3.2427 | 0.0 |
| 3.495 | 12.0 | 300 | 3.1311 | 0.0 |
| 3.3903 | 13.0 | 325 | 3.0348 | 0.0 |
| 3.2919 | 14.0 | 350 | 2.9669 | 0.0 |
| 3.2062 | 15.0 | 375 | 2.9071 | 0.0 |
| 3.137 | 16.0 | 400 | 2.8355 | 0.0 |
| 3.0572 | 17.0 | 425 | 2.7804 | 0.0 |
| 2.9925 | 18.0 | 450 | 2.7326 | 0.0 |
| 2.9289 | 19.0 | 475 | 2.6681 | 0.0 |
| 2.8768 | 20.0 | 500 | 2.6168 | 0.0 |
| 2.8239 | 21.0 | 525 | 2.5649 | 0.0 |
| 2.7689 | 22.0 | 550 | 2.5171 | 0.0 |
| 2.7125 | 23.0 | 575 | 2.4704 | 0.0 |
| 2.6607 | 24.0 | 600 | 2.4268 | 0.0 |
| 2.6149 | 25.0 | 625 | 2.4030 | 0.0 |
| 2.56 | 26.0 | 650 | 2.3599 | 0.001 |
| 2.5085 | 27.0 | 675 | 2.3096 | 0.006 |
| 2.4572 | 28.0 | 700 | 2.2606 | 0.008 |
| 2.4133 | 29.0 | 725 | 2.2313 | 0.009 |
| 2.3613 | 30.0 | 750 | 2.2011 | 0.011 |
| 2.3141 | 31.0 | 775 | 2.1757 | 0.011 |
| 2.2714 | 32.0 | 800 | 2.1362 | 0.013 |
| 2.2226 | 33.0 | 825 | 2.1176 | 0.019 |
| 2.1821 | 34.0 | 850 | 2.0885 | 0.022 |
| 2.1445 | 35.0 | 875 | 2.0810 | 0.022 |
| 2.1082 | 36.0 | 900 | 2.0501 | 0.034 |
| 2.0683 | 37.0 | 925 | 2.0571 | 0.029 |
| 2.0365 | 38.0 | 950 | 2.0318 | 0.028 |
| 2.0003 | 39.0 | 975 | 2.0227 | 0.033 |
| 1.9654 | 40.0 | 1000 | 2.0141 | 0.042 |
| 1.9358 | 41.0 | 1025 | 2.0042 | 0.045 |
| 1.905 | 42.0 | 1050 | 1.9903 | 0.055 |
| 1.878 | 43.0 | 1075 | 2.0076 | 0.057 |
| 1.8517 | 44.0 | 1100 | 1.9761 | 0.057 |
| 1.8233 | 45.0 | 1125 | 1.9952 | 0.063 |
| 1.7934 | 46.0 | 1150 | 1.9562 | 0.062 |
| 1.7636 | 47.0 | 1175 | 1.9776 | 0.068 |
| 1.7445 | 48.0 | 1200 | 1.9503 | 0.066 |
| 1.7226 | 49.0 | 1225 | 1.9616 | 0.065 |
| 1.7013 | 50.0 | 1250 | 1.9516 | 0.067 |
| 1.68 | 51.0 | 1275 | 1.9408 | 0.065 |
| 1.6557 | 52.0 | 1300 | 1.9493 | 0.073 |
| 1.6396 | 53.0 | 1325 | 1.9012 | 0.099 |
| 1.6183 | 54.0 | 1350 | 1.9144 | 0.083 |
| 1.5918 | 55.0 | 1375 | 1.9150 | 0.085 |
| 1.5788 | 56.0 | 1400 | 1.9278 | 0.098 |
| 1.5601 | 57.0 | 1425 | 1.9072 | 0.088 |
| 1.5464 | 58.0 | 1450 | 1.8896 | 0.084 |
| 1.5333 | 59.0 | 1475 | 1.9001 | 0.111 |
| 1.5153 | 60.0 | 1500 | 1.8746 | 0.089 |
| 1.5034 | 61.0 | 1525 | 1.8869 | 0.089 |
| 1.4876 | 62.0 | 1550 | 1.8744 | 0.105 |
| 1.4779 | 63.0 | 1575 | 1.8866 | 0.089 |
| 1.459 | 64.0 | 1600 | 1.8615 | 0.128 |
| 1.4447 | 65.0 | 1625 | 1.8565 | 0.111 |
| 1.4321 | 66.0 | 1650 | 1.8659 | 0.129 |
| 1.4175 | 67.0 | 1675 | 1.8571 | 0.121 |
| 1.4071 | 68.0 | 1700 | 1.8831 | 0.076 |
| 1.3987 | 69.0 | 1725 | 1.8492 | 0.077 |
| 1.3837 | 70.0 | 1750 | 1.8430 | 0.101 |
| 1.374 | 71.0 | 1775 | 1.8455 | 0.082 |
| 1.3645 | 72.0 | 1800 | 1.8506 | 0.064 |
| 1.3537 | 73.0 | 1825 | 1.8345 | 0.07 |
| 1.3441 | 74.0 | 1850 | 1.8267 | 0.115 |
| 1.3348 | 75.0 | 1875 | 1.8504 | 0.084 |
| 1.3205 | 76.0 | 1900 | 1.8470 | 0.08 |
| 1.3108 | 77.0 | 1925 | 1.8397 | 0.089 |
| 1.3028 | 78.0 | 1950 | 1.8657 | 0.073 |
| 1.2978 | 79.0 | 1975 | 1.8595 | 0.067 |
| 1.2875 | 80.0 | 2000 | 1.8322 | 0.073 |
| 1.2753 | 81.0 | 2025 | 1.8697 | 0.04 |
| 1.2707 | 82.0 | 2050 | 1.8426 | 0.085 |
| 1.262 | 83.0 | 2075 | 1.8229 | 0.093 |
| 1.2532 | 84.0 | 2100 | 1.8420 | 0.054 |
| 1.2488 | 85.0 | 2125 | 1.8465 | 0.057 |
| 1.2373 | 86.0 | 2150 | 1.8701 | 0.051 |
| 1.229 | 87.0 | 2175 | 1.8474 | 0.054 |
| 1.2242 | 88.0 | 2200 | 1.8287 | 0.072 |
| 1.2195 | 89.0 | 2225 | 1.8424 | 0.057 |
| 1.212 | 90.0 | 2250 | 1.8663 | 0.062 |
| 1.2033 | 91.0 | 2275 | 1.8447 | 0.044 |
| 1.1994 | 92.0 | 2300 | 1.8287 | 0.058 |
| 1.1907 | 93.0 | 2325 | 1.8425 | 0.05 |
| 1.1847 | 94.0 | 2350 | 1.8993 | 0.004 |
| 1.1805 | 95.0 | 2375 | 1.9007 | 0.014 |
| 1.1739 | 96.0 | 2400 | 1.8792 | 0.015 |
| 1.1697 | 97.0 | 2425 | 1.8315 | 0.02 |
| 1.162 | 98.0 | 2450 | 1.8024 | 0.043 |
| 1.1585 | 99.0 | 2475 | 1.8437 | 0.017 |
| 1.1505 | 100.0 | 2500 | 1.8173 | 0.024 |
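
As a consistency check inferred from the table (not stated elsewhere in the card): each epoch covers 25 optimizer steps, so with a total train batch size of 400 the training split appears to contain on the order of 25 × 400 = 10,000 examples.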

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0