# distilgpt2-finetuned-wikitext2-agu
This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on an unspecified dataset (presumably WikiText-2, given the model name). It achieves the following results on the evaluation set:
- Loss: 3.1869 (perplexity ≈ 24.21)
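Since the card provides no usage snippet, here is a minimal text-generation sketch using the transformers `pipeline` API. The repository id is an assumption based on this card's title; substitute the actual `<namespace>/distilgpt2-finetuned-wikitext2-agu` path.

```python
# Minimal text-generation sketch with the transformers pipeline API.
# NOTE: the model id below is assumed from this card's title; replace it
# with the real "<namespace>/distilgpt2-finetuned-wikitext2-agu" repo id.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="distilgpt2-finetuned-wikitext2-agu",  # assumed repo id
)

outputs = generator(
    "The history of natural language processing",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
)
print(outputs[0]["generated_text"])
```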
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 50
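As a rough guide, these hyperparameters map onto transformers `TrainingArguments` as sketched below. Only the values listed above come from this card; `output_dir` is a placeholder, and the dataset and `Trainer` wiring are omitted because the card does not specify them.

```python
# Hypothetical TrainingArguments mirroring the hyperparameters above.
# output_dir is a placeholder; only the listed values are from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilgpt2-finetuned-wikitext2-agu",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=50,
)
```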
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:---:|:---:|:---:|:---:|
3.7357 | 1.0 | 13655 | 3.6781 |
3.5721 | 2.0 | 27310 | 3.5302 |
3.4961 | 3.0 | 40965 | 3.4658 |
3.4406 | 4.0 | 54620 | 3.4242 |
3.4043 | 5.0 | 68275 | 3.3943 |
3.3789 | 6.0 | 81930 | 3.3726 |
3.3576 | 7.0 | 95585 | 3.3538 |
3.3389 | 8.0 | 109240 | 3.3389 |
3.3151 | 9.0 | 122895 | 3.3270 |
3.314 | 5.0 | 136545 | 3.3226 |
3.3044 | 6.0 | 163854 | 3.3124 |
3.2931 | 7.0 | 191163 | 3.3078 |
3.2874 | 8.0 | 218472 | 3.3094 |
3.2817 | 9.0 | 245781 | 3.2943 |
3.269 | 10.0 | 273090 | 3.2785 |
3.2423 | 11.0 | 300399 | 3.2651 |
3.2253 | 12.0 | 327708 | 3.2530 |
3.2096 | 13.0 | 355017 | 3.2435 |
3.1939 | 14.0 | 382326 | 3.2326 |
3.1786 | 15.0 | 409635 | 3.2225 |
3.1625 | 16.0 | 436944 | 3.2198 |
3.1619 | 17.0 | 464253 | 3.2180 |
3.1521 | 18.0 | 491562 | 3.2164 |
3.1555 | 19.0 | 518871 | 3.2152 |
3.1523 | 20.0 | 546180 | 3.2164 |
3.1639 | 21.0 | 573489 | 3.2133 |
3.1483 | 22.0 | 600798 | 3.2113 |
3.1497 | 23.0 | 628107 | 3.2077 |
3.1468 | 24.0 | 655416 | 3.2066 |
3.1461 | 25.0 | 682725 | 3.2052 |
3.1391 | 26.0 | 710034 | 3.2039 |
3.1384 | 27.0 | 737343 | 3.2031 |
3.135 | 28.0 | 764652 | 3.2020 |
3.1262 | 29.0 | 791961 | 3.2015 |
3.1357 | 30.0 | 819270 | 3.2019 |
3.1372 | 31.0 | 846579 | 3.2003 |
3.1346 | 32.0 | 873888 | 3.1988 |
3.134 | 33.0 | 901197 | 3.1975 |
3.1256 | 34.0 | 928506 | 3.1965 |
3.1261 | 35.0 | 955815 | 3.1950 |
3.1255 | 36.0 | 983124 | 3.1945 |
3.1278 | 37.0 | 1010433 | 3.1940 |
3.1186 | 38.0 | 1037742 | 3.1934 |
3.1136 | 39.0 | 1065051 | 3.1932 |
3.12 | 40.0 | 1092360 | 3.1931 |
3.12 | 41.0 | 1119669 | 3.1930 |
3.1165 | 42.0 | 1146978 | 3.1914 |
3.1166 | 43.0 | 1174287 | 3.1900 |
3.1139 | 44.0 | 1201596 | 3.1892 |
3.1135 | 45.0 | 1228905 | 3.1885 |
3.1077 | 46.0 | 1256214 | 3.1881 |
3.1097 | 47.0 | 1283523 | 3.1873 |
3.1076 | 48.0 | 1310832 | 3.1872 |
3.102 | 49.0 | 1338141 | 3.1870 |
3.1086 | 50.0 | 1365450 | 3.1869 |
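Since the reported loss is the mean per-token cross-entropy of a causal language model, perplexity follows directly as its exponential; a quick check of the final figure:

```python
# Perplexity is exp(mean cross-entropy loss) for a causal language model.
import math

final_val_loss = 3.1869  # from the last row of the table above
print(f"perplexity = {math.exp(final_val_loss):.2f}")  # ≈ 24.21
```

The final validation loss of 3.1869 thus corresponds to a perplexity of roughly 24.21, the figure quoted at the top of this card.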
### Framework versions
- Transformers 4.18.0
- Pytorch 1.9.0+cu111
- Datasets 2.4.0
- Tokenizers 0.12.1