	---
	license: apache-2.0
	tags:
	- generated_from_trainer
	model-index:
	- name: distilgpt2-finetuned-wikitext2-agu
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# distilgpt2-finetuned-wikitext2-agu

	This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2). The Trainer did not record the name of the training dataset (the auto-generated card listed it as `None`).
	It achieves the following results on the evaluation set:
	- Loss: 3.1869
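
For language models, an evaluation loss is often easier to interpret as a perplexity. The sketch below converts the reported loss, assuming it is a mean per-token cross-entropy measured in nats (the card does not state this explicitly, so treat the result as indicative only).

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 3.1869

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the reported loss corresponds to a perplexity of roughly 24.2.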

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 50
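
To illustrate what `lr_scheduler_type: linear` implies for the learning rate, here is a minimal sketch of a linear schedule in plain Python. The function name `linear_lr` and the `warmup_steps` parameter are hypothetical; the card lists no warmup, so zero warmup is assumed, and it is also an assumption that the decay spanned the full 1365450 steps shown in the results table.

```python
def linear_lr(step, total_steps, base_lr=2e-05, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: the schedule covers every step up to the last table row.
TOTAL_STEPS = 1365450

for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With these assumptions the rate starts at the configured `learning_rate`, is halved at the midpoint, and reaches zero at the final step.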

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:-------:|:---------------:|
	| 3.7357 | 1.0 | 13655 | 3.6781 |
	| 3.5721 | 2.0 | 27310 | 3.5302 |
	| 3.4961 | 3.0 | 40965 | 3.4658 |
	| 3.4406 | 4.0 | 54620 | 3.4242 |
	| 3.4043 | 5.0 | 68275 | 3.3943 |
	| 3.3789 | 6.0 | 81930 | 3.3726 |
	| 3.3576 | 7.0 | 95585 | 3.3538 |
	| 3.3389 | 8.0 | 109240 | 3.3389 |
	| 3.3151 | 9.0 | 122895 | 3.3270 |
	| 3.314 | 5.0 | 136545 | 3.3226 |
	| 3.3044 | 6.0 | 163854 | 3.3124 |
	| 3.2931 | 7.0 | 191163 | 3.3078 |
	| 3.2874 | 8.0 | 218472 | 3.3094 |
	| 3.2817 | 9.0 | 245781 | 3.2943 |
	| 3.269 | 10.0 | 273090 | 3.2785 |
	| 3.2423 | 11.0 | 300399 | 3.2651 |
	| 3.2253 | 12.0 | 327708 | 3.2530 |
	| 3.2096 | 13.0 | 355017 | 3.2435 |
	| 3.1939 | 14.0 | 382326 | 3.2326 |
	| 3.1786 | 15.0 | 409635 | 3.2225 |
	| 3.1625 | 16.0 | 436944 | 3.2198 |
	| 3.1619 | 17.0 | 464253 | 3.2180 |
	| 3.1521 | 18.0 | 491562 | 3.2164 |
	| 3.1555 | 19.0 | 518871 | 3.2152 |
	| 3.1523 | 20.0 | 546180 | 3.2164 |
	| 3.1639 | 21.0 | 573489 | 3.2133 |
	| 3.1483 | 22.0 | 600798 | 3.2113 |
	| 3.1497 | 23.0 | 628107 | 3.2077 |
	| 3.1468 | 24.0 | 655416 | 3.2066 |
	| 3.1461 | 25.0 | 682725 | 3.2052 |
	| 3.1391 | 26.0 | 710034 | 3.2039 |
	| 3.1384 | 27.0 | 737343 | 3.2031 |
	| 3.135 | 28.0 | 764652 | 3.2020 |
	| 3.1262 | 29.0 | 791961 | 3.2015 |
	| 3.1357 | 30.0 | 819270 | 3.2019 |
	| 3.1372 | 31.0 | 846579 | 3.2003 |
	| 3.1346 | 32.0 | 873888 | 3.1988 |
	| 3.134 | 33.0 | 901197 | 3.1975 |
	| 3.1256 | 34.0 | 928506 | 3.1965 |
	| 3.1261 | 35.0 | 955815 | 3.1950 |
	| 3.1255 | 36.0 | 983124 | 3.1945 |
	| 3.1278 | 37.0 | 1010433 | 3.1940 |
	| 3.1186 | 38.0 | 1037742 | 3.1934 |
	| 3.1136 | 39.0 | 1065051 | 3.1932 |
	| 3.12 | 40.0 | 1092360 | 3.1931 |
	| 3.12 | 41.0 | 1119669 | 3.1930 |
	| 3.1165 | 42.0 | 1146978 | 3.1914 |
	| 3.1166 | 43.0 | 1174287 | 3.1900 |
	| 3.1139 | 44.0 | 1201596 | 3.1892 |
	| 3.1135 | 45.0 | 1228905 | 3.1885 |
	| 3.1077 | 46.0 | 1256214 | 3.1881 |
	| 3.1097 | 47.0 | 1283523 | 3.1873 |
	| 3.1076 | 48.0 | 1310832 | 3.1872 |
	| 3.102 | 49.0 | 1338141 | 3.1870 |
	| 3.1086 | 50.0 | 1365450 | 3.1869 |
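
The Epoch column in the table above drops from 9.0 back to 5.0 while the Step column keeps increasing. A quick ratio check, sketched below with a handful of (epoch, step) pairs copied from the table, suggests the log contains two segments with different numbers of steps per epoch. The card does not explain the change, so any interpretation (for example, a resumed run with a different configuration) is speculation.

```python
# (epoch, step) pairs copied from the training results table.
rows = [
    (1.0, 13655),
    (9.0, 122895),    # last row before the Epoch column resets
    (5.0, 136545),    # first row after the reset
    (10.0, 273090),
    (50.0, 1365450),  # final row
]

# Steps per epoch implied by each row.
steps_per_epoch = [round(step / epoch) for epoch, step in rows]
print(steps_per_epoch)
```

The first two rows imply 13655 steps per epoch, while the rows after the reset imply 27309, which is consistent with the final step count of 1365450 at epoch 50.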


	### Framework versions

	- Transformers 4.18.0
	- PyTorch 1.9.0+cu111
	- Datasets 2.4.0
	- Tokenizers 0.12.1
