	---
	license: mit
	base_model: xlnet-large-cased
	tags:
	- generated_from_keras_callback
	model-index:
	- name: vedantjumle/xlnet-1
	results: []
	---

	<!-- This model card has been generated automatically according to the information Keras had access to. You should
	probably proofread and complete it, then remove this comment. -->

	# vedantjumle/xlnet-1

	This model is a fine-tuned version of [xlnet-large-cased](https://huggingface.co/xlnet-large-cased) on an unknown dataset.
	At the final logged epoch it reports the following training and validation results:
	- Train Loss: 0.0053
	- Validation Loss: 0.4856
	- Train Accuracy: 0.9033
	- Epoch: 93

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- optimizer: {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': {'module': 'keras.optimizers.schedules', 'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 6000, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'registered_name': None}, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False}
	- training_precision: float32
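
The `PolynomialDecay` entry in the optimizer config above, with `power` 1.0 and `cycle` False, describes a linear ramp from 2e-05 down to 0.0 over 6000 optimizer steps. The following is a minimal plain-Python sketch of that schedule, written from the standard non-cycling polynomial decay formula rather than taken from the training script; the function name `polynomial_decay` is illustrative only.

```python
def polynomial_decay(step, initial_lr=2e-05, decay_steps=6000, end_lr=0.0, power=1.0):
    """Learning rate at `step` for a non-cycling polynomial decay schedule.

    Defaults mirror the values listed in the optimizer config; the function
    itself is a hypothetical helper, not part of Keras.
    """
    # Without cycling, the step is clamped, so the rate stays at end_lr afterwards.
    step = min(step, decay_steps)
    fraction_left = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * fraction_left ** power + end_lr


# Print the learning rate at a few illustrative steps.
for step in (0, 1500, 3000, 6000, 9000):
    print(step, polynomial_decay(step))
```

Because the step is clamped at `decay_steps`, any update after step 6000 would use a learning rate of 0.0. The card does not state how many steps make up an epoch, so it is not possible to say from the card alone at which epoch that point was reached.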

	### Training results

	| Train Loss | Validation Loss | Train Accuracy | Epoch |
	|:----------:|:---------------:|:--------------:|:-----:|
	| 5.1007 | 4.9565 | 0.0133 | 0 |
	| 5.0503 | 4.8870 | 0.0367 | 1 |
	| 4.9095 | 4.6674 | 0.07 | 2 |
	| 4.5990 | 4.1706 | 0.2033 | 3 |
	| 4.0403 | 3.4616 | 0.4267 | 4 |
	| 3.2648 | 2.6274 | 0.6033 | 5 |
	| 2.5315 | 1.8851 | 0.71 | 6 |
	| 1.8938 | 1.4084 | 0.8033 | 7 |
	| 1.3599 | 1.0397 | 0.84 | 8 |
	| 0.9752 | 0.7675 | 0.8667 | 9 |
	| 0.6995 | 0.6496 | 0.8667 | 10 |
	| 0.5132 | 0.5293 | 0.89 | 11 |
	| 0.3848 | 0.4618 | 0.9 | 12 |
	| 0.2920 | 0.4516 | 0.8733 | 13 |
	| 0.2286 | 0.4097 | 0.8967 | 14 |
	| 0.1789 | 0.3951 | 0.9 | 15 |
	| 0.1512 | 0.3845 | 0.8933 | 16 |
	| 0.1320 | 0.3741 | 0.9067 | 17 |
	| 0.1116 | 0.3553 | 0.9067 | 18 |
	| 0.0935 | 0.3710 | 0.9 | 19 |
	| 0.0886 | 0.3831 | 0.9067 | 20 |
	| 0.0723 | 0.3490 | 0.91 | 21 |
	| 0.0641 | 0.3448 | 0.91 | 22 |
	| 0.0601 | 0.3682 | 0.9 | 23 |
	| 0.0590 | 0.3716 | 0.9033 | 24 |
	| 0.0491 | 0.3619 | 0.91 | 25 |
	| 0.0404 | 0.3728 | 0.9033 | 26 |
	| 0.0394 | 0.3624 | 0.91 | 27 |
	| 0.0394 | 0.3249 | 0.9167 | 28 |
	| 0.0387 | 0.3465 | 0.91 | 29 |
	| 0.0456 | 0.3580 | 0.91 | 30 |
	| 0.0323 | 0.3645 | 0.9133 | 31 |
	| 0.0308 | 0.3633 | 0.9133 | 32 |
	| 0.0312 | 0.3658 | 0.9033 | 33 |
	| 0.0244 | 0.3621 | 0.9067 | 34 |
	| 0.0255 | 0.3705 | 0.9067 | 35 |
	| 0.0238 | 0.3618 | 0.9067 | 36 |
	| 0.0222 | 0.3603 | 0.9067 | 37 |
	| 0.0230 | 0.3678 | 0.9067 | 38 |
	| 0.0272 | 0.4125 | 0.9033 | 39 |
	| 0.0318 | 0.3973 | 0.91 | 40 |
	| 0.0262 | 0.3871 | 0.9067 | 41 |
	| 0.0299 | 0.3935 | 0.9033 | 42 |
	| 0.0285 | 0.4192 | 0.9067 | 43 |
	| 0.0206 | 0.4100 | 0.9133 | 44 |
	| 0.0188 | 0.4106 | 0.9067 | 45 |
	| 0.0179 | 0.4355 | 0.91 | 46 |
	| 0.0151 | 0.4091 | 0.9133 | 47 |
	| 0.0138 | 0.4046 | 0.9167 | 48 |
	| 0.0128 | 0.4063 | 0.91 | 49 |
	| 0.0174 | 0.4197 | 0.91 | 50 |
	| 0.0247 | 0.4015 | 0.9133 | 51 |
	| 0.0159 | 0.4290 | 0.91 | 52 |
	| 0.0161 | 0.4353 | 0.9033 | 53 |
	| 0.0163 | 0.4568 | 0.9033 | 54 |
	| 0.0153 | 0.4428 | 0.8933 | 55 |
	| 0.0145 | 0.4273 | 0.9033 | 56 |
	| 0.0129 | 0.4315 | 0.8967 | 57 |
	| 0.0107 | 0.4265 | 0.8933 | 58 |
	| 0.0173 | 0.4303 | 0.8967 | 59 |
	| 0.0150 | 0.4386 | 0.8933 | 60 |
	| 0.0166 | 0.4308 | 0.91 | 61 |
	| 0.0135 | 0.4533 | 0.8933 | 62 |
	| 0.0096 | 0.4507 | 0.9 | 63 |
	| 0.0091 | 0.4371 | 0.9033 | 64 |
	| 0.0089 | 0.4383 | 0.9033 | 65 |
	| 0.0083 | 0.4450 | 0.9033 | 66 |
	| 0.0080 | 0.4487 | 0.9033 | 67 |
	| 0.0082 | 0.4500 | 0.9 | 68 |
	| 0.0077 | 0.4528 | 0.9033 | 69 |
	| 0.0075 | 0.4516 | 0.9 | 70 |
	| 0.0073 | 0.4474 | 0.9 | 71 |
	| 0.0222 | 0.4517 | 0.9 | 72 |
	| 0.0082 | 0.4778 | 0.9033 | 73 |
	| 0.0072 | 0.4674 | 0.9 | 74 |
	| 0.0072 | 0.4641 | 0.8967 | 75 |
	| 0.0068 | 0.4537 | 0.9 | 76 |
	| 0.0066 | 0.4565 | 0.8967 | 77 |
	| 0.0063 | 0.4551 | 0.9033 | 78 |
	| 0.0078 | 0.4614 | 0.8967 | 79 |
	| 0.0107 | 0.4598 | 0.8967 | 80 |
	| 0.0069 | 0.4536 | 0.9 | 81 |
	| 0.0107 | 0.4594 | 0.9033 | 82 |
	| 0.0072 | 0.4353 | 0.9033 | 83 |
	| 0.0112 | 0.4995 | 0.9 | 84 |
	| 0.0063 | 0.4875 | 0.8967 | 85 |
	| 0.0060 | 0.4859 | 0.9033 | 86 |
	| 0.0061 | 0.4804 | 0.9 | 87 |
	| 0.0058 | 0.4811 | 0.9033 | 88 |
	| 0.0058 | 0.4805 | 0.9033 | 89 |
	| 0.0057 | 0.4811 | 0.9033 | 90 |
	| 0.0057 | 0.4865 | 0.9033 | 91 |
	| 0.0055 | 0.4864 | 0.9033 | 92 |
	| 0.0053 | 0.4856 | 0.9033 | 93 |
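
In the table above, the lowest validation loss (0.3249) is logged at epoch 28, while the final epoch has a much lower train loss but a higher validation loss (0.4856). The sketch below shows one way to pick an epoch by validation loss. It is only an illustration: the `rows` list copies a handful of rows from the table, and the variable names are hypothetical, not part of the original training code.

```python
# (epoch, train_loss, validation_loss, train_accuracy), copied from the table above.
# Only a few rows are included to keep the example short.
rows = [
    (18, 0.1116, 0.3553, 0.9067),
    (22, 0.0641, 0.3448, 0.91),
    (28, 0.0394, 0.3249, 0.9167),
    (48, 0.0138, 0.4046, 0.9167),
    (93, 0.0053, 0.4856, 0.9033),
]

# Select the row with the smallest validation loss.
best = min(rows, key=lambda row: row[2])
final = rows[-1]

print(f"best epoch by validation loss: {best[0]} (val loss {best[2]})")
print(f"validation loss change from best to final epoch: {final[2] - best[2]:.4f}")
```

Whether an earlier checkpoint would actually generalize better depends on the evaluation data, which the card does not describe.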


	### Framework versions

	- Transformers 4.34.0
	- TensorFlow 2.13.0
	- Datasets 2.14.5
	- Tokenizers 0.14.1
