metadata

license: mit
base_model: xlnet-large-cased
tags:
  - generated_from_keras_callback
model-index:
  - name: vedantjumle/xlnet-1
    results: []

vedantjumle/xlnet-1

This model is a fine-tuned version of xlnet-large-cased on an unknown dataset. After the final training epoch, it reports the following training and validation metrics:

Train Loss: 0.0053
Validation Loss: 0.4856
Train Accuracy: 0.9033
Epoch: 93

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

optimizer: {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': {'module': 'keras.optimizers.schedules', 'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 6000, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'registered_name': None}, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False}
training_precision: float32
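
The learning-rate entry in the optimizer config above is a PolynomialDecay schedule with power 1.0 and cycle False, which amounts to a linear ramp from 2e-05 down to 0.0 over 6000 steps. The following is a minimal pure-Python sketch of that formula, not the original training code; the function name `polynomial_decay_lr` is a hypothetical helper, and the defaults are copied from the config.

```python
def polynomial_decay_lr(step, initial_lr=2e-05, decay_steps=6000,
                        end_lr=0.0, power=1.0):
    """Learning rate at a given optimizer step (non-cycling polynomial decay).

    Hypothetical helper for illustration; defaults mirror the config above.
    """
    # Without cycling, the step is clamped so the rate stays at end_lr
    # once decay_steps has been reached.
    clamped = min(step, decay_steps)
    remaining = 1.0 - clamped / decay_steps
    return (initial_lr - end_lr) * remaining ** power + end_lr


if __name__ == "__main__":
    for step in (0, 1500, 3000, 6000, 9000):
        print(step, polynomial_decay_lr(step))
```

With power 1.0 the rate halves at step 3000 and stays at 0.0 from step 6000 onward, so any updates taken after that point would leave the weights unchanged.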

Training results

Train Loss	Validation Loss	Train Accuracy	Epoch
5.1007	4.9565	0.0133	0
5.0503	4.8870	0.0367	1
4.9095	4.6674	0.07	2
4.5990	4.1706	0.2033	3
4.0403	3.4616	0.4267	4
3.2648	2.6274	0.6033	5
2.5315	1.8851	0.71	6
1.8938	1.4084	0.8033	7
1.3599	1.0397	0.84	8
0.9752	0.7675	0.8667	9
0.6995	0.6496	0.8667	10
0.5132	0.5293	0.89	11
0.3848	0.4618	0.9	12
0.2920	0.4516	0.8733	13
0.2286	0.4097	0.8967	14
0.1789	0.3951	0.9	15
0.1512	0.3845	0.8933	16
0.1320	0.3741	0.9067	17
0.1116	0.3553	0.9067	18
0.0935	0.3710	0.9	19
0.0886	0.3831	0.9067	20
0.0723	0.3490	0.91	21
0.0641	0.3448	0.91	22
0.0601	0.3682	0.9	23
0.0590	0.3716	0.9033	24
0.0491	0.3619	0.91	25
0.0404	0.3728	0.9033	26
0.0394	0.3624	0.91	27
0.0394	0.3249	0.9167	28
0.0387	0.3465	0.91	29
0.0456	0.3580	0.91	30
0.0323	0.3645	0.9133	31
0.0308	0.3633	0.9133	32
0.0312	0.3658	0.9033	33
0.0244	0.3621	0.9067	34
0.0255	0.3705	0.9067	35
0.0238	0.3618	0.9067	36
0.0222	0.3603	0.9067	37
0.0230	0.3678	0.9067	38
0.0272	0.4125	0.9033	39
0.0318	0.3973	0.91	40
0.0262	0.3871	0.9067	41
0.0299	0.3935	0.9033	42
0.0285	0.4192	0.9067	43
0.0206	0.4100	0.9133	44
0.0188	0.4106	0.9067	45
0.0179	0.4355	0.91	46
0.0151	0.4091	0.9133	47
0.0138	0.4046	0.9167	48
0.0128	0.4063	0.91	49
0.0174	0.4197	0.91	50
0.0247	0.4015	0.9133	51
0.0159	0.4290	0.91	52
0.0161	0.4353	0.9033	53
0.0163	0.4568	0.9033	54
0.0153	0.4428	0.8933	55
0.0145	0.4273	0.9033	56
0.0129	0.4315	0.8967	57
0.0107	0.4265	0.8933	58
0.0173	0.4303	0.8967	59
0.0150	0.4386	0.8933	60
0.0166	0.4308	0.91	61
0.0135	0.4533	0.8933	62
0.0096	0.4507	0.9	63
0.0091	0.4371	0.9033	64
0.0089	0.4383	0.9033	65
0.0083	0.4450	0.9033	66
0.0080	0.4487	0.9033	67
0.0082	0.4500	0.9	68
0.0077	0.4528	0.9033	69
0.0075	0.4516	0.9	70
0.0073	0.4474	0.9	71
0.0222	0.4517	0.9	72
0.0082	0.4778	0.9033	73
0.0072	0.4674	0.9	74
0.0072	0.4641	0.8967	75
0.0068	0.4537	0.9	76
0.0066	0.4565	0.8967	77
0.0063	0.4551	0.9033	78
0.0078	0.4614	0.8967	79
0.0107	0.4598	0.8967	80
0.0069	0.4536	0.9	81
0.0107	0.4594	0.9033	82
0.0072	0.4353	0.9033	83
0.0112	0.4995	0.9	84
0.0063	0.4875	0.8967	85
0.0060	0.4859	0.9033	86
0.0061	0.4804	0.9	87
0.0058	0.4811	0.9033	88
0.0058	0.4805	0.9033	89
0.0057	0.4811	0.9033	90
0.0057	0.4865	0.9033	91
0.0055	0.4864	0.9033	92
0.0053	0.4856	0.9033	93
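
In the table above, validation loss reaches its lowest value well before the final epoch while train loss keeps falling. The sketch below shows one way to pick out the epoch with the lowest validation loss and compare it with the last epoch. It uses only a handful of rows copied from the table; the helper names `parse_rows` and `best_by_validation_loss` are hypothetical, and this is an illustration rather than part of the original training pipeline.

```python
# A few rows copied from the table above, in the same column order:
# train loss, validation loss, train accuracy, epoch.
ROWS_TEXT = """\
0.1116 0.3553 0.9067 18
0.0641 0.3448 0.91 22
0.0394 0.3249 0.9167 28
0.0138 0.4046 0.9167 48
0.0053 0.4856 0.9033 93
"""


def parse_rows(text):
    """Turn whitespace-separated table rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, val_loss, train_acc, epoch = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "val_loss": float(val_loss),
            "train_acc": float(train_acc),
            "epoch": int(epoch),
        })
    return rows


def best_by_validation_loss(rows):
    """Return the row with the smallest validation loss."""
    return min(rows, key=lambda row: row["val_loss"])


rows = parse_rows(ROWS_TEXT)
best = best_by_validation_loss(rows)
final = rows[-1]
gap = final["val_loss"] - best["val_loss"]

if __name__ == "__main__":
    print("best epoch:", best["epoch"], "val loss:", best["val_loss"])
    print("final epoch:", final["epoch"], "val loss:", final["val_loss"])
    print("increase since best epoch:", round(gap, 4))
```

Running the same selection over the full table would be a matter of pasting all rows into `ROWS_TEXT`; whether an earlier checkpoint is actually preferable depends on data and metrics that this card does not describe.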

Framework versions

Transformers 4.34.0
TensorFlow 2.13.0
Datasets 2.14.5
Tokenizers 0.14.1