length_seed-42_1e-3
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set at the final checkpoint (step 39660):
- Loss: 3.1073
- Accuracy: 0.4167
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
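The relationship between these values can be sketched in a few lines. The example below is a minimal illustration, not the training script itself. It assumes a single training device (consistent with 32 × 8 = 256), assumes `linear` means linear warmup followed by linear decay to zero, and assumes the total step count equals the final step in the results table (39660). The names `effective_batch_size` and `linear_lr` are hypothetical helpers introduced for this sketch.

```python
# Values taken from the hyperparameter list above.
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1e-3
WARMUP_STEPS = 32_000

# Assumptions, not stated in the card:
NUM_DEVICES = 1        # single device, consistent with 32 * 8 = 256
TOTAL_STEPS = 39_660   # final optimizer step reported in the results table


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum * devices


def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 16_000, 32_000, 35_830, 39_660):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate is still rising for roughly the first 16 epochs and decays only over the last 7660 steps.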
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
6.3921 | 0.9999 | 1983 | 4.6383 | 0.2613 |
4.4107 | 1.9999 | 3966 | 3.9370 | 0.3262 |
3.8167 | 2.9998 | 5949 | 3.6319 | 0.3567 |
3.5468 | 3.9997 | 7932 | 3.4810 | 0.3708 |
3.3994 | 4.9997 | 9915 | 3.3998 | 0.3796 |
3.3048 | 5.9996 | 11898 | 3.3454 | 0.3847 |
3.24 | 6.9996 | 13881 | 3.3108 | 0.3890 |
3.1934 | 8.0 | 15865 | 3.2871 | 0.3915 |
3.1609 | 8.9999 | 17848 | 3.2674 | 0.3935 |
3.132 | 9.9999 | 19831 | 3.2552 | 0.3947 |
3.1124 | 10.9998 | 21814 | 3.2466 | 0.3960 |
3.0954 | 11.9997 | 23797 | 3.2362 | 0.3975 |
3.0835 | 12.9997 | 25780 | 3.2282 | 0.3985 |
3.0746 | 13.9996 | 27763 | 3.2232 | 0.3990 |
3.0675 | 14.9996 | 29746 | 3.2189 | 0.3995 |
3.0623 | 16.0 | 31730 | 3.2167 | 0.3995 |
3.0446 | 16.9999 | 33713 | 3.1815 | 0.4041 |
2.9722 | 17.9999 | 35696 | 3.1497 | 0.4089 |
2.8843 | 18.9998 | 37679 | 3.1224 | 0.4133 |
2.7816 | 19.9987 | 39660 | 3.1073 | 0.4167 |
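
If the reported loss is a mean cross-entropy in nats (an assumption; the card does not state the objective), the final validation loss can be converted to a perplexity, which some readers find easier to interpret. A minimal sketch:

```python
import math

# Final validation loss from the last row of the table above.
final_eval_loss = 3.1073

# Assumption: the loss is mean cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(final_eval_loss)

print(f"perplexity = {perplexity:.2f}")
```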
Framework versions
- Transformers 4.45.1
- Pytorch 2.4.1+cu121
- Datasets 2.19.1
- Tokenizers 0.20.0