KoModernBERT-chp-06

This model is a fine-tuned version of CocoRoF/KoModernBERT-chp-05 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9652

Model description

More information needed

Intended uses & limitations

More information needed
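
Pending further details from the authors, the sketch below shows one plausible way to query the checkpoint. It is an assumption, not documented usage: it presumes this checkpoint is a masked language model (consistent with the MLM-style losses reported below) and that [MASK] is the tokenizer's mask token; check tokenizer.mask_token before relying on it. The Korean prompt is illustrative.

```python
# Minimal fill-mask sketch (assumption: this checkpoint is a masked
# language model, consistent with the MLM-style losses reported below).
import torch
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="CocoRoF/KoModernBERT-chp-06",
    torch_dtype=torch.float16,  # the published weights are FP16
)

# "The capital of South Korea is [MASK]." -- illustrative prompt; [MASK]
# assumes the tokenizer's default mask token (check tokenizer.mask_token).
for candidate in fill_mask("대한민국의 수도는 [MASK]이다."):
    print(candidate["token_str"], round(candidate["score"], 4))
```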

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: AdamW (torch implementation, `adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 1.0
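
For reference, the listing below sketches how these values map onto Hugging Face TrainingArguments. This is a reconstruction, not the authors' script: the output directory, fp16 flag, and evaluation cadence are assumptions (the 5000-step cadence matches the results table below; the published weights are FP16).

```python
# Sketch of TrainingArguments mirroring the hyperparameters above.
# Assumptions: standard Hugging Face Trainer, fp16 training, eval every
# 5000 steps; multi-GPU via a launcher such as torchrun across 8 devices.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="KoModernBERT-chp-06",  # illustrative output path
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # 8 per GPU x 8 GPUs x 8 accum steps = 512 total
    per_device_eval_batch_size=8,    # 8 per GPU x 8 GPUs = 64 total
    gradient_accumulation_steps=8,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # assumption; weights are published in FP16
    eval_strategy="steps",
    eval_steps=5000,                 # matches the eval cadence in the results table
)
```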

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|---------------|--------|-------|-----------------|
| 17.0359       | 0.0542 | 5000  | 2.0982          |
| 16.6082       | 0.1085 | 10000 | 2.0874          |
| 16.3776       | 0.1627 | 15000 | 2.0795          |
| 16.3601       | 0.2170 | 20000 | 2.0687          |
| 16.4271       | 0.2712 | 25000 | 2.0609          |
| 16.1757       | 0.3255 | 30000 | 2.0522          |
| 16.4832       | 0.3797 | 35000 | 2.0439          |
| 16.243        | 0.4340 | 40000 | 2.0324          |
| 16.1772       | 0.4882 | 45000 | 2.0268          |
| 16.2524       | 0.5424 | 50000 | 2.0190          |
| 16.2925       | 0.5967 | 55000 | 2.0131          |
| 16.1287       | 0.6509 | 60000 | 1.9996          |
| 15.9769       | 0.7052 | 65000 | 1.9977          |
| 15.7585       | 0.7594 | 70000 | 1.9885          |
| 16.0227       | 0.8137 | 75000 | 1.9840          |
| 15.4506       | 0.8679 | 80000 | 1.9777          |
| 15.7857       | 0.9222 | 85000 | 1.9699          |
| 15.3766       | 0.9764 | 90000 | 1.9652          |
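
Validation loss decreases monotonically from 2.0982 to 1.9652 over the single epoch. If this value is the usual mean per-token cross-entropy of masked-LM evaluation (an assumption; the card does not say), it can be read as a pseudo-perplexity via exp(loss):

```python
# Converting the final validation loss to pseudo-perplexity
# (assumption: the loss is mean per-token cross-entropy in nats).
import math

final_val_loss = 1.9652
print(math.exp(final_val_loss))  # ~7.14
```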

Framework versions

  • Transformers 4.48.1
  • PyTorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Model size

  • 153M parameters (safetensors, FP16)