KoModernBERT-chp-03

This model is a fine-tuned version of CocoRoF/KoModernBERT-chp-02 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1413
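
Since the descriptive sections below are still unfilled, here is a minimal usage sketch, assuming this checkpoint is a Korean masked-language-modeling model (consistent with the ModernBERT base and the reported evaluation loss); the example sentence is illustrative only:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "CocoRoF/KoModernBERT-chp-03"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

# Fill in a masked token in a Korean sentence ("The capital of
# South Korea is [MASK].") -- example input, not from the card.
text = f"대한민국의 수도는 {tokenizer.mask_token}입니다."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and decode the top prediction.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```

Note that ModernBERT support requires Transformers 4.48 or newer, matching the framework versions listed at the end of this card.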

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
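
For reference, these settings map onto Hugging Face TrainingArguments roughly as follows. This is a sketch, not the authors' actual training script; output_dir is a placeholder:

```python
from transformers import TrainingArguments

# Launched across 8 GPUs (e.g. via torchrun or accelerate), these
# per-device values reproduce the effective batch sizes listed above:
# 8 per device * 8 GPUs * 8 accumulation steps = 512 for training,
# and 8 per device * 8 GPUs = 64 for evaluation.
args = TrainingArguments(
    output_dir="komodernbert-chp-03",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
)
```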

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 17.3702       | 0.0904 | 5000  | 2.1672          |
| 17.2707       | 0.1808 | 10000 | 2.1613          |
| 17.384        | 0.2712 | 15000 | 2.1601          |
| 17.1693       | 0.3616 | 20000 | 2.1567          |
| 17.2089       | 0.4520 | 25000 | 2.1509          |
| 17.1292       | 0.5424 | 30000 | 2.1547          |
| 16.9682       | 0.6329 | 35000 | 2.1470          |
| 17.2477       | 0.7233 | 40000 | 2.1433          |
| 17.2785       | 0.8137 | 45000 | 2.1427          |
| 16.9302       | 0.9041 | 50000 | 2.1432          |
| 17.1989       | 0.9945 | 55000 | 2.1413          |
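
Assuming the validation loss is a cross-entropy in nats (the usual convention for MLM training with the Trainer), the final value corresponds to a masked-token perplexity of exp(2.1413) ≈ 8.5; a quick check:

```python
import math

validation_loss = 2.1413              # final eval loss from the table above
perplexity = math.exp(validation_loss)
print(f"{perplexity:.2f}")            # ≈ 8.51
```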

Framework versions

  • Transformers 4.48.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
Model size: 153M params (Safetensors, FP16)
