llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the gommt dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0206
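
Below is a minimal inference sketch for loading this PEFT adapter on top of the base model with `transformers` and `peft`. The adapter repository id (`sizhkhy/llm3br256`), dtype, and the example prompt are assumptions, not details recorded in this card; replace them with your actual values.

```python
# Minimal inference sketch (assumed: adapter repo id, bfloat16 dtype, example prompt).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/llm3br256"  # assumed repo id; replace with the actual adapter path

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter
model.eval()

messages = [{"role": "user", "content": "Hello!"}]  # placeholder prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```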

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 25.0
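
The following sketch maps the values above onto a `transformers` `TrainingArguments` object. It is illustrative only: `output_dir` is assumed, the batch size is assumed to be per device (the card does not say whether 64 is per device or total), and any LoRA/PEFT settings are not recorded here.

```python
# Sketch reproducing the listed hyperparameters; fields not listed above are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",           # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=64,   # assumed per-device
    per_device_eval_batch_size=64,    # assumed per-device
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=25.0,
)
```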

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.232         | 0.1613 | 25   | 0.2193          |
| 0.1524        | 0.3226 | 50   | 0.1507          |
| 0.115         | 0.4839 | 75   | 0.1165          |
| 0.0875        | 0.6452 | 100  | 0.1004          |
| 0.092         | 0.8065 | 125  | 0.0909          |
| 0.1077        | 0.9677 | 150  | 0.0900          |
| 0.0688        | 1.1290 | 175  | 0.0778          |
| 0.0682        | 1.2903 | 200  | 0.0723          |
| 0.0621        | 1.4516 | 225  | 0.0668          |
| 0.0668        | 1.6129 | 250  | 0.0646          |
| 0.0672        | 1.7742 | 275  | 0.0587          |
| 0.0484        | 1.9355 | 300  | 0.0544          |
| 0.0468        | 2.0968 | 325  | 0.0516          |
| 0.0438        | 2.2581 | 350  | 0.0503          |
| 0.0364        | 2.4194 | 375  | 0.0493          |
| 0.0365        | 2.5806 | 400  | 0.0460          |
| 0.0469        | 2.7419 | 425  | 0.0432          |
| 0.027         | 2.9032 | 450  | 0.0379          |
| 0.026         | 3.0645 | 475  | 0.0356          |
| 0.0223        | 3.2258 | 500  | 0.0357          |
| 0.0228        | 3.3871 | 525  | 0.0352          |
| 0.0199        | 3.5484 | 550  | 0.0336          |
| 0.0227        | 3.7097 | 575  | 0.0308          |
| 0.0207        | 3.8710 | 600  | 0.0292          |
| 0.0125        | 4.0323 | 625  | 0.0304          |
| 0.0146        | 4.1935 | 650  | 0.0279          |
| 0.0126        | 4.3548 | 675  | 0.0283          |
| 0.0141        | 4.5161 | 700  | 0.0270          |
| 0.0133        | 4.6774 | 725  | 0.0254          |
| 0.0098        | 4.8387 | 750  | 0.0250          |
| 0.0093        | 5.0    | 775  | 0.0234          |
| 0.0073        | 5.1613 | 800  | 0.0247          |
| 0.0087        | 5.3226 | 825  | 0.0254          |
| 0.0102        | 5.4839 | 850  | 0.0242          |
| 0.0077        | 5.6452 | 875  | 0.0230          |
| 0.0085        | 5.8065 | 900  | 0.0230          |
| 0.0069        | 5.9677 | 925  | 0.0213          |
| 0.0056        | 6.1290 | 950  | 0.0226          |
| 0.0063        | 6.2903 | 975  | 0.0224          |
| 0.0055        | 6.4516 | 1000 | 0.0227          |
| 0.0067        | 6.6129 | 1025 | 0.0229          |
| 0.0052        | 6.7742 | 1050 | 0.0224          |
| 0.008         | 6.9355 | 1075 | 0.0219          |
| 0.0053        | 7.0968 | 1100 | 0.0227          |
| 0.0049        | 7.2581 | 1125 | 0.0220          |
| 0.0059        | 7.4194 | 1150 | 0.0218          |
| 0.0045        | 7.5806 | 1175 | 0.0215          |
| 0.0058        | 7.7419 | 1200 | 0.0206          |
| 0.0047        | 7.9032 | 1225 | 0.0207          |
| 0.0043        | 8.0645 | 1250 | 0.0223          |
| 0.0046        | 8.2258 | 1275 | 0.0218          |
| 0.0036        | 8.3871 | 1300 | 0.0225          |
| 0.0034        | 8.5484 | 1325 | 0.0216          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
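
The snippet below is a small, optional check that the installed library versions match the ones listed above; exact matches are not strictly required, but these are the versions the adapter was trained with.

```python
# Print installed versions of the libraries listed in "Framework versions".
import peft, transformers, torch, datasets, tokenizers

for name, mod in [("peft", peft), ("transformers", transformers),
                  ("torch", torch), ("datasets", datasets),
                  ("tokenizers", tokenizers)]:
    print(f"{name}=={mod.__version__}")
```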