llm3br256

This model is a PEFT adapter fine-tuned from meta-llama/Llama-3.2-3B-Instruct on the rommel_importgenius_4b8 dataset (a loading sketch follows the result below). It achieves the following results on the evaluation set:

  • Loss: 0.0130
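
Since this is a PEFT adapter rather than a full checkpoint, it is loaded on top of the base model. The following is a minimal, hypothetical usage sketch: it assumes the adapter is published as sizhkhy/rommel_importgenius_4b8 and that you have access to the gated meta-llama/Llama-3.2-3B-Instruct base model; adjust dtype and device placement to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model first, then attach the adapter weights on top.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",  # gated base checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "sizhkhy/rommel_importgenius_4b8")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Llama 3.2 Instruct expects the chat template for prompting.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```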

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
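
The exact training script is not published. As a rough sketch only, the list above maps onto transformers' TrainingArguments as follows; the argument names are the Trainer API's, not necessarily what was actually used.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llm3br256",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,  # 4 x 8 = total train batch size of 32
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```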

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0672        | 0.0418 | 5    | 0.0755          |
| 0.0476        | 0.0837 | 10   | 0.0452          |
| 0.0337        | 0.1255 | 15   | 0.0356          |
| 0.0333        | 0.1674 | 20   | 0.0308          |
| 0.0258        | 0.2092 | 25   | 0.0272          |
| 0.023         | 0.2510 | 30   | 0.0255          |
| 0.0202        | 0.2929 | 35   | 0.0234          |
| 0.0188        | 0.3347 | 40   | 0.0218          |
| 0.0185        | 0.3766 | 45   | 0.0208          |
| 0.0199        | 0.4184 | 50   | 0.0200          |
| 0.0198        | 0.4603 | 55   | 0.0195          |
| 0.0179        | 0.5021 | 60   | 0.0189          |
| 0.0185        | 0.5439 | 65   | 0.0186          |
| 0.0174        | 0.5858 | 70   | 0.0186          |
| 0.0157        | 0.6276 | 75   | 0.0183          |
| 0.0175        | 0.6695 | 80   | 0.0176          |
| 0.0175        | 0.7113 | 85   | 0.0176          |
| 0.0164        | 0.7531 | 90   | 0.0171          |
| 0.0182        | 0.7950 | 95   | 0.0168          |
| 0.019         | 0.8368 | 100  | 0.0167          |
| 0.0163        | 0.8787 | 105  | 0.0158          |
| 0.0145        | 0.9205 | 110  | 0.0158          |
| 0.0165        | 0.9623 | 115  | 0.0155          |
| 0.0205        | 1.0042 | 120  | 0.0152          |
| 0.0105        | 1.0460 | 125  | 0.0155          |
| 0.0147        | 1.0879 | 130  | 0.0157          |
| 0.0148        | 1.1297 | 135  | 0.0160          |
| 0.0115        | 1.1715 | 140  | 0.0153          |
| 0.0166        | 1.2134 | 145  | 0.0153          |
| 0.015         | 1.2552 | 150  | 0.0156          |
| 0.0148        | 1.2971 | 155  | 0.0157          |
| 0.0112        | 1.3389 | 160  | 0.0159          |
| 0.0128        | 1.3808 | 165  | 0.0153          |
| 0.0125        | 1.4226 | 170  | 0.0151          |
| 0.0137        | 1.4644 | 175  | 0.0150          |
| 0.0131        | 1.5063 | 180  | 0.0145          |
| 0.0105        | 1.5481 | 185  | 0.0145          |
| 0.0126        | 1.5900 | 190  | 0.0144          |
| 0.0119        | 1.6318 | 195  | 0.0145          |
| 0.016         | 1.6736 | 200  | 0.0147          |
| 0.0143        | 1.7155 | 205  | 0.0150          |
| 0.0139        | 1.7573 | 210  | 0.0150          |
| 0.0139        | 1.7992 | 215  | 0.0145          |
| 0.0161        | 1.8410 | 220  | 0.0143          |
| 0.0098        | 1.8828 | 225  | 0.0138          |
| 0.0108        | 1.9247 | 230  | 0.0140          |
| 0.0117        | 1.9665 | 235  | 0.0141          |
| 0.0109        | 2.0084 | 240  | 0.0138          |
| 0.0093        | 2.0502 | 245  | 0.0145          |
| 0.0102        | 2.0921 | 250  | 0.0143          |
| 0.0104        | 2.1339 | 255  | 0.0141          |
| 0.0108        | 2.1757 | 260  | 0.0147          |
| 0.0104        | 2.2176 | 265  | 0.0142          |
| 0.0103        | 2.2594 | 270  | 0.0144          |
| 0.0107        | 2.3013 | 275  | 0.0144          |
| 0.0104        | 2.3431 | 280  | 0.0141          |
| 0.0092        | 2.3849 | 285  | 0.0143          |
| 0.0107        | 2.4268 | 290  | 0.0140          |
| 0.0112        | 2.4686 | 295  | 0.0143          |
| 0.01          | 2.5105 | 300  | 0.0143          |
| 0.0096        | 2.5523 | 305  | 0.0138          |
| 0.0096        | 2.5941 | 310  | 0.0137          |
| 0.0099        | 2.6360 | 315  | 0.0137          |
| 0.009         | 2.6778 | 320  | 0.0138          |
| 0.0097        | 2.7197 | 325  | 0.0137          |
| 0.0097        | 2.7615 | 330  | 0.0136          |
| 0.0108        | 2.8033 | 335  | 0.0136          |
| 0.0092        | 2.8452 | 340  | 0.0132          |
| 0.0092        | 2.8870 | 345  | 0.0132          |
| 0.0095        | 2.9289 | 350  | 0.0130          |
| 0.0094        | 2.9707 | 355  | 0.0127          |
| 0.0088        | 3.0126 | 360  | 0.0127          |
| 0.0086        | 3.0544 | 365  | 0.0131          |
| 0.0094        | 3.0962 | 370  | 0.0134          |
| 0.0075        | 3.1381 | 375  | 0.0137          |
| 0.0068        | 3.1799 | 380  | 0.0136          |
| 0.0096        | 3.2218 | 385  | 0.0136          |
| 0.0088        | 3.2636 | 390  | 0.0137          |
| 0.008         | 3.3054 | 395  | 0.0138          |
| 0.0085        | 3.3473 | 400  | 0.0137          |
| 0.0091        | 3.3891 | 405  | 0.0136          |
| 0.0049        | 3.4310 | 410  | 0.0134          |
| 0.0072        | 3.4728 | 415  | 0.0131          |
| 0.0063        | 3.5146 | 420  | 0.0133          |
| 0.0076        | 3.5565 | 425  | 0.0131          |
| 0.0076        | 3.5983 | 430  | 0.0129          |
| 0.0074        | 3.6402 | 435  | 0.0130          |
| 0.0074        | 3.6820 | 440  | 0.0132          |
| 0.0067        | 3.7238 | 445  | 0.0132          |
| 0.0064        | 3.7657 | 450  | 0.0130          |
| 0.0091        | 3.8075 | 455  | 0.0130          |
| 0.0074        | 3.8494 | 460  | 0.0131          |
| 0.0076        | 3.8912 | 465  | 0.0132          |
| 0.007         | 3.9331 | 470  | 0.0132          |
| 0.0082        | 3.9749 | 475  | 0.0132          |
| 0.0059        | 4.0167 | 480  | 0.0133          |
| 0.0066        | 4.0586 | 485  | 0.0135          |
| 0.0063        | 4.1004 | 490  | 0.0140          |
| 0.0059        | 4.1423 | 495  | 0.0144          |
| 0.0066        | 4.1841 | 500  | 0.0142          |
| 0.0055        | 4.2259 | 505  | 0.0142          |
| 0.0067        | 4.2678 | 510  | 0.0142          |
| 0.0065        | 4.3096 | 515  | 0.0143          |
| 0.0062        | 4.3515 | 520  | 0.0142          |
| 0.0065        | 4.3933 | 525  | 0.0141          |
| 0.007         | 4.4351 | 530  | 0.0139          |
| 0.0058        | 4.4770 | 535  | 0.0139          |
| 0.0056        | 4.5188 | 540  | 0.0139          |
| 0.0062        | 4.5607 | 545  | 0.0139          |
| 0.0061        | 4.6025 | 550  | 0.0139          |
| 0.0061        | 4.6444 | 555  | 0.0139          |
| 0.0068        | 4.6862 | 560  | 0.0138          |
| 0.0069        | 4.7280 | 565  | 0.0139          |
| 0.0063        | 4.7699 | 570  | 0.0139          |
| 0.0065        | 4.8117 | 575  | 0.0139          |
| 0.0064        | 4.8536 | 580  | 0.0139          |
| 0.0062        | 4.8954 | 585  | 0.0139          |
| 0.0065        | 4.9372 | 590  | 0.0139          |
| 0.0055        | 4.9791 | 595  | 0.0139          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • PyTorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
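
To reproduce the training environment, pin the versions listed above. The snippet below is a minimal sketch (standard-library importlib.metadata only) for checking an installed environment against them:

```python
from importlib.metadata import version

# Versions this adapter was trained with; mismatches may change behavior.
expected = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.4.0+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
for pkg, want in expected.items():
    have = version(pkg)
    print(f"{pkg}: installed {have}, expected {want}")
```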