# llm3br256
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the rommel_importgenius_4b8 dataset. It achieves the following results on the evaluation set:
- Loss: 0.0130
## Model description
More information needed
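
While the description is pending, here is a minimal, hedged sketch of how a PEFT adapter produced by this kind of run is typically loaded on top of the base model named above. The adapter path is a placeholder (this card does not state the final Hub id), and the generation settings are illustrative only:

```python
# Minimal sketch: loading a LoRA/PEFT adapter on top of the base model.
# The adapter path below is a placeholder; substitute the actual adapter repo or directory.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-3B-Instruct"   # base model named in this card
adapter_id = "path/to/llm3br256-adapter"       # placeholder, not the real Hub id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative chat-style generation with the merged adapter weights applied at inference.
messages = [{"role": "user", "content": "Hello"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```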
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
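
For readers who want to reproduce a comparable run, the hyperparameters above map roughly onto `transformers.TrainingArguments` as sketched below. The output directory and anything not listed above are assumptions, not taken from this card; the total train batch size of 32 follows from 4 samples per device × 8 gradient-accumulation steps:

```python
# Illustrative sketch only: maps the listed hyperparameters onto TrainingArguments.
# The output directory is an assumed name; logging/eval cadence is not specified in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",            # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,     # effective train batch size: 4 * 8 = 32
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```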
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0672 | 0.0418 | 5 | 0.0755 |
0.0476 | 0.0837 | 10 | 0.0452 |
0.0337 | 0.1255 | 15 | 0.0356 |
0.0333 | 0.1674 | 20 | 0.0308 |
0.0258 | 0.2092 | 25 | 0.0272 |
0.023 | 0.2510 | 30 | 0.0255 |
0.0202 | 0.2929 | 35 | 0.0234 |
0.0188 | 0.3347 | 40 | 0.0218 |
0.0185 | 0.3766 | 45 | 0.0208 |
0.0199 | 0.4184 | 50 | 0.0200 |
0.0198 | 0.4603 | 55 | 0.0195 |
0.0179 | 0.5021 | 60 | 0.0189 |
0.0185 | 0.5439 | 65 | 0.0186 |
0.0174 | 0.5858 | 70 | 0.0186 |
0.0157 | 0.6276 | 75 | 0.0183 |
0.0175 | 0.6695 | 80 | 0.0176 |
0.0175 | 0.7113 | 85 | 0.0176 |
0.0164 | 0.7531 | 90 | 0.0171 |
0.0182 | 0.7950 | 95 | 0.0168 |
0.019 | 0.8368 | 100 | 0.0167 |
0.0163 | 0.8787 | 105 | 0.0158 |
0.0145 | 0.9205 | 110 | 0.0158 |
0.0165 | 0.9623 | 115 | 0.0155 |
0.0205 | 1.0042 | 120 | 0.0152 |
0.0105 | 1.0460 | 125 | 0.0155 |
0.0147 | 1.0879 | 130 | 0.0157 |
0.0148 | 1.1297 | 135 | 0.0160 |
0.0115 | 1.1715 | 140 | 0.0153 |
0.0166 | 1.2134 | 145 | 0.0153 |
0.015 | 1.2552 | 150 | 0.0156 |
0.0148 | 1.2971 | 155 | 0.0157 |
0.0112 | 1.3389 | 160 | 0.0159 |
0.0128 | 1.3808 | 165 | 0.0153 |
0.0125 | 1.4226 | 170 | 0.0151 |
0.0137 | 1.4644 | 175 | 0.0150 |
0.0131 | 1.5063 | 180 | 0.0145 |
0.0105 | 1.5481 | 185 | 0.0145 |
0.0126 | 1.5900 | 190 | 0.0144 |
0.0119 | 1.6318 | 195 | 0.0145 |
0.016 | 1.6736 | 200 | 0.0147 |
0.0143 | 1.7155 | 205 | 0.0150 |
0.0139 | 1.7573 | 210 | 0.0150 |
0.0139 | 1.7992 | 215 | 0.0145 |
0.0161 | 1.8410 | 220 | 0.0143 |
0.0098 | 1.8828 | 225 | 0.0138 |
0.0108 | 1.9247 | 230 | 0.0140 |
0.0117 | 1.9665 | 235 | 0.0141 |
0.0109 | 2.0084 | 240 | 0.0138 |
0.0093 | 2.0502 | 245 | 0.0145 |
0.0102 | 2.0921 | 250 | 0.0143 |
0.0104 | 2.1339 | 255 | 0.0141 |
0.0108 | 2.1757 | 260 | 0.0147 |
0.0104 | 2.2176 | 265 | 0.0142 |
0.0103 | 2.2594 | 270 | 0.0144 |
0.0107 | 2.3013 | 275 | 0.0144 |
0.0104 | 2.3431 | 280 | 0.0141 |
0.0092 | 2.3849 | 285 | 0.0143 |
0.0107 | 2.4268 | 290 | 0.0140 |
0.0112 | 2.4686 | 295 | 0.0143 |
0.01 | 2.5105 | 300 | 0.0143 |
0.0096 | 2.5523 | 305 | 0.0138 |
0.0096 | 2.5941 | 310 | 0.0137 |
0.0099 | 2.6360 | 315 | 0.0137 |
0.009 | 2.6778 | 320 | 0.0138 |
0.0097 | 2.7197 | 325 | 0.0137 |
0.0097 | 2.7615 | 330 | 0.0136 |
0.0108 | 2.8033 | 335 | 0.0136 |
0.0092 | 2.8452 | 340 | 0.0132 |
0.0092 | 2.8870 | 345 | 0.0132 |
0.0095 | 2.9289 | 350 | 0.0130 |
0.0094 | 2.9707 | 355 | 0.0127 |
0.0088 | 3.0126 | 360 | 0.0127 |
0.0086 | 3.0544 | 365 | 0.0131 |
0.0094 | 3.0962 | 370 | 0.0134 |
0.0075 | 3.1381 | 375 | 0.0137 |
0.0068 | 3.1799 | 380 | 0.0136 |
0.0096 | 3.2218 | 385 | 0.0136 |
0.0088 | 3.2636 | 390 | 0.0137 |
0.008 | 3.3054 | 395 | 0.0138 |
0.0085 | 3.3473 | 400 | 0.0137 |
0.0091 | 3.3891 | 405 | 0.0136 |
0.0049 | 3.4310 | 410 | 0.0134 |
0.0072 | 3.4728 | 415 | 0.0131 |
0.0063 | 3.5146 | 420 | 0.0133 |
0.0076 | 3.5565 | 425 | 0.0131 |
0.0076 | 3.5983 | 430 | 0.0129 |
0.0074 | 3.6402 | 435 | 0.0130 |
0.0074 | 3.6820 | 440 | 0.0132 |
0.0067 | 3.7238 | 445 | 0.0132 |
0.0064 | 3.7657 | 450 | 0.0130 |
0.0091 | 3.8075 | 455 | 0.0130 |
0.0074 | 3.8494 | 460 | 0.0131 |
0.0076 | 3.8912 | 465 | 0.0132 |
0.007 | 3.9331 | 470 | 0.0132 |
0.0082 | 3.9749 | 475 | 0.0132 |
0.0059 | 4.0167 | 480 | 0.0133 |
0.0066 | 4.0586 | 485 | 0.0135 |
0.0063 | 4.1004 | 490 | 0.0140 |
0.0059 | 4.1423 | 495 | 0.0144 |
0.0066 | 4.1841 | 500 | 0.0142 |
0.0055 | 4.2259 | 505 | 0.0142 |
0.0067 | 4.2678 | 510 | 0.0142 |
0.0065 | 4.3096 | 515 | 0.0143 |
0.0062 | 4.3515 | 520 | 0.0142 |
0.0065 | 4.3933 | 525 | 0.0141 |
0.007 | 4.4351 | 530 | 0.0139 |
0.0058 | 4.4770 | 535 | 0.0139 |
0.0056 | 4.5188 | 540 | 0.0139 |
0.0062 | 4.5607 | 545 | 0.0139 |
0.0061 | 4.6025 | 550 | 0.0139 |
0.0061 | 4.6444 | 555 | 0.0139 |
0.0068 | 4.6862 | 560 | 0.0138 |
0.0069 | 4.7280 | 565 | 0.0139 |
0.0063 | 4.7699 | 570 | 0.0139 |
0.0065 | 4.8117 | 575 | 0.0139 |
0.0064 | 4.8536 | 580 | 0.0139 |
0.0062 | 4.8954 | 585 | 0.0139 |
0.0065 | 4.9372 | 590 | 0.0139 |
0.0055 | 4.9791 | 595 | 0.0139 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3