llm3br256
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the antrepriz dataset. It achieves the following results on the evaluation set:
- Loss: 0.0119
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0434 | 0.1717 | 5 | 0.0371 |
0.0304 | 0.3433 | 10 | 0.0272 |
0.0224 | 0.5150 | 15 | 0.0224 |
0.0193 | 0.6867 | 20 | 0.0202 |
0.0167 | 0.8584 | 25 | 0.0188 |
0.0474 | 1.0300 | 30 | 0.0174 |
0.0161 | 1.2017 | 35 | 0.0167 |
0.013 | 1.3734 | 40 | 0.0159 |
0.0134 | 1.5451 | 45 | 0.0151 |
0.0125 | 1.7167 | 50 | 0.0145 |
0.0126 | 1.8884 | 55 | 0.0140 |
0.0112 | 2.0601 | 60 | 0.0137 |
0.0118 | 2.2318 | 65 | 0.0134 |
0.0098 | 2.4034 | 70 | 0.0130 |
0.0124 | 2.5751 | 75 | 0.0128 |
0.0128 | 2.7468 | 80 | 0.0126 |
0.0103 | 2.9185 | 85 | 0.0125 |
0.0112 | 3.0901 | 90 | 0.0122 |
0.0088 | 3.2618 | 95 | 0.0121 |
0.0092 | 3.4335 | 100 | 0.0122 |
0.0087 | 3.6052 | 105 | 0.0120 |
0.0091 | 3.7768 | 110 | 0.0118 |
0.0094 | 3.9485 | 115 | 0.0117 |
0.0088 | 4.1202 | 120 | 0.0118 |
0.0084 | 4.2918 | 125 | 0.0118 |
0.0083 | 4.4635 | 130 | 0.0119 |
0.0077 | 4.6352 | 135 | 0.0119 |
0.0081 | 4.8069 | 140 | 0.0119 |
0.0079 | 4.9785 | 145 | 0.0119 |
Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
- Downloads last month
- 0
Model tree for sizhkhy/antrepriz
Base model
meta-llama/Llama-3.2-3B-Instruct
Finetuned
unsloth/Llama-3.2-3B-Instruct