llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the antrepriz dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0119
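
The adapter can be loaded on top of the base model with PEFT for inference. Below is a minimal sketch only: the adapter repo id (sizhkhy/antrepriz), the bfloat16/device_map settings, and the chat-template prompt are assumptions not documented in this card, and access to the gated meta-llama base model is required.

```python
# Minimal inference sketch (assumptions: adapter repo id, dtype, device placement).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/antrepriz"  # assumed adapter repo id; adjust if different

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the LoRA adapter
model.eval()

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```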

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
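
For reference, the list above maps onto transformers TrainingArguments roughly as sketched below. This is a hedged reconstruction, assuming a single training device (so 4 × 8 gradient accumulation gives the effective batch size of 32); the actual training script, dataset preparation, and LoRA configuration are not documented in this card.

```python
# Sketch of TrainingArguments matching the reported hyperparameters (assumed single device).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,  # 4 x 8 = 32 effective train batch size
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```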

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0434        | 0.1717 | 5    | 0.0371          |
| 0.0304        | 0.3433 | 10   | 0.0272          |
| 0.0224        | 0.5150 | 15   | 0.0224          |
| 0.0193        | 0.6867 | 20   | 0.0202          |
| 0.0167        | 0.8584 | 25   | 0.0188          |
| 0.0474        | 1.0300 | 30   | 0.0174          |
| 0.0161        | 1.2017 | 35   | 0.0167          |
| 0.013         | 1.3734 | 40   | 0.0159          |
| 0.0134        | 1.5451 | 45   | 0.0151          |
| 0.0125        | 1.7167 | 50   | 0.0145          |
| 0.0126        | 1.8884 | 55   | 0.0140          |
| 0.0112        | 2.0601 | 60   | 0.0137          |
| 0.0118        | 2.2318 | 65   | 0.0134          |
| 0.0098        | 2.4034 | 70   | 0.0130          |
| 0.0124        | 2.5751 | 75   | 0.0128          |
| 0.0128        | 2.7468 | 80   | 0.0126          |
| 0.0103        | 2.9185 | 85   | 0.0125          |
| 0.0112        | 3.0901 | 90   | 0.0122          |
| 0.0088        | 3.2618 | 95   | 0.0121          |
| 0.0092        | 3.4335 | 100  | 0.0122          |
| 0.0087        | 3.6052 | 105  | 0.0120          |
| 0.0091        | 3.7768 | 110  | 0.0118          |
| 0.0094        | 3.9485 | 115  | 0.0117          |
| 0.0088        | 4.1202 | 120  | 0.0118          |
| 0.0084        | 4.2918 | 125  | 0.0118          |
| 0.0083        | 4.4635 | 130  | 0.0119          |
| 0.0077        | 4.6352 | 135  | 0.0119          |
| 0.0081        | 4.8069 | 140  | 0.0119          |
| 0.0079        | 4.9785 | 145  | 0.0119          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
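
To confirm a local environment matches these versions, a small check can be run (note that PyTorch's package name on PyPI is torch):

```python
# Print installed versions of the packages listed above.
from importlib.metadata import version

for package in ("peft", "transformers", "torch", "datasets", "tokenizers"):
    print(f"{package}=={version(package)}")
```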