OpenELM-1_1B-DPO-full-max-second-reward

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4829
  • Rewards/chosen: -12.4375
  • Rewards/rejected: -12.875
  • Rewards/accuracies: 0.5371
  • Rewards/margins: 0.4414
  • Logps/rejected: -1576.0
  • Logps/chosen: -1560.0
  • Logits/rejected: 10.8125
  • Logits/chosen: 8.8125
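
The reward metrics above are related by simple arithmetic, which can be checked directly. The Python sketch below assumes the standard sigmoid DPO objective, where the loss for one preference pair is -log(sigmoid(reward_chosen - reward_rejected)). The card does not state the loss variant or the beta value, so this is an assumption, and the helper name `dpo_pair_loss` is illustrative rather than part of any library.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -12.4375
rewards_rejected = -12.875
reported_margin = 0.4414
reported_loss = 1.4829


def dpo_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Sigmoid DPO loss for a single pair (ASSUMED loss variant).

    Computes -log(sigmoid(margin)) in a numerically stable form.
    """
    margin = reward_chosen - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Difference of the two mean rewards.
margin_of_means = rewards_chosen - rewards_rejected
print(margin_of_means)  # 0.4375

# Loss a single pair would have if its margin equaled the mean margin.
loss_at_mean_margin = dpo_pair_loss(rewards_chosen, rewards_rejected)
print(loss_at_mean_margin < reported_loss)  # True
```

The difference of the mean rewards (0.4375) is close to the reported Rewards/margins value (0.4414); the small gap may come from how the metric is averaged and rounded, which the card does not specify. Under the assumed loss, a pair at the mean margin would score roughly 0.5, well below the reported 1.4829. One plausible reading is that many pairs are ranked the wrong way round by a wide margin, which fits the reward accuracy of 0.5371.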

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
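
The batch-size totals and the learning-rate schedule follow from the values listed above. The Python sketch below reproduces the totals and evaluates a cosine schedule with linear warmup. The total step count is not stated in the card: it is estimated from the training results, where step 100 corresponds to epoch 0.1047 (about 955 steps per epoch). The function `cosine_lr` is a hypothetical stand-in written for illustration, not the scheduler the training run actually called.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8            # per device
eval_batch_size = 16            # per device
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-05
warmup_ratio = 0.1

# Effective batch sizes across devices (and accumulation steps, for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 64 64

# ASSUMPTION: about 955 optimizer steps per epoch over 3 epochs.
estimated_total_steps = 2865
warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)


def cosine_lr(step: int,
              base_lr: float = learning_rate,
              warmup: int = warmup_steps,
              total: int = estimated_total_steps) -> float:
    """Linear warmup to base_lr, then half-cosine decay toward zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, warmup_steps, 1000, 2000, estimated_total_steps):
    print(step, cosine_lr(step))
```

Under these assumptions the learning rate peaks at 5e-05 after roughly the first 10% of steps and decays toward zero by the end of the third epoch.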

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6927 | 0.1047 | 100 | 0.6950 | -0.2334 | -0.2539 | 0.5254 | 0.0201 | -314.0 | -342.0 | -13.125 | -13.25 |
| 0.6759 | 0.2094 | 200 | 0.7065 | -0.6484 | -0.7617 | 0.5488 | 0.1123 | -366.0 | -384.0 | -11.6875 | -11.9375 |
| 0.6912 | 0.3141 | 300 | 0.7235 | -1.1484 | -1.2344 | 0.5527 | 0.0845 | -412.0 | -434.0 | -14.0 | -14.0625 |
| 0.7002 | 0.4188 | 400 | 0.7412 | -1.2734 | -1.2578 | 0.4883 | -0.0128 | -414.0 | -446.0 | -13.5 | -13.5 |
| 0.6819 | 0.5236 | 500 | 0.7542 | -1.75 | -1.7656 | 0.4961 | 0.0173 | -466.0 | -492.0 | -12.125 | -12.3125 |
| 0.7065 | 0.6283 | 600 | 0.7290 | -1.9297 | -1.9453 | 0.5039 | 0.0159 | -482.0 | -512.0 | -12.1875 | -12.375 |
| 0.6892 | 0.7330 | 700 | 0.7298 | -2.1094 | -2.1719 | 0.5117 | 0.0518 | -506.0 | -532.0 | -11.75 | -11.8125 |
| 0.7117 | 0.8377 | 800 | 0.7436 | -2.25 | -2.2812 | 0.4961 | 0.0247 | -516.0 | -544.0 | -8.5625 | -8.875 |
| 0.6835 | 0.9424 | 900 | 0.7565 | -2.1562 | -2.1875 | 0.5137 | 0.0284 | -508.0 | -536.0 | -7.8125 | -8.1875 |
| 0.2775 | 1.0471 | 1000 | 0.9428 | -4.0938 | -4.125 | 0.5137 | 0.0229 | -700.0 | -728.0 | -10.75 | -11.1875 |
| 0.2471 | 1.1518 | 1100 | 0.9772 | -5.6562 | -5.75 | 0.5234 | 0.0986 | -864.0 | -884.0 | -3.9844 | -4.8438 |
| 0.2465 | 1.2565 | 1200 | 0.9777 | -5.125 | -5.2188 | 0.5254 | 0.0688 | -808.0 | -832.0 | -4.1562 | -5.0312 |
| 0.2601 | 1.3613 | 1300 | 0.9855 | -6.5 | -6.6875 | 0.5488 | 0.1846 | -956.0 | -968.0 | 0.3164 | -0.7695 |
| 0.2404 | 1.4660 | 1400 | 0.9077 | -6.8438 | -7.0938 | 0.5293 | 0.2520 | -1000.0 | -1004.0 | 2.0312 | 0.6367 |
| 0.2371 | 1.5707 | 1500 | 0.9027 | -5.8438 | -6.0625 | 0.5508 | 0.2061 | -896.0 | -904.0 | 1.4141 | 0.0143 |
| 0.2329 | 1.6754 | 1600 | 0.9480 | -6.7812 | -7.0312 | 0.5488 | 0.2617 | -992.0 | -996.0 | 2.0312 | 0.5664 |
| 0.231 | 1.7801 | 1700 | 0.8705 | -6.2812 | -6.5625 | 0.5527 | 0.2598 | -944.0 | -948.0 | -1.6484 | -2.7031 |
| 0.2045 | 1.8848 | 1800 | 0.9315 | -7.4375 | -7.7188 | 0.5625 | 0.3086 | -1064.0 | -1064.0 | -1.3906 | -2.5 |
| 0.2467 | 1.9895 | 1900 | 0.8831 | -7.0625 | -7.375 | 0.5586 | 0.3145 | -1024.0 | -1024.0 | 0.2656 | -0.9961 |
| 0.0377 | 2.0942 | 2000 | 1.3504 | -10.6875 | -11.0625 | 0.5371 | 0.3652 | -1392.0 | -1384.0 | 6.25 | 4.5625 |
| 0.0265 | 2.1990 | 2100 | 1.5050 | -11.5 | -11.8125 | 0.5566 | 0.3320 | -1472.0 | -1472.0 | 8.1875 | 6.375 |
| 0.0363 | 2.3037 | 2200 | 1.4563 | -11.625 | -11.9375 | 0.5312 | 0.3398 | -1480.0 | -1480.0 | 8.9375 | 7.1562 |
| 0.0292 | 2.4084 | 2300 | 1.5373 | -12.125 | -12.5 | 0.5449 | 0.3535 | -1536.0 | -1528.0 | 9.6875 | 7.7812 |
| 0.0491 | 2.5131 | 2400 | 1.4556 | -12.0625 | -12.5 | 0.5410 | 0.4355 | -1536.0 | -1528.0 | 9.8125 | 7.875 |
| 0.0324 | 2.6178 | 2500 | 1.4875 | -12.5 | -12.9375 | 0.5391 | 0.4414 | -1584.0 | -1568.0 | 10.5 | 8.5625 |
| 0.0247 | 2.7225 | 2600 | 1.4541 | -12.0625 | -12.5 | 0.5410 | 0.4336 | -1536.0 | -1528.0 | 10.25 | 8.3125 |
| 0.0335 | 2.8272 | 2700 | 1.4734 | -12.3125 | -12.75 | 0.5371 | 0.4434 | -1568.0 | -1552.0 | 10.6875 | 8.75 |
| 0.0263 | 2.9319 | 2800 | 1.4829 | -12.4375 | -12.875 | 0.5371 | 0.4414 | -1576.0 | -1560.0 | 10.8125 | 8.8125 |
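
The trend in these results is easier to see with a quick scan of the logged values. The Python sketch below copies four columns from the training results (step, training loss, validation loss, reward accuracy) and looks up the checkpoint with the lowest validation loss, the one with the highest reward accuracy, and the final gap between training and validation loss. The variable names are illustrative only.

```python
# (step, training loss, validation loss, rewards/accuracies), copied from the results above.
rows = [
    (100, 0.6927, 0.6950, 0.5254), (200, 0.6759, 0.7065, 0.5488),
    (300, 0.6912, 0.7235, 0.5527), (400, 0.7002, 0.7412, 0.4883),
    (500, 0.6819, 0.7542, 0.4961), (600, 0.7065, 0.7290, 0.5039),
    (700, 0.6892, 0.7298, 0.5117), (800, 0.7117, 0.7436, 0.4961),
    (900, 0.6835, 0.7565, 0.5137), (1000, 0.2775, 0.9428, 0.5137),
    (1100, 0.2471, 0.9772, 0.5234), (1200, 0.2465, 0.9777, 0.5254),
    (1300, 0.2601, 0.9855, 0.5488), (1400, 0.2404, 0.9077, 0.5293),
    (1500, 0.2371, 0.9027, 0.5508), (1600, 0.2329, 0.9480, 0.5488),
    (1700, 0.231, 0.8705, 0.5527), (1800, 0.2045, 0.9315, 0.5625),
    (1900, 0.2467, 0.8831, 0.5586), (2000, 0.0377, 1.3504, 0.5371),
    (2100, 0.0265, 1.5050, 0.5566), (2200, 0.0363, 1.4563, 0.5312),
    (2300, 0.0292, 1.5373, 0.5449), (2400, 0.0491, 1.4556, 0.5410),
    (2500, 0.0324, 1.4875, 0.5391), (2600, 0.0247, 1.4541, 0.5410),
    (2700, 0.0335, 1.4734, 0.5371), (2800, 0.0263, 1.4829, 0.5371),
]

# Checkpoint with the lowest validation loss.
best_val = min(rows, key=lambda r: r[2])
# Checkpoint with the highest reward accuracy.
best_acc = max(rows, key=lambda r: r[3])
# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val, _ = rows[-1]
final_gap = last_val - last_train

print("lowest validation loss:", best_val[0], best_val[2])   # step 100, 0.695
print("highest reward accuracy:", best_acc[0], best_acc[3])  # step 1800, 0.5625
print("final val - train loss gap:", round(final_gap, 4))
```

Validation loss is lowest at the first evaluation (step 100) and rises from there, while training loss drops sharply at each epoch boundary. This pattern is consistent with overfitting to the training pairs, though the card gives no further detail about the data to confirm it.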

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.3.0
  • Datasets 3.0.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 1.08B params
  • Tensor type: BF16