library_name: transformers
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-least-similar
    results: []

OpenELM-1_1B-DPO-full-least-similar

This model was trained from scratch on an unknown dataset. At the final logged evaluation (step 2800), it achieves the following results on the evaluation set:

  • Loss: 201.4325
  • Rewards/chosen: -684.0
  • Rewards/rejected: -592.0
  • Rewards/accuracies: 0.4277
  • Rewards/margins: -93.0
  • Logps/rejected: -59392.0
  • Logps/chosen: -68608.0
  • Logits/rejected: 5.5625
  • Logits/chosen: 5.1562
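
The metric names above match those logged by TRL's DPO trainer. As a rough illustration of how such numbers relate to each other, the sketch below computes the sigmoid DPO loss, the implicit rewards, the margin, and the accuracy for a single preference pair. It assumes the standard sigmoid DPO loss; the value `beta=0.1` is a placeholder, since this card does not state the beta that was used, and the function names are hypothetical.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)), equal to softplus(-x)."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen: float, policy_rejected: float,
                ref_chosen: float, ref_rejected: float,
                beta: float = 0.1) -> dict:
    """Per-pair DPO quantities from sequence log-probabilities.

    beta=0.1 is an assumed placeholder, not a value taken from this card.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    return {
        "loss": neg_log_sigmoid(margin),
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        # 1.0 when the chosen response is ranked above the rejected one.
        "rewards/accuracies": 1.0 if margin > 0 else 0.0,
    }


# A policy identical to the reference gives zero margin and a loss of ln(2).
print(dpo_metrics(-10.0, -12.0, -10.0, -12.0)["loss"])
```

Under these assumptions, a strongly negative margin yields a per-pair loss of roughly the margin's magnitude, and an accuracy below 0.5 means the rejected response receives the higher implicit reward more often than the chosen one.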

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
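
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation (8 × 4 × 2 = 64 for training, 16 × 4 = 64 for evaluation). The sketch below reproduces that arithmetic and approximates a cosine schedule with linear warmup. The function names are hypothetical, the schedule is a simplified approximation rather than the library's own implementation, and `total_steps=1000` in the usage line is an illustrative value, not one taken from this card.

```python
import math


def effective_batch_size(per_device: int, num_devices: int,
                         grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Linear warmup followed by cosine decay to zero (simplified sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(8, 4, 2))   # training
print(effective_batch_size(16, 4))     # evaluation
print(cosine_lr(step=550, total_steps=1000))  # total_steps is illustrative
```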

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6914 | 0.1047 | 100 | 0.6991 | -0.3457 | -0.3398 | 0.4199 | -0.0053 | -322.0 | -352.0 | -9.4375 | -9.8125 |
| 0.6914 | 0.2094 | 200 | 17.0172 | -58.5 | -51.25 | 0.4238 | -7.3125 | -5408.0 | -6176.0 | 5.25 | 4.5312 |
| 0.6914 | 0.3141 | 300 | 97.3562 | -330.0 | -286.0 | 0.4316 | -44.75 | -28800.0 | -33280.0 | -0.9297 | -0.9531 |
| 0.6914 | 0.4188 | 400 | 103.5919 | -352.0 | -304.0 | 0.4316 | -48.0 | -30592.0 | -35328.0 | 0.4180 | 0.2930 |
| 0.6914 | 0.5236 | 500 | 109.8674 | -372.0 | -322.0 | 0.4336 | -50.75 | -32512.0 | -37632.0 | 1.4766 | 1.3047 |
| 0.6914 | 0.6283 | 600 | 116.7363 | -396.0 | -342.0 | 0.4316 | -54.0 | -34560.0 | -39936.0 | 1.8828 | 1.6641 |
| 0.6914 | 0.7330 | 700 | 123.6395 | -420.0 | -362.0 | 0.4336 | -57.25 | -36608.0 | -42240.0 | 3.1406 | 2.8438 |
| 0.6914 | 0.8377 | 800 | 130.5069 | -442.0 | -382.0 | 0.4316 | -60.25 | -38400.0 | -44544.0 | 4.1562 | 3.7656 |
| 0.6914 | 0.9424 | 900 | 137.3969 | -466.0 | -402.0 | 0.4277 | -63.5 | -40448.0 | -46848.0 | 4.25 | 3.8594 |
| 0.6914 | 1.0471 | 1000 | 143.7038 | -488.0 | -422.0 | 0.4297 | -66.5 | -42496.0 | -49152.0 | 5.6875 | 5.1562 |
| 0.6914 | 1.1518 | 1100 | 150.1531 | -510.0 | -440.0 | 0.4297 | -69.5 | -44288.0 | -51200.0 | 6.5938 | 6.0 |
| 0.6914 | 1.2565 | 1200 | 156.7057 | -532.0 | -460.0 | 0.4297 | -72.5 | -46336.0 | -53504.0 | 5.6875 | 5.1562 |
| 0.6914 | 1.3613 | 1300 | 162.7056 | -552.0 | -476.0 | 0.4336 | -75.0 | -48128.0 | -55552.0 | 5.5938 | 5.1562 |
| 0.6914 | 1.4660 | 1400 | 168.4744 | -572.0 | -494.0 | 0.4316 | -77.5 | -49664.0 | -57600.0 | 5.8438 | 5.3438 |
| 0.6914 | 1.5707 | 1500 | 173.8489 | -588.0 | -510.0 | 0.4297 | -80.0 | -51200.0 | -59392.0 | 6.0312 | 5.5312 |
| 0.6914 | 1.6754 | 1600 | 178.6226 | -608.0 | -524.0 | 0.4336 | -82.5 | -52736.0 | -60928.0 | 5.8438 | 5.375 |
| 0.6914 | 1.7801 | 1700 | 183.2413 | -620.0 | -536.0 | 0.4297 | -84.5 | -54016.0 | -62464.0 | 5.7812 | 5.3125 |
| 0.6914 | 1.8848 | 1800 | 186.8875 | -636.0 | -548.0 | 0.4277 | -86.0 | -55040.0 | -64000.0 | 5.5938 | 5.1562 |
| 0.6914 | 1.9895 | 1900 | 190.4393 | -648.0 | -560.0 | 0.4316 | -87.5 | -56064.0 | -65024.0 | 5.8125 | 5.375 |
| 0.6914 | 2.0942 | 2000 | 193.2805 | -656.0 | -568.0 | 0.4297 | -89.0 | -57088.0 | -66048.0 | 5.5312 | 5.125 |
| 0.6914 | 2.1990 | 2100 | 195.6470 | -664.0 | -576.0 | 0.4277 | -90.5 | -57600.0 | -66560.0 | 5.4688 | 5.0625 |
| 0.6914 | 2.3037 | 2200 | 197.7068 | -672.0 | -580.0 | 0.4238 | -91.0 | -58368.0 | -67584.0 | 5.4688 | 5.0625 |
| 0.6914 | 2.4084 | 2300 | 199.1925 | -676.0 | -584.0 | 0.4238 | -92.0 | -58880.0 | -68096.0 | 5.5 | 5.125 |
| 0.6914 | 2.5131 | 2400 | 200.0977 | -680.0 | -588.0 | 0.4258 | -92.5 | -59136.0 | -68096.0 | 5.5312 | 5.125 |
| 0.6914 | 2.6178 | 2500 | 200.9000 | -684.0 | -588.0 | 0.4277 | -92.5 | -59392.0 | -68608.0 | 5.5625 | 5.1562 |
| 0.6914 | 2.7225 | 2600 | 201.1795 | -684.0 | -592.0 | 0.4277 | -92.5 | -59392.0 | -68608.0 | 5.5938 | 5.1875 |
| 0.6914 | 2.8272 | 2700 | 201.3105 | -684.0 | -592.0 | 0.4277 | -93.0 | -59392.0 | -68608.0 | 5.5938 | 5.1875 |
| 0.6914 | 2.9319 | 2800 | 201.4325 | -684.0 | -592.0 | 0.4277 | -93.0 | -59392.0 | -68608.0 | 5.5625 | 5.1562 |
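
The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch, and from that the approximate size of the training set. The sketch below does this for three of the logged rows, combined with the total train batch size of 64 from the hyperparameters. It assumes every step consumed a full batch and that the logged epoch values are rounded, so the result is only an estimate; the helper name is hypothetical.

```python
def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


# (step, epoch) pairs copied from three rows of the results table.
logged = [(100, 0.1047), (1000, 1.0471), (2800, 2.9319)]

estimates = [steps_per_epoch(step, epoch) for step, epoch in logged]
approx_steps = round(sum(estimates) / len(estimates))

# 64 is the total_train_batch_size listed in the hyperparameters.
approx_pairs = approx_steps * 64

print(approx_steps, approx_pairs)
```

Under these assumptions the training set would contain on the order of 61,000 preference pairs per epoch.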

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.3.0
  • Datasets 2.21.0
  • Tokenizers 0.19.1
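
To compare a local environment against the versions listed above, a check along the following lines may help. It is a minimal sketch using only the standard library; the helper name is hypothetical, and it assumes the PyPI distribution names `transformers`, `torch` (for Pytorch), `datasets`, and `tokenizers`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; "torch" is the PyPI name for Pytorch.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.3.0",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}


def find_mismatches(expected: dict, get_version=version) -> dict:
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print(find_mismatches(EXPECTED))
```

An empty dictionary means every listed package is installed at the listed version; a value of `None` means the package was not found.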