---
base_model: loubnabnl/smollm2-360M-8k-lc100k-mix1-ep2
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: smollm2-360M-8k-lc100k-dpo-ultaf-ep2
    results: []
---


# smollm2-360M-8k-lc100k-dpo-ultaf-ep2

This model is a fine-tuned version of [loubnabnl/smollm2-360M-8k-lc100k-mix1-ep2](https://huggingface.co/loubnabnl/smollm2-360M-8k-lc100k-mix1-ep2) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.6348
- Rewards/chosen: -0.0342
- Rewards/rejected: -0.3910
- Rewards/accuracies: 0.6190
- Rewards/margins: 0.3568
- Logps/rejected: -323.7198
- Logps/chosen: -375.6464
- Logits/rejected: -1.6969
- Logits/chosen: -1.6408
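For context, the reward columns follow TRL's DPO logging convention: the implicit reward of a completion is the β-scaled log-probability ratio between the trained policy and the frozen reference model (β itself is not stated in this card). A sketch of the definitions, with y_w the chosen and y_l the rejected completion:

```latex
% Implicit DPO reward of completion y given prompt x (beta not stated in this card)
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x) \right]

% Reported metrics, averaged over the evaluation set
\text{rewards/chosen}     = \mathbb{E}\big[ r_\theta(x, y_w) \big]
\text{rewards/rejected}   = \mathbb{E}\big[ r_\theta(x, y_l) \big]
\text{rewards/margins}    = \mathbb{E}\big[ r_\theta(x, y_w) - r_\theta(x, y_l) \big]
\text{rewards/accuracies} = \mathbb{E}\big[ \mathbf{1}\{ r_\theta(x, y_w) > r_\theta(x, y_l) \} \big]
```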

## Model description

More information needed

## Intended uses & limitations

More information needed
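Pending fuller documentation, here is a minimal generation sketch. The repo id below and the presence of a chat template are assumptions (taken from the checkpoint name on this card and the SmolLM2 family convention); adjust both to the actual release.

```python
# Minimal generation sketch. The repo id and the chat template are
# assumptions, not confirmed by this card; adjust to the actual release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/smollm2-360M-8k-lc100k-dpo-ultaf-ep2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Write a haiku about small language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)

# Strip the prompt tokens and decode only the generated continuation
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```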

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
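As a rough illustration of how these settings map onto TRL, here is a hedged sketch using the current `DPOConfig`/`DPOTrainer` API. The DPO β, the dataset splits, and the preprocessing are assumptions: the original run used the alignment-handbook recipe, which this card does not reproduce.

```python
# Hedged sketch of the listed hyperparameters in TRL terms (recent TRL API).
# beta and the split names are assumptions, not stated in this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "loubnabnl/smollm2-360M-8k-lc100k-mix1-ep2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Per-device sizes are multiplied by 8 GPUs (and 8 gradient-accumulation
# steps for training) to give the effective batch sizes of 128 and 32.
args = DPOConfig(
    output_dir="smollm2-360M-8k-lc100k-dpo-ultaf-ep2",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumed; this card does not state beta
)

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with ref_model=None, TRL uses a frozen copy of `model` as reference
    args=args,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    processing_class=tokenizer,
)
trainer.train()
```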

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7098        | 0.2094 | 100  | 0.7162          | -0.0109        | -0.0675          | 0.5278             | 0.0566          | -323.0727      | -375.5997    | -1.6983         | -1.6387       |
| 0.6825        | 0.4187 | 200  | 0.6842          | -0.0010        | -0.1880          | 0.5794             | 0.1870          | -323.3139      | -375.5800    | -1.6938         | -1.6358       |
| 0.663         | 0.6281 | 300  | 0.6617          | 0.0225         | -0.2389          | 0.6032             | 0.2614          | -323.4156      | -375.5330    | -1.6893         | -1.6317       |
| 0.6547        | 0.8375 | 400  | 0.6591          | 0.0001         | -0.3516          | 0.6389             | 0.3517          | -323.6410      | -375.5778    | -1.6980         | -1.6414       |
| 0.6456        | 1.0468 | 500  | 0.6430          | 0.0133         | -0.3566          | 0.6667             | 0.3699          | -323.6510      | -375.5514    | -1.6931         | -1.6365       |
| 0.6054        | 1.2562 | 600  | 0.6423          | -0.0329        | -0.3895          | 0.6349             | 0.3566          | -323.7167      | -375.6438    | -1.6991         | -1.6431       |
| 0.6129        | 1.4656 | 700  | 0.6431          | -0.0449        | -0.4183          | 0.6349             | 0.3735          | -323.7745      | -375.6677    | -1.6979         | -1.6414       |
| 0.5972        | 1.6750 | 800  | 0.6384          | -0.0695        | -0.4139          | 0.6429             | 0.3444          | -323.7656      | -375.7169    | -1.6965         | -1.6399       |
| 0.6207        | 1.8843 | 900  | 0.6362          | -0.0627        | -0.4222          | 0.6786             | 0.3595          | -323.7822      | -375.7033    | -1.6976         | -1.6407       |

### Framework versions

- Transformers 4.42.3
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1