---
library_name: transformers
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-least-similar
  results: []
---

# OpenELM-1_1B-DPO-full-least-similar

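This card was generated automatically by the Trainer, which reports the model as trained from scratch on an unknown dataset because neither a base model nor a dataset name was recorded for the run. The tags indicate DPO training with TRL and the alignment-handbook.
The final checkpoint (step 2800, the last row of the training results table) achieves the following results on the evaluation set: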
- Loss: 1.0609
- Rewards/chosen: -3.7969
- Rewards/rejected: -4.0
- Rewards/accuracies: 0.5
- Rewards/margins: 0.2148
- Logps/rejected: -692.0
- Logps/chosen: -700.0
- Logits/rejected: -12.9375
- Logits/chosen: -13.25
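
The reward metrics above follow the usual DPO definitions: each reward is a scaled log-probability ratio between the trained policy and a frozen reference model, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs with a positive margin. The sketch below illustrates those definitions on made-up numbers. The function name `dpo_eval_metrics`, the value `beta=0.1`, and all log-probabilities are assumptions for illustration; the card records neither `beta` nor the reference model's log-probabilities.

```python
import math


def dpo_eval_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Compute DPO-style metrics from per-example sequence log-probabilities.

    ASSUMPTION: beta=0.1 is illustrative; the card does not state the value used.
    """
    # Implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Sigmoid DPO loss per example: -log(sigmoid(margin)) = log(1 + exp(-margin))
    losses = [math.log1p(math.exp(-m)) for m in margins]
    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/margins": sum(margins) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }


# Hypothetical two-pair batch (values invented for illustration only).
metrics = dpo_eval_metrics(
    policy_chosen=[-700.0, -690.0],
    policy_rejected=[-692.0, -695.0],
    ref_chosen=[-660.0, -655.0],
    ref_rejected=[-650.0, -665.0],
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Read this way, a `Rewards/accuracies` of 0.5 means the model ranks the chosen response above the rejected one in half of the evaluation pairs.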

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
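
The totals in this list follow from the per-device values: 8 × 4 × 2 = 64 for training and 16 × 4 = 64 for evaluation. The sketch below reproduces that arithmetic and a cosine schedule with linear warmup of the kind these settings describe. The total step count of 2865 is an estimate, not a value from the card: it assumes about 955 optimizer steps per epoch, inferred from step 100 landing at epoch 0.1047 in the results table. Rounding the warmup length up is also an assumption.

```python
import math

# Values copied from the hyperparameter list.
learning_rate = 5e-05
train_batch_size = 8            # per device
eval_batch_size = 16            # per device
num_devices = 4
gradient_accumulation_steps = 2
warmup_ratio = 0.1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# ASSUMPTION: estimated from the results table (about 955 steps per epoch, 3 epochs).
total_steps = 2865
# ASSUMPTION: warmup length is rounded up to a whole step.
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total train batch size:", total_train_batch_size)
print("total eval batch size:", total_eval_batch_size)
print("warmup steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step}: lr = {lr_at(step):.3e}")
```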

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.187         | 0.1047 | 100  | 0.6760          | -0.4492        | -0.5586          | 0.5469             | 0.1084          | -344.0         | -364.0       | -14.5625        | -14.6875      |
| 0.1162        | 0.2094 | 200  | 0.6879          | -0.7734        | -0.875           | 0.5410             | 0.1016          | -376.0         | -396.0       | -11.6875        | -12.0625      |
| 0.1436        | 0.3141 | 300  | 0.7670          | -1.6562        | -1.7969          | 0.4941             | 0.1377          | -468.0         | -484.0       | -13.875         | -14.0625      |
| 0.1461        | 0.4188 | 400  | 0.7442          | -1.0469        | -1.0625          | 0.5039             | 0.0201          | -394.0         | -422.0       | -16.5           | -16.625       |
| 0.1352        | 0.5236 | 500  | 0.8131          | -1.6406        | -1.7031          | 0.5020             | 0.0630          | -460.0         | -482.0       | -15.5625        | -15.6875      |
| 0.1507        | 0.6283 | 600  | 0.8542          | -1.625         | -1.6328          | 0.4766             | 0.0096          | -452.0         | -482.0       | -17.25          | -17.375       |
| 0.1278        | 0.7330 | 700  | 0.8274          | -1.7891        | -1.9453          | 0.4980             | 0.1592          | -484.0         | -496.0       | -14.8125        | -15.0         |
| 0.1303        | 0.8377 | 800  | 0.8349          | -1.7734        | -1.7969          | 0.5195             | 0.0272          | -468.0         | -496.0       | -16.5           | -16.5         |
| 0.1614        | 0.9424 | 900  | 0.8078          | -2.2969        | -2.5             | 0.5332             | 0.1992          | -540.0         | -548.0       | -16.375         | -16.375       |
| 0.0199        | 1.0471 | 1000 | 0.8233          | -2.2656        | -2.3906          | 0.4863             | 0.1279          | -528.0         | -544.0       | -15.4375        | -15.875       |
| 0.0348        | 1.1518 | 1100 | 0.8452          | -2.0469        | -2.1562          | 0.5039             | 0.1187          | -504.0         | -524.0       | -17.0           | -17.125       |
| 0.0186        | 1.2565 | 1200 | 0.8788          | -2.9219        | -3.0312          | 0.5098             | 0.1074          | -592.0         | -612.0       | -14.75          | -15.0625      |
| 0.0277        | 1.3613 | 1300 | 0.8304          | -2.7969        | -2.8906          | 0.5137             | 0.0928          | -576.0         | -600.0       | -14.25          | -14.5         |
| 0.0212        | 1.4660 | 1400 | 0.8990          | -2.7969        | -2.9062          | 0.5                | 0.1099          | -580.0         | -600.0       | -14.25          | -14.4375      |
| 0.0333        | 1.5707 | 1500 | 0.9111          | -3.2031        | -3.2969          | 0.5215             | 0.0981          | -620.0         | -640.0       | -12.1875        | -12.625       |
| 0.0163        | 1.6754 | 1600 | 0.9215          | -3.2188        | -3.3281          | 0.4941             | 0.1104          | -620.0         | -640.0       | -11.0625        | -11.5         |
| 0.0309        | 1.7801 | 1700 | 0.9203          | -2.6719        | -2.7344          | 0.5059             | 0.0635          | -560.0         | -584.0       | -13.5625        | -13.8125      |
| 0.0228        | 1.8848 | 1800 | 0.9032          | -2.8594        | -2.9531          | 0.4941             | 0.0972          | -584.0         | -604.0       | -13.3125        | -13.5625      |
| 0.0116        | 1.9895 | 1900 | 0.9123          | -3.0156        | -3.125           | 0.5                | 0.1187          | -600.0         | -620.0       | -13.375         | -13.625       |
| 0.0011        | 2.0942 | 2000 | 0.9715          | -3.2656        | -3.4531          | 0.4980             | 0.1865          | -636.0         | -644.0       | -13.0625        | -13.3125      |
| 0.0019        | 2.1990 | 2100 | 1.0378          | -3.6719        | -3.9062          | 0.5098             | 0.2393          | -680.0         | -684.0       | -12.5           | -12.8125      |
| 0.0011        | 2.3037 | 2200 | 1.0456          | -3.7188        | -3.9375          | 0.5020             | 0.2227          | -684.0         | -692.0       | -12.8125        | -13.125       |
| 0.0009        | 2.4084 | 2300 | 1.0567          | -3.75          | -3.9688          | 0.5020             | 0.2217          | -684.0         | -692.0       | -12.9375        | -13.25        |
| 0.0022        | 2.5131 | 2400 | 1.0450          | -3.7188        | -3.9062          | 0.4961             | 0.1953          | -680.0         | -692.0       | -13.0           | -13.3125      |
| 0.0013        | 2.6178 | 2500 | 1.0499          | -3.7656        | -3.9688          | 0.5020             | 0.2080          | -684.0         | -696.0       | -12.9375        | -13.25        |
| 0.0006        | 2.7225 | 2600 | 1.0572          | -3.7812        | -3.9844          | 0.4961             | 0.2100          | -688.0         | -696.0       | -12.9375        | -13.25        |
| 0.0007        | 2.8272 | 2700 | 1.0600          | -3.7969        | -4.0             | 0.5020             | 0.2168          | -692.0         | -700.0       | -12.9375        | -13.25        |
| 0.0012        | 2.9319 | 2800 | 1.0609          | -3.7969        | -4.0             | 0.5                | 0.2148          | -692.0         | -700.0       | -12.9375        | -13.25        |
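
In the table, training loss falls to about 0.001 while validation loss rises from 0.6760 to 1.0609, a pattern that usually points to overfitting, and the reported evaluation metrics come from the last step rather than the best one. The sketch below is one way to find the logged step with the lowest validation loss. The rows are copied from the table; the variable names are illustrative, and whether a checkpoint was saved at that step is not stated in the card.

```python
# (step, training_loss, validation_loss) copied from the training results table.
rows = [
    (100, 0.187, 0.6760), (200, 0.1162, 0.6879), (300, 0.1436, 0.7670),
    (400, 0.1461, 0.7442), (500, 0.1352, 0.8131), (600, 0.1507, 0.8542),
    (700, 0.1278, 0.8274), (800, 0.1303, 0.8349), (900, 0.1614, 0.8078),
    (1000, 0.0199, 0.8233), (1100, 0.0348, 0.8452), (1200, 0.0186, 0.8788),
    (1300, 0.0277, 0.8304), (1400, 0.0212, 0.8990), (1500, 0.0333, 0.9111),
    (1600, 0.0163, 0.9215), (1700, 0.0309, 0.9203), (1800, 0.0228, 0.9032),
    (1900, 0.0116, 0.9123), (2000, 0.0011, 0.9715), (2100, 0.0019, 1.0378),
    (2200, 0.0011, 1.0456), (2300, 0.0009, 1.0567), (2400, 0.0022, 1.0450),
    (2500, 0.0013, 1.0499), (2600, 0.0006, 1.0572), (2700, 0.0007, 1.0600),
    (2800, 0.0012, 1.0609),
]

# Logged step with the lowest validation loss, and the final logged step.
best_step, best_train_loss, best_val_loss = min(rows, key=lambda row: row[2])
final_step, final_train_loss, final_val_loss = rows[-1]

print(f"lowest validation loss: {best_val_loss:.4f} at step {best_step}")
print(f"final validation loss:  {final_val_loss:.4f} at step {final_step}")
print(f"increase from best to final: {final_val_loss - best_val_loss:.4f}")
```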


### Framework versions

- Transformers 4.44.2
- PyTorch 2.3.0
- Datasets 3.0.0
- Tokenizers 0.19.1