---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with DPO on a preference dataset that is not recorded in this card.
It achieves the following results on the evaluation set:
- Loss: 0.7656
- Rewards/chosen: -3.8106
- Rewards/rejected: -6.8888
- Rewards/accuracies: 0.7405
- Rewards/margins: 3.0782
- Logps/rejected: -327.0232
- Logps/chosen: -316.3568
- Logits/rejected: -2.4373
- Logits/chosen: -2.3640
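
These are the standard TRL `DPOTrainer` metrics. Assuming the usual implementation, the implied reward for a completion $y$ given prompt $x$ is the policy/reference log-ratio scaled by the DPO temperature $\beta$ (the value of $\beta$ used here is not recorded in this card):

$$
r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)
$$

`Rewards/chosen` and `Rewards/rejected` are the mean of $r$ over chosen and rejected completions, `Rewards/margins` is the mean difference between the two, and `Rewards/accuracies` is the fraction of pairs whose chosen reward exceeds the rejected reward. `Logps/*` and `Logits/*` are the policy's mean completion log-probabilities and mean final-layer logits, respectively.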

## Model description

More information needed

## Intended uses & limitations

More information needed
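
Pending details from the authors, the checkpoint should behave like any other causal LM fine-tuned for chat. Below is a minimal inference sketch, assuming the model keeps the Zephyr chat template from its base; the repository id is a placeholder, since the final hub path is not stated in this card:

```python
# Minimal inference sketch; "your-org/zephyr-7b-dpo-full" is a placeholder id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/zephyr-7b-dpo-full"  # hypothetical; replace with the real path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
# Zephyr SFT checkpoints ship a chat template, so this should apply it.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```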

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
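
These values map directly onto `transformers.TrainingArguments`. The sketch below reconstructs the setup with TRL's `DPOTrainer` under stated assumptions: the preference dataset and the DPO beta are not recorded in this card, so both appear as placeholders; only the listed hyperparameters come from the card.

```python
# Hypothetical reconstruction; the dataset and beta are assumptions, while the
# hyperparameters are taken from the card above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the card does not say which preference dataset was used.
dataset = load_dataset("your-org/your-preference-dataset")

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=4,   # 4 GPUs -> total eval batch size 16
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    beta=0.1,  # assumption; the card does not record beta
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

Running this under `accelerate launch` with 4 processes would reproduce the multi-GPU batch-size totals listed above.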

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6612        | 0.05  | 100  | 0.6640          | 0.0464         | -0.0349          | 0.6565             | 0.0813          | -258.4842      | -277.7868    | -2.8352         | -2.7411       |
| 0.5924        | 0.1   | 200  | 0.6068          | 0.0778         | -0.2525          | 0.6927             | 0.3302          | -260.6598      | -277.4728    | -2.8153         | -2.7265       |
| 0.5488        | 0.15  | 300  | 0.5772          | 0.1688         | -0.4844          | 0.7385             | 0.6531          | -262.9787      | -276.5630    | -2.8364         | -2.7548       |
| 0.5144        | 0.2   | 400  | 0.5635          | 0.0609         | -0.7604          | 0.7347             | 0.8213          | -265.7392      | -277.6411    | -2.7890         | -2.7072       |
| 0.5399        | 0.25  | 500  | 0.5393          | 0.0316         | -0.9906          | 0.75               | 1.0221          | -268.0409      | -277.9347    | -2.8372         | -2.7565       |
| 0.5776        | 0.31  | 600  | 0.5706          | 0.0425         | -0.9799          | 0.7424             | 1.0224          | -267.9345      | -277.8257    | -2.8388         | -2.7569       |
| 0.5834        | 0.36  | 700  | 0.5596          | 0.0454         | -1.0216          | 0.7424             | 1.0670          | -268.3513      | -277.7964    | -2.7830         | -2.6941       |
| 0.5394        | 0.41  | 800  | 0.5358          | 0.0804         | -0.9536          | 0.7481             | 1.0341          | -267.6714      | -277.4460    | -2.6313         | -2.5480       |
| 0.5141        | 0.46  | 900  | 0.5412          | -0.2704        | -1.4309          | 0.7443             | 1.1605          | -272.4444      | -280.9546    | -2.6662         | -2.5832       |
| 0.51          | 0.51  | 1000 | 0.5350          | -0.2070        | -1.4043          | 0.7366             | 1.1973          | -272.1781      | -280.3206    | -2.7118         | -2.6217       |
| 0.5219        | 0.56  | 1100 | 0.5405          | -0.1673        | -1.3152          | 0.7290             | 1.1479          | -271.2871      | -279.9233    | -2.7451         | -2.6605       |
| 0.5391        | 0.61  | 1200 | 0.5320          | -0.2460        | -1.4452          | 0.7405             | 1.1992          | -272.5871      | -280.7106    | -2.7552         | -2.6692       |
| 0.536         | 0.66  | 1300 | 0.5502          | -0.1919        | -1.3564          | 0.7271             | 1.1645          | -271.6995      | -280.1697    | -2.7006         | -2.6126       |
| 0.6544        | 0.71  | 1400 | 0.5309          | -0.3757        | -1.6757          | 0.7080             | 1.3000          | -274.8926      | -282.0077    | -2.6970         | -2.6046       |
| 0.5697        | 0.76  | 1500 | 0.5662          | -0.2493        | -1.4791          | 0.7156             | 1.2297          | -272.9258      | -280.7440    | -2.7656         | -2.6730       |
| 0.5538        | 0.81  | 1600 | 0.5326          | -0.4658        | -1.6791          | 0.7214             | 1.2134          | -274.9264      | -282.9080    | -2.6934         | -2.5946       |
| 0.551         | 0.86  | 1700 | 0.5258          | -0.6217        | -1.8893          | 0.7137             | 1.2676          | -277.0278      | -284.4673    | -2.6535         | -2.5567       |
| 0.5708        | 0.92  | 1800 | 0.5639          | -0.5168        | -1.8962          | 0.7214             | 1.3794          | -277.0974      | -283.4186    | -2.6279         | -2.5564       |
| 0.5344        | 0.97  | 1900 | 0.5603          | -0.3788        | -1.8158          | 0.7271             | 1.4370          | -276.2931      | -282.0388    | -2.6680         | -2.5998       |
| 0.0925        | 1.02  | 2000 | 0.5587          | -0.4628        | -2.1277          | 0.7405             | 1.6648          | -279.4120      | -282.8788    | -2.6520         | -2.5825       |
| 0.112         | 1.07  | 2100 | 0.5731          | -0.6788        | -2.5908          | 0.7481             | 1.9120          | -284.0433      | -285.0383    | -2.5722         | -2.5094       |
| 0.0539        | 1.12  | 2200 | 0.5869          | -1.0820        | -2.9310          | 0.7366             | 1.8489          | -287.4448      | -289.0707    | -2.5937         | -2.5303       |
| 0.0811        | 1.17  | 2300 | 0.6306          | -0.8332        | -2.7204          | 0.7424             | 1.8872          | -285.3392      | -286.5822    | -2.5137         | -2.4560       |
| 0.0877        | 1.22  | 2400 | 0.5963          | -1.3075        | -3.3622          | 0.7481             | 2.0548          | -291.7576      | -291.3254    | -2.5925         | -2.5291       |
| 0.1114        | 1.27  | 2500 | 0.6126          | -1.3609        | -3.5524          | 0.7462             | 2.1915          | -293.6587      | -291.8594    | -2.4792         | -2.4142       |
| 0.0864        | 1.32  | 2600 | 0.6457          | -1.6093        | -3.7584          | 0.75               | 2.1491          | -295.7195      | -294.3440    | -2.5710         | -2.5058       |
| 0.0708        | 1.37  | 2700 | 0.6080          | -1.8094        | -3.7042          | 0.7462             | 1.8948          | -295.1769      | -296.3445    | -2.5394         | -2.4684       |
| 0.0794        | 1.42  | 2800 | 0.6010          | -1.7685        | -3.8603          | 0.7538             | 2.0918          | -296.7380      | -295.9354    | -2.5369         | -2.4663       |
| 0.1009        | 1.48  | 2900 | 0.6102          | -1.6050        | -3.5962          | 0.7347             | 1.9912          | -294.0973      | -294.3007    | -2.4834         | -2.4073       |
| 0.083         | 1.53  | 3000 | 0.6125          | -1.6395        | -3.6683          | 0.7424             | 2.0288          | -294.8184      | -294.6455    | -2.5306         | -2.4521       |
| 0.0871        | 1.58  | 3100 | 0.6392          | -1.7447        | -3.8250          | 0.75               | 2.0802          | -296.3850      | -295.6979    | -2.5032         | -2.4279       |
| 0.1168        | 1.63  | 3200 | 0.5973          | -1.6226        | -3.5602          | 0.7443             | 1.9376          | -293.7374      | -294.4764    | -2.5372         | -2.4606       |
| 0.0699        | 1.68  | 3300 | 0.5816          | -1.6383        | -3.5364          | 0.7424             | 1.8982          | -293.4994      | -294.6331    | -2.5287         | -2.4527       |
| 0.1082        | 1.73  | 3400 | 0.5895          | -1.8055        | -3.7976          | 0.7424             | 1.9920          | -296.1109      | -296.3059    | -2.5178         | -2.4442       |
| 0.09          | 1.78  | 3500 | 0.6231          | -1.8455        | -4.0234          | 0.75               | 2.1779          | -298.3694      | -296.7055    | -2.5261         | -2.4561       |
| 0.1238        | 1.83  | 3600 | 0.6047          | -1.6771        | -3.5997          | 0.7424             | 1.9226          | -294.1321      | -295.0213    | -2.6294         | -2.5512       |
| 0.0847        | 1.88  | 3700 | 0.5898          | -1.6725        | -3.5743          | 0.7347             | 1.9018          | -293.8779      | -294.9758    | -2.6224         | -2.5471       |
| 0.0908        | 1.93  | 3800 | 0.5817          | -1.6076        | -3.5381          | 0.7366             | 1.9304          | -293.5158      | -294.3269    | -2.5778         | -2.5047       |
| 0.0666        | 1.98  | 3900 | 0.6063          | -1.6950        | -3.7437          | 0.7309             | 2.0487          | -295.5718      | -295.2004    | -2.5784         | -2.5061       |
| 0.0173        | 2.03  | 4000 | 0.6213          | -2.1227        | -4.3451          | 0.7309             | 2.2224          | -301.5862      | -299.4778    | -2.6197         | -2.5495       |
| 0.0213        | 2.09  | 4100 | 0.6529          | -2.4461        | -4.9221          | 0.7366             | 2.4759          | -307.3557      | -302.7117    | -2.6029         | -2.5335       |
| 0.0149        | 2.14  | 4200 | 0.6934          | -3.0653        | -5.7847          | 0.7347             | 2.7194          | -315.9821      | -308.9039    | -2.5938         | -2.5272       |
| 0.0084        | 2.19  | 4300 | 0.7083          | -3.1845        | -6.0188          | 0.7405             | 2.8343          | -318.3230      | -310.0955    | -2.5088         | -2.4404       |
| 0.0059        | 2.24  | 4400 | 0.7193          | -3.3983        | -6.2807          | 0.7405             | 2.8824          | -320.9418      | -312.2334    | -2.5109         | -2.4479       |
| 0.0116        | 2.29  | 4500 | 0.7128          | -3.3425        | -6.1944          | 0.7462             | 2.8519          | -320.0795      | -311.6758    | -2.4787         | -2.4132       |
| 0.0077        | 2.34  | 4600 | 0.7219          | -3.2306        | -6.1475          | 0.7481             | 2.9169          | -319.6102      | -310.5562    | -2.4449         | -2.3779       |
| 0.0177        | 2.39  | 4700 | 0.7451          | -3.5469        | -6.5210          | 0.75               | 2.9742          | -323.3456      | -313.7194    | -2.3861         | -2.3174       |
| 0.0112        | 2.44  | 4800 | 0.7547          | -3.4801        | -6.4397          | 0.7424             | 2.9595          | -322.5316      | -313.0519    | -2.3939         | -2.3242       |
| 0.0071        | 2.49  | 4900 | 0.7691          | -3.8596        | -6.8490          | 0.7443             | 2.9895          | -326.6253      | -316.8460    | -2.3524         | -2.2834       |
| 0.0118        | 2.54  | 5000 | 0.7717          | -3.8862        | -6.8731          | 0.7462             | 2.9868          | -326.8659      | -317.1129    | -2.3347         | -2.2658       |
| 0.014         | 2.59  | 5100 | 0.7685          | -3.5970        | -6.5998          | 0.7481             | 3.0028          | -324.1335      | -314.2205    | -2.3512         | -2.2783       |
| 0.0208        | 2.64  | 5200 | 0.7741          | -3.9029        | -6.8895          | 0.7443             | 2.9866          | -327.0299      | -317.2794    | -2.3875         | -2.3143       |
| 0.0076        | 2.7   | 5300 | 0.7600          | -3.6159        | -6.5800          | 0.7424             | 2.9641          | -323.9353      | -314.4092    | -2.4331         | -2.3592       |
| 0.0146        | 2.75  | 5400 | 0.7768          | -3.7657        | -6.8555          | 0.7424             | 3.0898          | -326.6905      | -315.9074    | -2.4475         | -2.3751       |
| 0.0161        | 2.8   | 5500 | 0.7902          | -3.9170        | -7.0635          | 0.7481             | 3.1465          | -328.7701      | -317.4208    | -2.4332         | -2.3620       |
| 0.0056        | 2.85  | 5600 | 0.7827          | -3.9513        | -7.0687          | 0.7424             | 3.1174          | -328.8217      | -317.7632    | -2.4313         | -2.3599       |
| 0.0083        | 2.9   | 5700 | 0.7741          | -3.8805        | -6.9708          | 0.7443             | 3.0903          | -327.8432      | -317.0560    | -2.4324         | -2.3598       |
| 0.0243        | 2.95  | 5800 | 0.7657          | -3.8176        | -6.8913          | 0.7405             | 3.0737          | -327.0486      | -316.4268    | -2.4355         | -2.3620       |


### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1