---
tags:
- generated_from_trainer
datasets:
- arrow
model-index:
- name: PE_Llama_2_7b_sft_rlhf
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# PE_Llama_2_7b_sft_rlhf

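This model was trained on a dataset that the Trainer recorded only as `arrow`. The card describes it as trained from scratch because no base checkpoint was recorded; the model name suggests a Llama 2 7B starting point, but nothing in this card confirms that.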
It achieves the following results on the evaluation set:
- Loss: 0.0093
- Rewards/chosen: -7.0331
- Rewards/rejected: -29.3861
- Rewards/accuracies: 0.9916
- Rewards/margins: 22.3530
- Logps/rejected: -118.6765
- Logps/chosen: -90.0482
- Logits/rejected: -1.3495
- Logits/chosen: -1.4301
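
The metric names above resemble those logged by preference-optimization trainers (for example TRL's `DPOTrainer`), although this card does not state which training method was used. Under that assumption, the reward margin is the chosen reward minus the rejected reward. The sketch below only checks that the reported numbers are consistent with that reading; the variable names are illustrative.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -7.0331
rewards_rejected = -29.3861
reported_margin = 22.3530

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# The two agree to the four decimals the card reports.
margin_matches = abs(computed_margin - reported_margin) < 1e-4
print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

A large positive margin together with an accuracy of 0.9916 means the chosen response scored higher than the rejected one on almost every evaluation pair. Both rewards are negative, so this says nothing by itself about absolute response quality.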

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-07
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
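
The two `total_*` values follow from the per-device settings. A minimal sketch of that arithmetic, assuming the usual convention that the effective train batch is per-device batch size times device count times accumulation steps, and that evaluation uses no accumulation:

```python
# Per-device settings copied from the hyperparameter list above.
train_batch_size = 1
eval_batch_size = 2
num_devices = 8
gradient_accumulation_steps = 8

# Effective number of examples per optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation runs one forward pass per batch, so accumulation does not apply.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both results match the `total_train_batch_size: 64` and `total_eval_batch_size: 16` entries in the list.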

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5577        | 0.05  | 100  | 0.5743          | -0.0890        | -0.3528          | 0.9022             | 0.2638          | -60.6098       | -76.1599     | -1.3076         | -1.3716       |
| 0.1502        | 0.09  | 200  | 0.1761          | -0.5864        | -2.4951          | 0.9804             | 1.9086          | -64.8944       | -77.1548     | -1.3397         | -1.4091       |
| 0.0367        | 0.14  | 300  | 0.0640          | -1.1815        | -4.8466          | 0.9860             | 3.6651          | -69.5975       | -78.3450     | -1.3685         | -1.4428       |
| 0.0195        | 0.18  | 400  | 0.0419          | -1.6306        | -6.4153          | 0.9832             | 4.7847          | -72.7348       | -79.2431     | -1.3875         | -1.4648       |
| 0.0128        | 0.23  | 500  | 0.0321          | -2.1351        | -8.0395          | 0.9860             | 5.9044          | -75.9833       | -80.2522     | -1.4045         | -1.4847       |
| 0.0078        | 0.27  | 600  | 0.0294          | -2.8235        | -9.6992          | 0.9860             | 6.8757          | -79.3027       | -81.6291     | -1.4163         | -1.4986       |
| 0.0074        | 0.32  | 700  | 0.0177          | -2.7718        | -10.7772         | 0.9832             | 8.0054          | -81.4587       | -81.5256     | -1.4251         | -1.5079       |
| 0.0051        | 0.37  | 800  | 0.0144          | -2.4805        | -11.3179         | 0.9832             | 8.8374          | -82.5400       | -80.9429     | -1.4353         | -1.5181       |
| 0.003         | 0.41  | 900  | 0.0160          | -2.8352        | -12.2817         | 0.9860             | 9.4465          | -84.4677       | -81.6525     | -1.4421         | -1.5261       |
| 0.0031        | 0.46  | 1000 | 0.0122          | -2.8873        | -13.0359         | 0.9860             | 10.1487         | -85.9761       | -81.7565     | -1.4514         | -1.5345       |
| 0.0107        | 0.5   | 1100 | 0.0110          | -2.8383        | -13.0784         | 0.9888             | 10.2401         | -86.0611       | -81.6586     | -1.4506         | -1.5334       |
| 0.0065        | 0.55  | 1200 | 0.0130          | -3.3682        | -13.9857         | 0.9860             | 10.6176         | -87.8757       | -82.7184     | -1.4603         | -1.5441       |
| 0.0054        | 0.59  | 1300 | 0.0123          | -3.6048        | -14.8999         | 0.9888             | 11.2951         | -89.7041       | -83.1916     | -1.4576         | -1.5403       |
| 0.0048        | 0.64  | 1400 | 0.0091          | -3.3176        | -15.0505         | 0.9860             | 11.7329         | -90.0053       | -82.6172     | -1.4598         | -1.5418       |
| 0.0017        | 0.68  | 1500 | 0.0087          | -3.3081        | -15.5642         | 0.9860             | 12.2561         | -91.0327       | -82.5982     | -1.4671         | -1.5494       |
| 0.0042        | 0.73  | 1600 | 0.0091          | -3.5315        | -16.2814         | 0.9860             | 12.7498         | -92.4670       | -83.0451     | -1.4722         | -1.5560       |
| 0.0035        | 0.78  | 1700 | 0.0078          | -3.1483        | -15.9040         | 0.9916             | 12.7557         | -91.7122       | -82.2786     | -1.4664         | -1.5481       |
| 0.0094        | 0.82  | 1800 | 0.0071          | -2.9923        | -15.9175         | 0.9888             | 12.9251         | -91.7391       | -81.9667     | -1.4572         | -1.5390       |
| 0.0024        | 0.87  | 1900 | 0.0066          | -2.9861        | -16.5288         | 0.9916             | 13.5427         | -92.9619       | -81.9542     | -1.4690         | -1.5511       |
| 0.0067        | 0.91  | 2000 | 0.0076          | -3.2851        | -16.0301         | 0.9916             | 12.7450         | -91.9644       | -82.5522     | -1.4577         | -1.5391       |
| 0.0044        | 0.96  | 2100 | 0.0064          | -3.3414        | -16.8752         | 0.9944             | 13.5338         | -93.6545       | -82.6647     | -1.4617         | -1.5440       |
| 0.0025        | 1.0   | 2200 | 0.0060          | -3.1967        | -16.8252         | 0.9944             | 13.6285         | -93.5546       | -82.3753     | -1.4630         | -1.5444       |
| 0.0023        | 1.05  | 2300 | 0.0063          | -3.5595        | -17.6105         | 0.9916             | 14.0510         | -95.1253       | -83.1011     | -1.4645         | -1.5467       |
| 0.0055        | 1.1   | 2400 | 0.0070          | -4.0460        | -18.6662         | 0.9944             | 14.6201         | -97.2365       | -84.0740     | -1.4606         | -1.5441       |
| 0.0052        | 1.14  | 2500 | 0.0067          | -3.3185        | -17.6030         | 0.9944             | 14.2844         | -95.1102       | -82.6191     | -1.4679         | -1.5507       |
| 0.0023        | 1.19  | 2600 | 0.0064          | -3.4071        | -18.2406         | 0.9944             | 14.8335         | -96.3854       | -82.7962     | -1.4667         | -1.5501       |
| 0.0044        | 1.23  | 2700 | 0.0090          | -4.3343        | -19.6985         | 0.9916             | 15.3642         | -99.3012       | -84.6506     | -1.4647         | -1.5496       |
| 0.0033        | 1.28  | 2800 | 0.0113          | -4.6406        | -19.7381         | 0.9916             | 15.0976         | -99.3805       | -85.2631     | -1.4569         | -1.5408       |
| 0.0023        | 1.32  | 2900 | 0.0070          | -3.9341        | -19.4138         | 0.9944             | 15.4797         | -98.7318       | -83.8501     | -1.4612         | -1.5449       |
| 0.0034        | 1.37  | 3000 | 0.0066          | -3.7082        | -18.5209         | 0.9916             | 14.8127         | -96.9460       | -83.3983     | -1.4587         | -1.5399       |
| 0.0033        | 1.42  | 3100 | 0.0064          | -3.6694        | -18.6338         | 0.9972             | 14.9644         | -97.1717       | -83.3208     | -1.4480         | -1.5297       |
| 0.0034        | 1.46  | 3200 | 0.0059          | -3.7376        | -19.1673         | 0.9944             | 15.4298         | -98.2389       | -83.4571     | -1.4483         | -1.5307       |
| 0.0019        | 1.51  | 3300 | 0.0061          | -3.9735        | -19.7068         | 0.9916             | 15.7332         | -99.3178       | -83.9291     | -1.4459         | -1.5285       |
| 0.0011        | 1.55  | 3400 | 0.0066          | -4.3242        | -20.4806         | 0.9944             | 16.1564         | -100.8654      | -84.6304     | -1.4412         | -1.5245       |
| 0.0001        | 1.6   | 3500 | 0.0093          | -4.7847        | -21.0204         | 0.9916             | 16.2357         | -101.9450      | -85.5513     | -1.4308         | -1.5145       |
| 0.0037        | 1.64  | 3600 | 0.0076          | -4.5704        | -20.9595         | 0.9888             | 16.3891         | -101.8232      | -85.1228     | -1.4373         | -1.5209       |
| 0.003         | 1.69  | 3700 | 0.0087          | -4.7965        | -21.6522         | 0.9916             | 16.8557         | -103.2086      | -85.5750     | -1.4300         | -1.5148       |
| 0.0056        | 1.73  | 3800 | 0.0093          | -5.1262        | -22.2592         | 0.9916             | 17.1330         | -104.4226      | -86.2344     | -1.4213         | -1.5058       |
| 0.0024        | 1.78  | 3900 | 0.0113          | -5.8601        | -23.7638         | 0.9888             | 17.9037         | -107.4319      | -87.7022     | -1.4014         | -1.4856       |
| 0.0034        | 1.83  | 4000 | 0.0056          | -4.7077        | -22.5264         | 0.9944             | 17.8187         | -104.9570      | -85.3974     | -1.4252         | -1.5084       |
| 0.0044        | 1.87  | 4100 | 0.0055          | -4.2834        | -21.6926         | 0.9972             | 17.4092         | -103.2894      | -84.5488     | -1.4342         | -1.5165       |
| 0.0001        | 1.92  | 4200 | 0.0068          | -5.2542        | -23.4097         | 0.9916             | 18.1555         | -106.7237      | -86.4905     | -1.4219         | -1.5052       |
| 0.0044        | 1.96  | 4300 | 0.0075          | -5.2492        | -23.2824         | 0.9888             | 18.0332         | -106.4690      | -86.4804     | -1.4098         | -1.4921       |
| 0.0022        | 2.01  | 4400 | 0.0082          | -5.6200        | -23.9342         | 0.9944             | 18.3142         | -107.7725      | -87.2220     | -1.4087         | -1.4906       |
| 0.0033        | 2.05  | 4500 | 0.0091          | -5.9484        | -24.5607         | 0.9916             | 18.6123         | -109.0256      | -87.8787     | -1.4036         | -1.4857       |
| 0.0022        | 2.1   | 4600 | 0.0091          | -6.0570        | -25.0424         | 0.9916             | 18.9853         | -109.9890      | -88.0961     | -1.3980         | -1.4804       |
| 0.0011        | 2.15  | 4700 | 0.0100          | -6.3832        | -25.6097         | 0.9888             | 19.2265         | -111.1236      | -88.7484     | -1.3907         | -1.4732       |
| 0.0065        | 2.19  | 4800 | 0.0073          | -5.7898        | -25.1360         | 0.9916             | 19.3462         | -110.1763      | -87.5616     | -1.4006         | -1.4827       |
| 0.0022        | 2.24  | 4900 | 0.0091          | -6.1379        | -25.9334         | 0.9916             | 19.7955         | -111.7710      | -88.2578     | -1.3907         | -1.4732       |
| 0.0022        | 2.28  | 5000 | 0.0147          | -7.3728        | -27.6080         | 0.9888             | 20.2352         | -115.1203      | -90.7277     | -1.3738         | -1.4564       |
| 0.0033        | 2.33  | 5100 | 0.0120          | -6.9056        | -27.3057         | 0.9888             | 20.4002         | -114.5157      | -89.7931     | -1.3780         | -1.4604       |
| 0.0043        | 2.37  | 5200 | 0.0097          | -6.5949        | -27.6154         | 0.9888             | 21.0205         | -115.1350      | -89.1717     | -1.3772         | -1.4593       |
| 0.0022        | 2.42  | 5300 | 0.0152          | -7.5122        | -28.6578         | 0.9888             | 21.1456         | -117.2199      | -91.0065     | -1.3647         | -1.4465       |
| 0.0022        | 2.46  | 5400 | 0.0149          | -7.7072        | -29.4467         | 0.9888             | 21.7395         | -118.7977      | -91.3965     | -1.3515         | -1.4331       |
| 0.0001        | 2.51  | 5500 | 0.0137          | -7.6730        | -29.4473         | 0.9916             | 21.7743         | -118.7989      | -91.3281     | -1.3483         | -1.4293       |
| 0.0022        | 2.56  | 5600 | 0.0133          | -7.6989        | -29.6686         | 0.9916             | 21.9697         | -119.2415      | -91.3798     | -1.3485         | -1.4299       |
| 0.0011        | 2.6   | 5700 | 0.0095          | -6.8592        | -28.9672         | 0.9888             | 22.1080         | -117.8385      | -89.7003     | -1.3553         | -1.4366       |
| 0.0054        | 2.65  | 5800 | 0.0077          | -6.4136        | -28.4244         | 0.9916             | 22.0108         | -116.7531      | -88.8093     | -1.3637         | -1.4450       |
| 0.0033        | 2.69  | 5900 | 0.0115          | -7.6490        | -30.1521         | 0.9888             | 22.5031         | -120.2085      | -91.2800     | -1.3400         | -1.4208       |
| 0.0011        | 2.74  | 6000 | 0.0086          | -6.8537        | -29.1407         | 0.9888             | 22.2870         | -118.1857      | -89.6894     | -1.3510         | -1.4317       |
| 0.0011        | 2.78  | 6100 | 0.0095          | -7.1201        | -29.6324         | 0.9888             | 22.5123         | -119.1690      | -90.2221     | -1.3452         | -1.4257       |
| 0.0022        | 2.83  | 6200 | 0.0086          | -6.8942        | -29.1673         | 0.9916             | 22.2731         | -118.2387      | -89.7703     | -1.3531         | -1.4335       |
| 0.0013        | 2.88  | 6300 | 0.0086          | -6.8366        | -29.0334         | 0.9916             | 22.1968         | -117.9710      | -89.6551     | -1.3543         | -1.4349       |
| 0.0033        | 2.92  | 6400 | 0.0096          | -7.0073        | -29.2913         | 0.9916             | 22.2840         | -118.4869      | -89.9966     | -1.3494         | -1.4303       |
| 0.0011        | 2.97  | 6500 | 0.0092          | -6.9778        | -29.3366         | 0.9916             | 22.3588         | -118.5774      | -89.9376     | -1.3494         | -1.4297       |
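
The validation loss in the table does not fall monotonically, so the last logged step is not the one with the lowest loss. As a rough illustration, the sketch below copies the Validation Loss column (one value per 100 steps) and finds its minimum. Whether the lowest validation loss corresponds to the best checkpoint for a given use is a separate question that this card does not answer.

```python
# Validation Loss column from the table above, in step order (100, 200, ..., 6500).
val_loss = [
    0.5743, 0.1761, 0.0640, 0.0419, 0.0321, 0.0294, 0.0177, 0.0144, 0.0160, 0.0122,
    0.0110, 0.0130, 0.0123, 0.0091, 0.0087, 0.0091, 0.0078, 0.0071, 0.0066, 0.0076,
    0.0064, 0.0060, 0.0063, 0.0070, 0.0067, 0.0064, 0.0090, 0.0113, 0.0070, 0.0066,
    0.0064, 0.0059, 0.0061, 0.0066, 0.0093, 0.0076, 0.0087, 0.0093, 0.0113, 0.0056,
    0.0055, 0.0068, 0.0075, 0.0082, 0.0091, 0.0091, 0.0100, 0.0073, 0.0091, 0.0147,
    0.0120, 0.0097, 0.0152, 0.0149, 0.0137, 0.0133, 0.0095, 0.0077, 0.0115, 0.0086,
    0.0095, 0.0086, 0.0086, 0.0096, 0.0092,
]

# Evaluation was logged every 100 steps.
steps = [100 * (i + 1) for i in range(len(val_loss))]

# Step with the lowest logged validation loss.
best_step, best_loss = min(zip(steps, val_loss), key=lambda pair: pair[1])

# Last logged row, for comparison.
last_step, last_loss = steps[-1], val_loss[-1]

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"last logged validation loss {last_loss} at step {last_step}")
```

The reward margin keeps growing through the third epoch while the validation loss drifts upward from its minimum, so an intermediate checkpoint may be worth comparing against the final one if intermediate checkpoints were saved.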


### Framework versions

- Transformers 4.35.0
- PyTorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1