silviasapora committed
Commit 698477f · verified · 1 parent: 7b6d92a

Model save

Files changed (4)
  1. README.md +67 -0
  2. all_results.json +9 -0
  3. train_results.json +9 -0
  4. trainer_state.json +1176 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ base_model: google/gemma-7b
+ library_name: transformers
+ model_name: gemma-7b-borpo-noisy-5e-5-norm
+ tags:
+ - generated_from_trainer
+ - trl
+ - orpo
+ licence: license
+ ---
+
+ # Model Card for gemma-7b-borpo-noisy-5e-5-norm
+
+ This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="silviasapora/gemma-7b-borpo-noisy-5e-5-norm", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/silvias/huggingface/runs/w6fra4fm)
+
+
+ This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).
+
+ ### Framework versions
+
+ - TRL: 0.13.0
+ - Transformers: 4.46.1
+ - PyTorch: 2.4.0
+ - Datasets: 3.1.0
+ - Tokenizers: 0.20.1
+
+ ## Citations
+
+ Cite ORPO as:
+
+ ```bibtex
+ @article{hong2024orpo,
+ title = {{ORPO: Monolithic Preference Optimization without Reference Model}},
+ author = {Jiwoo Hong and Noah Lee and James Thorne},
+ year = 2024,
+ eprint = {arXiv:2403.07691}
+ }
+ ```
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
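The ORPO objective named in the model card can be sketched in a few lines. This is a minimal illustration of the odds-ratio term described in the ORPO paper, not the training code used for this run. The function names are hypothetical, and `beta = 0.5` is an assumption inferred from `trainer_state.json` in this commit, where each logged `rewards/chosen` is half of the corresponding `logps/chosen`.

```python
import math


def log_odds(logp_chosen: float, logp_rejected: float) -> float:
    """Log odds ratio of chosen vs. rejected, given average log-probabilities (< 0)."""
    # odds(p) = p / (1 - p), so log odds(p) = log p - log(1 - p).
    chosen = logp_chosen - math.log1p(-math.exp(logp_chosen))
    rejected = logp_rejected - math.log1p(-math.exp(logp_rejected))
    return chosen - rejected


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def orpo_terms(logp_chosen: float, logp_rejected: float, beta: float = 0.5) -> dict:
    """Per-example quantities analogous to the logged metrics (beta is an assumption)."""
    lo = log_odds(logp_chosen, logp_rejected)
    return {
        "log_odds": lo,
        "log_odds_ratio": log_sigmoid(lo),
        "reward_chosen": beta * logp_chosen,
        "reward_rejected": beta * logp_rejected,
    }


# Batch-mean log-probabilities logged at step 5 of this run.
terms = orpo_terms(-20.546340942382812, -25.53423309326172)
print(terms)
```

The logged metrics are batch means, so feeding mean log-probabilities into the sketch reproduces the reward values exactly but the log-odds values only approximately. The logged `loss` also depends on details of this run's trainer variant that are not visible in the commit, so the sketch does not attempt to reproduce it.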
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 2.986666666666667,
+ "total_flos": 0.0,
+ "train_loss": 60.72040763733879,
+ "train_runtime": 6827.5776,
+ "train_samples": 6750,
+ "train_samples_per_second": 2.966,
+ "train_steps_per_second": 0.046
+ }
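The throughput figures in this file can be cross-checked against each other. The sketch below assumes the usual Trainer convention that samples per second is `train_samples * num_train_epochs / train_runtime`. Two inputs are assumptions rather than values from this file: `num_train_epochs = 3` is inferred from the logged epoch of about 2.99, and `global_step = 315` is taken from `trainer_state.json` in the same commit.

```python
# Values copied from all_results.json.
epoch = 2.986666666666667
train_runtime = 6827.5776  # seconds
train_samples = 6750

# Assumptions (not stored in all_results.json).
num_train_epochs = 3   # inferred from the logged epoch
global_step = 315      # from trainer_state.json

samples_per_second = round(train_samples * num_train_epochs / train_runtime, 3)
steps_per_second = round(global_step / train_runtime, 3)

# Samples consumed per optimizer step, implied by the logged epoch fraction.
effective_batch_size = round(epoch * train_samples / global_step)

print(samples_per_second, steps_per_second, effective_batch_size)
```

Under these assumptions the derived rates agree with the reported `train_samples_per_second` and `train_steps_per_second`, and the run appears to consume 64 samples per optimizer step.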
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 2.986666666666667,
+ "total_flos": 0.0,
+ "train_loss": 60.72040763733879,
+ "train_runtime": 6827.5776,
+ "train_samples": 6750,
+ "train_samples_per_second": 2.966,
+ "train_steps_per_second": 0.046
+ }
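The `learning_rate` values logged in `trainer_state.json` in this commit are consistent with a linear warmup followed by cosine decay. The sketch below is a reconstruction, not the run's actual configuration: the peak rate of 5e-5 is read from the model name, while `WARMUP_STEPS = 32` and `TOTAL_STEPS = 315` are assumptions chosen because they reproduce the logged values.

```python
import math

PEAK_LR = 5e-5      # from the model name ("5e-5")
WARMUP_STEPS = 32   # assumption: reproduces the logged warmup values
TOTAL_STEPS = 315   # global_step reported in trainer_state.json


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps under warmup + cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak rate down to 0 at TOTAL_STEPS.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (5, 35, 210):
    print(step, lr_at(step))
```

Comparing `lr_at(step)` with the logged `learning_rate` at the same `step` is a quick way to confirm which scheduler a run used when the training arguments are not part of the commit.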
trainer_state.json ADDED
@@ -0,0 +1,1176 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 2.986666666666667,
+ "eval_steps": 500,
+ "global_step": 315,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.047407407407407405,
+ "grad_norm": 736.0,
+ "learning_rate": 7.8125e-06,
+ "log_odds_chosen": 4.987946510314941,
+ "log_odds_ratio": -9.761848449707031,
+ "logits/chosen": 138.64175415039062,
+ "logits/rejected": 152.19424438476562,
+ "logps/chosen": -20.546340942382812,
+ "logps/rejected": -25.53423309326172,
+ "loss": 392.2693,
+ "nll_loss": 8.064610481262207,
+ "rewards/accuracies": 0.543749988079071,
+ "rewards/chosen": -10.273170471191406,
+ "rewards/margins": 2.493946075439453,
+ "rewards/rejected": -12.76711654663086,
+ "step": 5
+ },
+ {
+ "epoch": 0.09481481481481481,
+ "grad_norm": 478.0,
+ "learning_rate": 1.5625e-05,
+ "log_odds_chosen": 2.351091146469116,
+ "log_odds_ratio": -7.1593170166015625,
+ "logits/chosen": 130.83258056640625,
+ "logits/rejected": 159.64907836914062,
+ "logps/chosen": -15.241386413574219,
+ "logps/rejected": -17.591449737548828,
+ "loss": 350.4958,
+ "nll_loss": 6.879847526550293,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -7.620693206787109,
+ "rewards/margins": 1.175031065940857,
+ "rewards/rejected": -8.795724868774414,
+ "step": 10
+ },
+ {
+ "epoch": 0.14222222222222222,
+ "grad_norm": 498.0,
+ "learning_rate": 2.34375e-05,
+ "log_odds_chosen": 4.327846527099609,
+ "log_odds_ratio": -7.479238986968994,
+ "logits/chosen": 120.69163513183594,
+ "logits/rejected": 148.0057373046875,
+ "logps/chosen": -19.320337295532227,
+ "logps/rejected": -23.6463623046875,
+ "loss": 346.1611,
+ "nll_loss": 7.978721618652344,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -9.660168647766113,
+ "rewards/margins": 2.163012981414795,
+ "rewards/rejected": -11.82318115234375,
+ "step": 15
+ },
+ {
+ "epoch": 0.18962962962962962,
+ "grad_norm": 1896.0,
+ "learning_rate": 3.125e-05,
+ "log_odds_chosen": -0.3415478467941284,
+ "log_odds_ratio": -5.298593997955322,
+ "logits/chosen": 143.34091186523438,
+ "logits/rejected": 147.2091064453125,
+ "logps/chosen": -12.515907287597656,
+ "logps/rejected": -12.175848007202148,
+ "loss": 266.0229,
+ "nll_loss": 5.948061943054199,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -6.257953643798828,
+ "rewards/margins": -0.1700296252965927,
+ "rewards/rejected": -6.087924003601074,
+ "step": 20
+ },
+ {
+ "epoch": 0.23703703703703705,
+ "grad_norm": 324.0,
+ "learning_rate": 3.90625e-05,
+ "log_odds_chosen": 0.98066645860672,
+ "log_odds_ratio": -1.3506591320037842,
+ "logits/chosen": 161.93701171875,
+ "logits/rejected": 170.9370880126953,
+ "logps/chosen": -3.294827938079834,
+ "logps/rejected": -4.26317024230957,
+ "loss": 92.5719,
+ "nll_loss": 2.4116456508636475,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -1.647413969039917,
+ "rewards/margins": 0.4841710925102234,
+ "rewards/rejected": -2.131585121154785,
+ "step": 25
+ },
+ {
+ "epoch": 0.28444444444444444,
+ "grad_norm": 532.0,
+ "learning_rate": 4.6875e-05,
+ "log_odds_chosen": 0.14513510465621948,
+ "log_odds_ratio": -0.7996392250061035,
+ "logits/chosen": 191.52481079101562,
+ "logits/rejected": 213.43954467773438,
+ "logps/chosen": -1.6295182704925537,
+ "logps/rejected": -1.75119149684906,
+ "loss": 71.4959,
+ "nll_loss": 1.912672758102417,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.8147591352462769,
+ "rewards/margins": 0.06083657220005989,
+ "rewards/rejected": -0.87559574842453,
+ "step": 30
+ },
+ {
+ "epoch": 0.33185185185185184,
+ "grad_norm": 149.0,
+ "learning_rate": 4.998613757348784e-05,
+ "log_odds_chosen": -0.0024008960463106632,
+ "log_odds_ratio": -0.854382336139679,
+ "logits/chosen": 231.1159210205078,
+ "logits/rejected": 226.5718994140625,
+ "logps/chosen": -1.568902850151062,
+ "logps/rejected": -1.5733020305633545,
+ "loss": 65.1785,
+ "nll_loss": 1.7701244354248047,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -0.784451425075531,
+ "rewards/margins": 0.002199609996750951,
+ "rewards/rejected": -0.7866510152816772,
+ "step": 35
+ },
+ {
+ "epoch": 0.37925925925925924,
+ "grad_norm": 110.0,
+ "learning_rate": 4.990147841143462e-05,
+ "log_odds_chosen": 0.23734720051288605,
+ "log_odds_ratio": -0.6953937411308289,
+ "logits/chosen": 233.0690460205078,
+ "logits/rejected": 235.18359375,
+ "logps/chosen": -1.3615589141845703,
+ "logps/rejected": -1.551184892654419,
+ "loss": 61.3129,
+ "nll_loss": 1.6592628955841064,
+ "rewards/accuracies": 0.5062500238418579,
+ "rewards/chosen": -0.6807794570922852,
+ "rewards/margins": 0.09481293708086014,
+ "rewards/rejected": -0.7755924463272095,
+ "step": 40
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "grad_norm": 168.0,
+ "learning_rate": 4.97401218720448e-05,
+ "log_odds_chosen": 0.17986582219600677,
+ "log_odds_ratio": -0.7720650434494019,
+ "logits/chosen": 214.67892456054688,
+ "logits/rejected": 212.75643920898438,
+ "logps/chosen": -1.3822438716888428,
+ "logps/rejected": -1.5297141075134277,
+ "loss": 59.3438,
+ "nll_loss": 1.6401255130767822,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -0.6911219358444214,
+ "rewards/margins": 0.07373511791229248,
+ "rewards/rejected": -0.7648570537567139,
+ "step": 45
+ },
+ {
+ "epoch": 0.4740740740740741,
+ "grad_norm": 102.5,
+ "learning_rate": 4.9502564938797946e-05,
+ "log_odds_chosen": 0.18993845582008362,
+ "log_odds_ratio": -0.7323363423347473,
+ "logits/chosen": 208.16702270507812,
+ "logits/rejected": 208.33804321289062,
+ "logps/chosen": -1.3402252197265625,
+ "logps/rejected": -1.4932218790054321,
+ "loss": 60.8339,
+ "nll_loss": 1.674740195274353,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.6701126098632812,
+ "rewards/margins": 0.07649824768304825,
+ "rewards/rejected": -0.7466109395027161,
+ "step": 50
+ },
+ {
+ "epoch": 0.5214814814814814,
+ "grad_norm": 69.5,
+ "learning_rate": 4.918953929490768e-05,
+ "log_odds_chosen": 0.1412346512079239,
+ "log_odds_ratio": -0.7170370221138,
+ "logits/chosen": 207.9961395263672,
+ "logits/rejected": 205.17996215820312,
+ "logps/chosen": -1.2495604753494263,
+ "logps/rejected": -1.3624210357666016,
+ "loss": 55.5208,
+ "nll_loss": 1.5490686893463135,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -0.6247802376747131,
+ "rewards/margins": 0.05643026903271675,
+ "rewards/rejected": -0.6812105178833008,
+ "step": 55
+ },
+ {
+ "epoch": 0.5688888888888889,
+ "grad_norm": 126.0,
+ "learning_rate": 4.88020090697132e-05,
+ "log_odds_chosen": 0.21612174808979034,
+ "log_odds_ratio": -0.6790373921394348,
+ "logits/chosen": 208.6972198486328,
+ "logits/rejected": 204.21548461914062,
+ "logps/chosen": -1.1511867046356201,
+ "logps/rejected": -1.3068695068359375,
+ "loss": 53.3904,
+ "nll_loss": 1.4371321201324463,
+ "rewards/accuracies": 0.5625,
+ "rewards/chosen": -0.5755933523178101,
+ "rewards/margins": 0.07784143090248108,
+ "rewards/rejected": -0.6534347534179688,
+ "step": 60
+ },
+ {
+ "epoch": 0.6162962962962963,
+ "grad_norm": 116.5,
+ "learning_rate": 4.834116786912897e-05,
+ "log_odds_chosen": 0.20034953951835632,
+ "log_odds_ratio": -0.7185416221618652,
+ "logits/chosen": 211.181884765625,
+ "logits/rejected": 212.2796173095703,
+ "logps/chosen": -1.2082265615463257,
+ "logps/rejected": -1.3656466007232666,
+ "loss": 52.9814,
+ "nll_loss": 1.4586594104766846,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -0.6041132807731628,
+ "rewards/margins": 0.07870997488498688,
+ "rewards/rejected": -0.6828233003616333,
+ "step": 65
+ },
+ {
+ "epoch": 0.6637037037037037,
+ "grad_norm": 102.5,
+ "learning_rate": 4.7808435099299045e-05,
+ "log_odds_chosen": 0.2931309938430786,
+ "log_odds_ratio": -0.6710721254348755,
+ "logits/chosen": 210.8258514404297,
+ "logits/rejected": 204.9437713623047,
+ "logps/chosen": -1.1364883184432983,
+ "logps/rejected": -1.3803962469100952,
+ "loss": 51.9763,
+ "nll_loss": 1.429614543914795,
+ "rewards/accuracies": 0.581250011920929,
+ "rewards/chosen": -0.5682441592216492,
+ "rewards/margins": 0.12195394933223724,
+ "rewards/rejected": -0.6901981234550476,
+ "step": 70
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "grad_norm": 73.5,
+ "learning_rate": 4.720545159477922e-05,
+ "log_odds_chosen": 0.30529800057411194,
+ "log_odds_ratio": -0.6410656571388245,
+ "logits/chosen": 210.42739868164062,
+ "logits/rejected": 212.0102081298828,
+ "logps/chosen": -1.080447793006897,
+ "logps/rejected": -1.3063446283340454,
+ "loss": 51.2694,
+ "nll_loss": 1.4210776090621948,
+ "rewards/accuracies": 0.637499988079071,
+ "rewards/chosen": -0.5402238965034485,
+ "rewards/margins": 0.11294851452112198,
+ "rewards/rejected": -0.6531723141670227,
+ "step": 75
+ },
+ {
+ "epoch": 0.7585185185185185,
+ "grad_norm": 109.5,
+ "learning_rate": 4.653407456471222e-05,
+ "log_odds_chosen": 0.2542671263217926,
+ "log_odds_ratio": -0.6951149106025696,
+ "logits/chosen": 214.9987030029297,
+ "logits/rejected": 210.99972534179688,
+ "logps/chosen": -1.1126171350479126,
+ "logps/rejected": -1.2992708683013916,
+ "loss": 49.9945,
+ "nll_loss": 1.4084731340408325,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.5563085675239563,
+ "rewards/margins": 0.09332697093486786,
+ "rewards/rejected": -0.6496354341506958,
+ "step": 80
+ },
+ {
+ "epoch": 0.8059259259259259,
+ "grad_norm": 92.5,
+ "learning_rate": 4.579637187256222e-05,
+ "log_odds_chosen": 0.2188234031200409,
+ "log_odds_ratio": -0.6811344027519226,
+ "logits/chosen": 210.42398071289062,
+ "logits/rejected": 202.4885711669922,
+ "logps/chosen": -1.0539259910583496,
+ "logps/rejected": -1.2093901634216309,
+ "loss": 49.2791,
+ "nll_loss": 1.3479670286178589,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.5269629955291748,
+ "rewards/margins": 0.07773206382989883,
+ "rewards/rejected": -0.6046950817108154,
+ "step": 85
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "grad_norm": 68.5,
+ "learning_rate": 4.499461566702685e-05,
+ "log_odds_chosen": 0.2878134846687317,
+ "log_odds_ratio": -0.6503546833992004,
+ "logits/chosen": 206.28036499023438,
+ "logits/rejected": 204.7410125732422,
+ "logps/chosen": -1.037368893623352,
+ "logps/rejected": -1.231547474861145,
+ "loss": 52.0954,
+ "nll_loss": 1.4560641050338745,
+ "rewards/accuracies": 0.643750011920929,
+ "rewards/chosen": -0.518684446811676,
+ "rewards/margins": 0.09708929806947708,
+ "rewards/rejected": -0.6157737374305725,
+ "step": 90
+ },
+ {
+ "epoch": 0.9007407407407407,
+ "grad_norm": 79.0,
+ "learning_rate": 4.413127538374411e-05,
+ "log_odds_chosen": 0.3599183261394501,
+ "log_odds_ratio": -0.6029065847396851,
+ "logits/chosen": 201.10952758789062,
+ "logits/rejected": 197.03341674804688,
+ "logps/chosen": -0.9574621319770813,
+ "logps/rejected": -1.1941089630126953,
+ "loss": 48.7124,
+ "nll_loss": 1.2800534963607788,
+ "rewards/accuracies": 0.668749988079071,
+ "rewards/chosen": -0.47873106598854065,
+ "rewards/margins": 0.1183234453201294,
+ "rewards/rejected": -0.5970544815063477,
+ "step": 95
+ },
+ {
+ "epoch": 0.9481481481481482,
+ "grad_norm": 107.5,
+ "learning_rate": 4.320901013934887e-05,
+ "log_odds_chosen": 0.1424800455570221,
+ "log_odds_ratio": -0.7256409525871277,
+ "logits/chosen": 203.09567260742188,
+ "logits/rejected": 196.48220825195312,
+ "logps/chosen": -1.0806185007095337,
+ "logps/rejected": -1.2011009454727173,
+ "loss": 49.3262,
+ "nll_loss": 1.3956550359725952,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -0.5403092503547668,
+ "rewards/margins": 0.0602412223815918,
+ "rewards/rejected": -0.6005504727363586,
+ "step": 100
+ },
+ {
+ "epoch": 0.9955555555555555,
+ "grad_norm": 101.0,
+ "learning_rate": 4.223066054130568e-05,
+ "log_odds_chosen": 0.29610657691955566,
+ "log_odds_ratio": -0.6316549181938171,
+ "logits/chosen": 204.3717041015625,
+ "logits/rejected": 200.5919189453125,
+ "logps/chosen": -1.006296992301941,
+ "logps/rejected": -1.2125999927520752,
+ "loss": 48.1369,
+ "nll_loss": 1.312403678894043,
+ "rewards/accuracies": 0.637499988079071,
+ "rewards/chosen": -0.5031484961509705,
+ "rewards/margins": 0.10315157473087311,
+ "rewards/rejected": -0.6062999963760376,
+ "step": 105
+ },
+ {
+ "epoch": 1.0429629629629629,
+ "grad_norm": 93.0,
+ "learning_rate": 4.1199239938743797e-05,
+ "log_odds_chosen": 0.5429099798202515,
+ "log_odds_ratio": -0.5789119005203247,
+ "logits/chosen": 202.58053588867188,
+ "logits/rejected": 195.58265686035156,
+ "logps/chosen": -0.8625534772872925,
+ "logps/rejected": -1.2194641828536987,
+ "loss": 42.5959,
+ "nll_loss": 1.151334285736084,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.43127673864364624,
+ "rewards/margins": 0.17845533788204193,
+ "rewards/rejected": -0.6097320914268494,
+ "step": 110
+ },
+ {
+ "epoch": 1.0903703703703704,
+ "grad_norm": 67.5,
+ "learning_rate": 4.0117925141242174e-05,
+ "log_odds_chosen": 0.7351988554000854,
+ "log_odds_ratio": -0.494173139333725,
+ "logits/chosen": 198.6049346923828,
+ "logits/rejected": 193.7996063232422,
+ "logps/chosen": -0.8116765022277832,
+ "logps/rejected": -1.2775243520736694,
+ "loss": 40.7484,
+ "nll_loss": 1.101711392402649,
+ "rewards/accuracies": 0.762499988079071,
+ "rewards/chosen": -0.4058382511138916,
+ "rewards/margins": 0.2329239547252655,
+ "rewards/rejected": -0.6387621760368347,
+ "step": 115
+ },
+ {
+ "epoch": 1.1377777777777778,
+ "grad_norm": 49.75,
+ "learning_rate": 3.899004663415084e-05,
+ "log_odds_chosen": 0.7911133170127869,
+ "log_odds_ratio": -0.4592220187187195,
+ "logits/chosen": 190.59217834472656,
+ "logits/rejected": 190.78323364257812,
+ "logps/chosen": -0.7666565775871277,
+ "logps/rejected": -1.2360306978225708,
+ "loss": 40.7742,
+ "nll_loss": 1.0949238538742065,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": -0.38332828879356384,
+ "rewards/margins": 0.23468704521656036,
+ "rewards/rejected": -0.6180153489112854,
+ "step": 120
+ },
+ {
+ "epoch": 1.1851851851851851,
+ "grad_norm": 46.25,
+ "learning_rate": 3.781907832058587e-05,
+ "log_odds_chosen": 0.6842738389968872,
+ "log_odds_ratio": -0.5187292695045471,
+ "logits/chosen": 190.21575927734375,
+ "logits/rejected": 186.94387817382812,
+ "logps/chosen": -0.8428317904472351,
+ "logps/rejected": -1.2817766666412354,
+ "loss": 38.4508,
+ "nll_loss": 1.0788408517837524,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.42141589522361755,
+ "rewards/margins": 0.21947243809700012,
+ "rewards/rejected": -0.6408883333206177,
+ "step": 125
+ },
+ {
+ "epoch": 1.2325925925925927,
+ "grad_norm": 95.0,
+ "learning_rate": 3.660862682169282e-05,
+ "log_odds_chosen": 0.5702214241027832,
+ "log_odds_ratio": -0.5559878349304199,
+ "logits/chosen": 193.6043701171875,
+ "logits/rejected": 190.05230712890625,
+ "logps/chosen": -0.834156334400177,
+ "logps/rejected": -1.1642882823944092,
+ "loss": 40.1159,
+ "nll_loss": 1.142547607421875,
+ "rewards/accuracies": 0.6937500238418579,
+ "rewards/chosen": -0.4170781672000885,
+ "rewards/margins": 0.16506603360176086,
+ "rewards/rejected": -0.5821441411972046,
+ "step": 130
+ },
+ {
+ "epoch": 1.28,
+ "grad_norm": 97.5,
+ "learning_rate": 3.5362420368134356e-05,
+ "log_odds_chosen": 0.677783191204071,
+ "log_odds_ratio": -0.507438063621521,
+ "logits/chosen": 187.4692840576172,
+ "logits/rejected": 190.07284545898438,
+ "logps/chosen": -0.7681523561477661,
+ "logps/rejected": -1.1711633205413818,
+ "loss": 40.1397,
+ "nll_loss": 1.0826233625411987,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -0.38407617807388306,
+ "rewards/margins": 0.20150542259216309,
+ "rewards/rejected": -0.5855816602706909,
+ "step": 135
+ },
+ {
+ "epoch": 1.3274074074074074,
+ "grad_norm": 63.25,
+ "learning_rate": 3.408429731701635e-05,
+ "log_odds_chosen": 0.6302875280380249,
+ "log_odds_ratio": -0.5435723066329956,
+ "logits/chosen": 185.64149475097656,
+ "logits/rejected": 188.64393615722656,
+ "logps/chosen": -0.8573511242866516,
+ "logps/rejected": -1.247619390487671,
+ "loss": 41.5044,
+ "nll_loss": 1.1718343496322632,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -0.4286755621433258,
+ "rewards/margins": 0.19513416290283203,
+ "rewards/rejected": -0.6238096952438354,
+ "step": 140
+ },
+ {
+ "epoch": 1.374814814814815,
+ "grad_norm": 52.0,
+ "learning_rate": 3.2778194329621104e-05,
+ "log_odds_chosen": 0.6812846064567566,
+ "log_odds_ratio": -0.49873781204223633,
+ "logits/chosen": 186.69728088378906,
+ "logits/rejected": 182.98965454101562,
+ "logps/chosen": -0.8324145078659058,
+ "logps/rejected": -1.271837592124939,
+ "loss": 40.1295,
+ "nll_loss": 1.1053495407104492,
+ "rewards/accuracies": 0.7437499761581421,
+ "rewards/chosen": -0.4162072539329529,
+ "rewards/margins": 0.21971149742603302,
+ "rewards/rejected": -0.6359187960624695,
+ "step": 145
+ },
+ {
+ "epoch": 1.4222222222222223,
+ "grad_norm": 59.0,
+ "learning_rate": 3.144813424636031e-05,
+ "log_odds_chosen": 0.6132742762565613,
+ "log_odds_ratio": -0.5470070838928223,
+ "logits/chosen": 181.83352661132812,
+ "logits/rejected": 184.06472778320312,
+ "logps/chosen": -0.7881155610084534,
+ "logps/rejected": -1.150750994682312,
+ "loss": 39.5806,
+ "nll_loss": 1.0920333862304688,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -0.3940577805042267,
+ "rewards/margins": 0.18131770193576813,
+ "rewards/rejected": -0.575375497341156,
+ "step": 150
+ },
+ {
+ "epoch": 1.4696296296296296,
+ "grad_norm": 51.5,
+ "learning_rate": 3.0098213696293542e-05,
+ "log_odds_chosen": 0.5554690361022949,
+ "log_odds_ratio": -0.5514319539070129,
+ "logits/chosen": 189.21676635742188,
+ "logits/rejected": 184.8980255126953,
+ "logps/chosen": -0.8024199604988098,
+ "logps/rejected": -1.117629051208496,
+ "loss": 40.469,
+ "nll_loss": 1.1150840520858765,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -0.4012099802494049,
+ "rewards/margins": 0.15760457515716553,
+ "rewards/rejected": -0.558814525604248,
+ "step": 155
+ },
+ {
+ "epoch": 1.5170370370370372,
+ "grad_norm": 51.25,
+ "learning_rate": 2.8732590479375165e-05,
+ "log_odds_chosen": 0.6340516805648804,
+ "log_odds_ratio": -0.5135641694068909,
+ "logits/chosen": 191.99452209472656,
+ "logits/rejected": 192.84353637695312,
+ "logps/chosen": -0.8782358169555664,
+ "logps/rejected": -1.275033950805664,
+ "loss": 41.499,
+ "nll_loss": 1.1827033758163452,
+ "rewards/accuracies": 0.731249988079071,
+ "rewards/chosen": -0.4391179084777832,
+ "rewards/margins": 0.19839909672737122,
+ "rewards/rejected": -0.637516975402832,
+ "step": 160
+ },
+ {
+ "epoch": 1.5644444444444443,
+ "grad_norm": 45.75,
+ "learning_rate": 2.7355470760292956e-05,
+ "log_odds_chosen": 0.7264343500137329,
+ "log_odds_ratio": -0.49429789185523987,
+ "logits/chosen": 190.41761779785156,
+ "logits/rejected": 196.09129333496094,
+ "logps/chosen": -0.8138422966003418,
+ "logps/rejected": -1.2546621561050415,
+ "loss": 40.024,
+ "nll_loss": 1.0924456119537354,
+ "rewards/accuracies": 0.7562500238418579,
+ "rewards/chosen": -0.4069211483001709,
+ "rewards/margins": 0.22040989995002747,
+ "rewards/rejected": -0.6273310780525208,
+ "step": 165
+ },
+ {
+ "epoch": 1.6118518518518519,
+ "grad_norm": 79.0,
+ "learning_rate": 2.597109611334169e-05,
+ "log_odds_chosen": 0.7704585790634155,
+ "log_odds_ratio": -0.4867461621761322,
+ "logits/chosen": 189.745361328125,
+ "logits/rejected": 187.7951202392578,
+ "logps/chosen": -0.7925730347633362,
+ "logps/rejected": -1.269953966140747,
+ "loss": 39.7815,
+ "nll_loss": 1.0841796398162842,
+ "rewards/accuracies": 0.793749988079071,
+ "rewards/chosen": -0.3962865173816681,
+ "rewards/margins": 0.23869049549102783,
+ "rewards/rejected": -0.6349769830703735,
+ "step": 170
+ },
+ {
+ "epoch": 1.6592592592592592,
+ "grad_norm": 92.5,
+ "learning_rate": 2.458373045823404e-05,
+ "log_odds_chosen": 0.6176282167434692,
+ "log_odds_ratio": -0.521507203578949,
+ "logits/chosen": 191.61280822753906,
+ "logits/rejected": 187.45404052734375,
+ "logps/chosen": -0.7949572801589966,
+ "logps/rejected": -1.1731722354888916,
+ "loss": 39.3331,
+ "nll_loss": 1.114823341369629,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.3974786400794983,
+ "rewards/margins": 0.18910741806030273,
+ "rewards/rejected": -0.5865861177444458,
+ "step": 175
+ },
+ {
+ "epoch": 1.7066666666666666,
+ "grad_norm": 65.5,
+ "learning_rate": 2.3197646927086697e-05,
+ "log_odds_chosen": 0.4565175473690033,
+ "log_odds_ratio": -0.5909174680709839,
+ "logits/chosen": 188.08010864257812,
+ "logits/rejected": 186.85438537597656,
+ "logps/chosen": -0.8470600247383118,
+ "logps/rejected": -1.1192262172698975,
+ "loss": 40.9749,
+ "nll_loss": 1.1627644300460815,
+ "rewards/accuracies": 0.637499988079071,
+ "rewards/chosen": -0.4235300123691559,
+ "rewards/margins": 0.13608308136463165,
+ "rewards/rejected": -0.5596131086349487,
+ "step": 180
+ },
+ {
+ "epoch": 1.7540740740740741,
+ "grad_norm": 43.0,
+ "learning_rate": 2.1817114703032176e-05,
+ "log_odds_chosen": 0.6112794876098633,
+ "log_odds_ratio": -0.511337399482727,
+ "logits/chosen": 188.8147735595703,
+ "logits/rejected": 188.24966430664062,
+ "logps/chosen": -0.8174026608467102,
+ "logps/rejected": -1.194439172744751,
+ "loss": 39.6962,
+ "nll_loss": 1.0865495204925537,
+ "rewards/accuracies": 0.7562500238418579,
+ "rewards/chosen": -0.4087013304233551,
+ "rewards/margins": 0.18851831555366516,
+ "rewards/rejected": -0.5972195863723755,
+ "step": 185
+ },
+ {
+ "epoch": 1.8014814814814815,
+ "grad_norm": 49.0,
+ "learning_rate": 2.0446385870993467e-05,
+ "log_odds_chosen": 0.7244794964790344,
+ "log_odds_ratio": -0.4969249665737152,
+ "logits/chosen": 192.61624145507812,
+ "logits/rejected": 189.07421875,
+ "logps/chosen": -0.8428407907485962,
+ "logps/rejected": -1.2970322370529175,
+ "loss": 41.4332,
+ "nll_loss": 1.1237175464630127,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": -0.4214203953742981,
+ "rewards/margins": 0.22709576785564423,
+ "rewards/rejected": -0.6485161185264587,
+ "step": 190
+ },
+ {
+ "epoch": 1.8488888888888888,
+ "grad_norm": 48.25,
+ "learning_rate": 1.9089682321121834e-05,
+ "log_odds_chosen": 0.7443265914916992,
+ "log_odds_ratio": -0.4911392629146576,
+ "logits/chosen": 192.0789794921875,
+ "logits/rejected": 185.12327575683594,
+ "logps/chosen": -0.7979795932769775,
+ "logps/rejected": -1.2549121379852295,
+ "loss": 39.3639,
+ "nll_loss": 1.074517846107483,
+ "rewards/accuracies": 0.762499988079071,
+ "rewards/chosen": -0.39898979663848877,
+ "rewards/margins": 0.2284662276506424,
+ "rewards/rejected": -0.6274560689926147,
+ "step": 195
+ },
+ {
+ "epoch": 1.8962962962962964,
+ "grad_norm": 42.5,
+ "learning_rate": 1.775118274523545e-05,
+ "log_odds_chosen": 0.5745521187782288,
+ "log_odds_ratio": -0.5652587413787842,
+ "logits/chosen": 193.84976196289062,
+ "logits/rejected": 192.41378784179688,
+ "logps/chosen": -0.8293254971504211,
+ "logps/rejected": -1.1743533611297607,
+ "loss": 41.819,
+ "nll_loss": 1.1135209798812866,
+ "rewards/accuracies": 0.6875,
+ "rewards/chosen": -0.41466274857521057,
+ "rewards/margins": 0.17251388728618622,
+ "rewards/rejected": -0.5871766805648804,
+ "step": 200
+ },
+ {
+ "epoch": 1.9437037037037037,
+ "grad_norm": 40.25,
+ "learning_rate": 1.643500976631037e-05,
+ "log_odds_chosen": 0.6025683879852295,
+ "log_odds_ratio": -0.5356005430221558,
+ "logits/chosen": 196.73837280273438,
+ "logits/rejected": 188.22824096679688,
+ "logps/chosen": -0.8113398551940918,
+ "logps/rejected": -1.197618007659912,
+ "loss": 37.6443,
+ "nll_loss": 1.048313856124878,
+ "rewards/accuracies": 0.706250011920929,
+ "rewards/chosen": -0.4056699275970459,
+ "rewards/margins": 0.19313909113407135,
+ "rewards/rejected": -0.598809003829956,
+ "step": 205
+ },
+ {
+ "epoch": 1.991111111111111,
+ "grad_norm": 57.0,
+ "learning_rate": 1.514521724066537e-05,
+ "log_odds_chosen": 0.5694688558578491,
+ "log_odds_ratio": -0.5482410788536072,
+ "logits/chosen": 187.6742401123047,
+ "logits/rejected": 190.1154022216797,
+ "logps/chosen": -0.7763108015060425,
+ "logps/rejected": -1.1007602214813232,
+ "loss": 37.9486,
+ "nll_loss": 1.0500026941299438,
+ "rewards/accuracies": 0.6875,
+ "rewards/chosen": -0.38815540075302124,
+ "rewards/margins": 0.16222473978996277,
+ "rewards/rejected": -0.5503801107406616,
+ "step": 210
+ },
+ {
+ "epoch": 2.0385185185185186,
+ "grad_norm": 40.0,
+ "learning_rate": 1.3885777771950348e-05,
+ "log_odds_chosen": 0.9869217872619629,
+ "log_odds_ratio": -0.41741856932640076,
+ "logits/chosen": 183.7540740966797,
+ "logits/rejected": 182.7440185546875,
+ "logps/chosen": -0.681525707244873,
+ "logps/rejected": -1.2115617990493774,
+ "loss": 34.0815,
+ "nll_loss": 0.9338465929031372,
+ "rewards/accuracies": 0.8187500238418579,
+ "rewards/chosen": -0.3407628536224365,
+ "rewards/margins": 0.26501795649528503,
+ "rewards/rejected": -0.6057808995246887,
+ "step": 215
+ },
+ {
+ "epoch": 2.0859259259259257,
+ "grad_norm": 86.0,
+ "learning_rate": 1.2660570475395683e-05,
+ "log_odds_chosen": 1.3432199954986572,
+ "log_odds_ratio": -0.3477163016796112,
+ "logits/chosen": 171.4412078857422,
+ "logits/rejected": 176.1414337158203,
+ "logps/chosen": -0.5886165499687195,
+ "logps/rejected": -1.315865397453308,
+ "loss": 31.0776,
+ "nll_loss": 0.883051872253418,
+ "rewards/accuracies": 0.8687499761581421,
+ "rewards/chosen": -0.29430827498435974,
+ "rewards/margins": 0.3636243939399719,
+ "rewards/rejected": -0.657932698726654,
+ "step": 220
+ },
+ {
+ "epoch": 2.1333333333333333,
+ "grad_norm": 47.25,
+ "learning_rate": 1.1473369030008974e-05,
+ "log_odds_chosen": 1.0965574979782104,
+ "log_odds_ratio": -0.3985925316810608,
+ "logits/chosen": 178.91383361816406,
+ "logits/rejected": 177.06851196289062,
+ "logps/chosen": -0.6075170040130615,
+ "logps/rejected": -1.1777050495147705,
+ "loss": 32.3742,
+ "nll_loss": 0.8941748738288879,
+ "rewards/accuracies": 0.84375,
+ "rewards/chosen": -0.30375850200653076,
+ "rewards/margins": 0.2850940525531769,
+ "rewards/rejected": -0.5888525247573853,
+ "step": 225
+ },
+ {
+ "epoch": 2.180740740740741,
+ "grad_norm": 51.0,
+ "learning_rate": 1.0327830055518842e-05,
+ "log_odds_chosen": 1.2348709106445312,
+ "log_odds_ratio": -0.33719533681869507,
+ "logits/chosen": 176.9068145751953,
+ "logits/rejected": 177.44601440429688,
+ "logps/chosen": -0.5915892720222473,
+ "logps/rejected": -1.2602530717849731,
+ "loss": 31.0203,
+ "nll_loss": 0.8626457452774048,
+ "rewards/accuracies": 0.875,
+ "rewards/chosen": -0.29579463601112366,
+ "rewards/margins": 0.3343318998813629,
+ "rewards/rejected": -0.6301265358924866,
+ "step": 230
+ },
+ {
+ "epoch": 2.228148148148148,
+ "grad_norm": 44.0,
+ "learning_rate": 9.227481849865235e-06,
+ "log_odds_chosen": 1.0602965354919434,
+ "log_odds_ratio": -0.41205477714538574,
+ "logits/chosen": 174.06771850585938,
+ "logits/rejected": 178.76084899902344,
+ "logps/chosen": -0.6463479995727539,
+ "logps/rejected": -1.174586534500122,
+ "loss": 32.0119,
+ "nll_loss": 0.8945677876472473,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": -0.32317399978637695,
+ "rewards/margins": 0.2641192674636841,
+ "rewards/rejected": -0.587293267250061,
+ "step": 235
+ },
+ {
+ "epoch": 2.2755555555555556,
+ "grad_norm": 47.25,
+ "learning_rate": 8.175713521924978e-06,
+ "log_odds_chosen": 1.286481499671936,
+ "log_odds_ratio": -0.3498726785182953,
+ "logits/chosen": 171.71487426757812,
+ "logits/rejected": 176.8938751220703,
+ "logps/chosen": -0.5816351771354675,
+ "logps/rejected": -1.2401165962219238,
+ "loss": 31.3421,
+ "nll_loss": 0.8873510360717773,
+ "rewards/accuracies": 0.8687499761581421,
+ "rewards/chosen": -0.29081758856773376,
+ "rewards/margins": 0.32924067974090576,
+ "rewards/rejected": -0.6200582981109619,
+ "step": 240
+ },
+ {
+ "epoch": 2.322962962962963,
+ "grad_norm": 40.0,
+ "learning_rate": 7.1757645529443665e-06,
+ "log_odds_chosen": 1.2826303243637085,
+ "log_odds_ratio": -0.35086172819137573,
+ "logits/chosen": 172.84579467773438,
+ "logits/rejected": 174.81533813476562,
+ "logps/chosen": -0.6082225441932678,
+ "logps/rejected": -1.2833037376403809,
+ "loss": 31.0705,
+ "nll_loss": 0.8847354054450989,
+ "rewards/accuracies": 0.831250011920929,
+ "rewards/chosen": -0.3041112720966339,
+ "rewards/margins": 0.3375406265258789,
+ "rewards/rejected": -0.6416518688201904,
+ "step": 245
+ },
+ {
+ "epoch": 2.3703703703703702,
+ "grad_norm": 39.5,
+ "learning_rate": 6.230714818829733e-06,
+ "log_odds_chosen": 1.3057904243469238,
+ "log_odds_ratio": -0.3325490355491638,
+ "logits/chosen": 168.0796661376953,
+ "logits/rejected": 173.2673797607422,
+ "logps/chosen": -0.5869969129562378,
+ "logps/rejected": -1.2978041172027588,
+ "loss": 30.7508,
+ "nll_loss": 0.8432788848876953,
+ "rewards/accuracies": 0.8500000238418579,
+ "rewards/chosen": -0.2934984564781189,
+ "rewards/margins": 0.3554036617279053,
+ "rewards/rejected": -0.6489020586013794,
+ "step": 250
+ },
+ {
+ "epoch": 2.417777777777778,
+ "grad_norm": 41.5,
+ "learning_rate": 5.343475104027743e-06,
+ "log_odds_chosen": 1.4420109987258911,
+ "log_odds_ratio": -0.32847458124160767,
+ "logits/chosen": 165.7822265625,
+ "logits/rejected": 168.45877075195312,
+ "logps/chosen": -0.5281413793563843,
+ "logps/rejected": -1.296473741531372,
+ "loss": 30.1395,
+ "nll_loss": 0.8313242793083191,
+ "rewards/accuracies": 0.8687499761581421,
+ "rewards/chosen": -0.26407068967819214,
+ "rewards/margins": 0.3841661512851715,
+ "rewards/rejected": -0.648236870765686,
+ "step": 255
+ },
+ {
+ "epoch": 2.4651851851851854,
+ "grad_norm": 52.0,
+ "learning_rate": 4.516778136213037e-06,
+ "log_odds_chosen": 1.3616795539855957,
+ "log_odds_ratio": -0.3403770327568054,
+ "logits/chosen": 166.247802734375,
+ "logits/rejected": 171.7600555419922,
+ "logps/chosen": -0.5720769166946411,
+ "logps/rejected": -1.2837584018707275,
+ "loss": 30.4011,
+ "nll_loss": 0.8514798283576965,
+ "rewards/accuracies": 0.8374999761581421,
+ "rewards/chosen": -0.28603845834732056,
+ "rewards/margins": 0.3558407723903656,
+ "rewards/rejected": -0.6418792009353638,
+ "step": 260
+ },
+ {
+ "epoch": 2.5125925925925925,
+ "grad_norm": 44.25,
+ "learning_rate": 3.7531701693965554e-06,
+ "log_odds_chosen": 1.1984084844589233,
+ "log_odds_ratio": -0.39315086603164673,
+ "logits/chosen": 167.62319946289062,
+ "logits/rejected": 171.07821655273438,
+ "logps/chosen": -0.6130216717720032,
+ "logps/rejected": -1.2671737670898438,
+ "loss": 30.4602,
+ "nll_loss": 0.8455079197883606,
+ "rewards/accuracies": 0.856249988079071,
+ "rewards/chosen": -0.3065108358860016,
+ "rewards/margins": 0.32707610726356506,
+ "rewards/rejected": -0.6335868835449219,
+ "step": 265
+ },
+ {
+ "epoch": 2.56,
+ "grad_norm": 42.75,
+ "learning_rate": 3.055003141378948e-06,
+ "log_odds_chosen": 1.36872136592865,
+ "log_odds_ratio": -0.3242368698120117,
+ "logits/chosen": 169.27310180664062,
+ "logits/rejected": 174.55384826660156,
+ "logps/chosen": -0.6026479005813599,
+ "logps/rejected": -1.332188367843628,
+ "loss": 31.0676,
+ "nll_loss": 0.8819044828414917,
+ "rewards/accuracies": 0.875,
+ "rewards/chosen": -0.30132395029067993,
+ "rewards/margins": 0.3647702634334564,
+ "rewards/rejected": -0.666094183921814,
+ "step": 270
+ },
+ {
+ "epoch": 2.6074074074074076,
+ "grad_norm": 45.75,
+ "learning_rate": 2.424427429704365e-06,
+ "log_odds_chosen": 1.3102140426635742,
+ "log_odds_ratio": -0.3487251400947571,
+ "logits/chosen": 168.80856323242188,
+ "logits/rejected": 168.40838623046875,
+ "logps/chosen": -0.6217229962348938,
+ "logps/rejected": -1.3165475130081177,
+ "loss": 31.6148,
+ "nll_loss": 0.8825966119766235,
+ "rewards/accuracies": 0.8687499761581421,
+ "rewards/chosen": -0.3108614981174469,
+ "rewards/margins": 0.3474121689796448,
+ "rewards/rejected": -0.6582737565040588,
+ "step": 275
+ },
+ {
+ "epoch": 2.6548148148148147,
+ "grad_norm": 47.0,
+ "learning_rate": 1.8633852284264508e-06,
+ "log_odds_chosen": 1.2873857021331787,
+ "log_odds_ratio": -0.3503342866897583,
+ "logits/chosen": 168.55545043945312,
+ "logits/rejected": 170.8942413330078,
+ "logps/chosen": -0.5770654678344727,
+ "logps/rejected": -1.2647120952606201,
+ "loss": 29.521,
+ "nll_loss": 0.8119584321975708,
+ "rewards/accuracies": 0.9125000238418579,
+ "rewards/chosen": -0.28853273391723633,
+ "rewards/margins": 0.3438234031200409,
+ "rewards/rejected": -0.6323560476303101,
+ "step": 280
+ },
+ {
+ "epoch": 2.7022222222222223,
+ "grad_norm": 50.25,
+ "learning_rate": 1.3736045660864034e-06,
+ "log_odds_chosen": 1.2470929622650146,
+ "log_odds_ratio": -0.35600870847702026,
+ "logits/chosen": 171.45352172851562,
+ "logits/rejected": 169.7559051513672,
+ "logps/chosen": -0.5817210674285889,
+ "logps/rejected": -1.2366844415664673,
+ "loss": 31.0448,
+ "nll_loss": 0.8478399515151978,
+ "rewards/accuracies": 0.862500011920929,
+ "rewards/chosen": -0.29086053371429443,
+ "rewards/margins": 0.3274817168712616,
+ "rewards/rejected": -0.6183422207832336,
+ "step": 285
+ },
+ {
+ "epoch": 2.74962962962963,
+ "grad_norm": 44.75,
+ "learning_rate": 9.565939833279192e-07,
+ "log_odds_chosen": 1.2974636554718018,
+ "log_odds_ratio": -0.34885287284851074,
+ "logits/chosen": 169.34690856933594,
+ "logits/rejected": 175.30169677734375,
+ "logps/chosen": -0.6116211414337158,
+ "logps/rejected": -1.27151620388031,
+ "loss": 31.3539,
+ "nll_loss": 0.8839661478996277,
+ "rewards/accuracies": 0.875,
+ "rewards/chosen": -0.3058105707168579,
+ "rewards/margins": 0.3299475312232971,
+ "rewards/rejected": -0.635758101940155,
+ "step": 290
+ },
+ {
+ "epoch": 2.797037037037037,
+ "grad_norm": 73.5,
+ "learning_rate": 6.136378865420872e-07,
+ "log_odds_chosen": 1.340787649154663,
+ "log_odds_ratio": -0.37529805302619934,
+ "logits/chosen": 169.1226043701172,
+ "logits/rejected": 172.68368530273438,
+ "logps/chosen": -0.6207782030105591,
+ "logps/rejected": -1.3479269742965698,
+ "loss": 31.3718,
+ "nll_loss": 0.8938194513320923,
+ "rewards/accuracies": 0.893750011920929,
+ "rewards/chosen": -0.31038910150527954,
+ "rewards/margins": 0.36357441544532776,
+ "rewards/rejected": -0.6739634871482849,
+ "step": 295
+ },
+ {
+ "epoch": 2.8444444444444446,
+ "grad_norm": 47.5,
+ "learning_rate": 3.45792591853214e-07,
+ "log_odds_chosen": 1.1846152544021606,
+ "log_odds_ratio": -0.38126683235168457,
+ "logits/chosen": 177.59153747558594,
+ "logits/rejected": 175.08778381347656,
+ "logps/chosen": -0.6378912925720215,
+ "logps/rejected": -1.279050350189209,
+ "loss": 31.709,
+ "nll_loss": 0.9202780723571777,
+ "rewards/accuracies": 0.8374999761581421,
+ "rewards/chosen": -0.31894564628601074,
+ "rewards/margins": 0.32057955861091614,
+ "rewards/rejected": -0.6395251750946045,
+ "step": 300
+ },
+ {
+ "epoch": 2.891851851851852,
+ "grad_norm": 45.0,
+ "learning_rate": 1.538830716302092e-07,
+ "log_odds_chosen": 1.2477754354476929,
+ "log_odds_ratio": -0.35705170035362244,
+ "logits/chosen": 169.60562133789062,
+ "logits/rejected": 170.3755645751953,
+ "logps/chosen": -0.6306732892990112,
+ "logps/rejected": -1.272749900817871,
+ "loss": 29.668,
+ "nll_loss": 0.8301697969436646,
+ "rewards/accuracies": 0.831250011920929,
+ "rewards/chosen": -0.3153366446495056,
+ "rewards/margins": 0.3210383355617523,
+ "rewards/rejected": -0.6363749504089355,
+ "step": 305
+ },
+ {
+ "epoch": 2.9392592592592592,
+ "grad_norm": 57.75,
+ "learning_rate": 3.8500413544415025e-08,
+ "log_odds_chosen": 1.2991540431976318,
+ "log_odds_ratio": -0.34246373176574707,
+ "logits/chosen": 171.08616638183594,
+ "logits/rejected": 171.2054443359375,
+ "logps/chosen": -0.6051537990570068,
+ "logps/rejected": -1.2699061632156372,
+ "loss": 30.8065,
+ "nll_loss": 0.8361706733703613,
+ "rewards/accuracies": 0.875,
+ "rewards/chosen": -0.3025768995285034,
+ "rewards/margins": 0.3323762118816376,
+ "rewards/rejected": -0.6349530816078186,
+ "step": 310
+ },
+ {
+ "epoch": 2.986666666666667,
+ "grad_norm": 55.0,
+ "learning_rate": 0.0,
+ "log_odds_chosen": 1.4246046543121338,
+ "log_odds_ratio": -0.31691890954971313,
+ "logits/chosen": 167.8661346435547,
+ "logits/rejected": 168.34278869628906,
+ "logps/chosen": -0.5480004549026489,
+ "logps/rejected": -1.2894717454910278,
+ "loss": 30.1036,
+ "nll_loss": 0.8535217046737671,
+ "rewards/accuracies": 0.893750011920929,
+ "rewards/chosen": -0.27400022745132446,
+ "rewards/margins": 0.37073561549186707,
+ "rewards/rejected": -0.6447358727455139,
+ "step": 315
+ },
+ {
+ "epoch": 2.986666666666667,
+ "step": 315,
+ "total_flos": 0.0,
+ "train_loss": 60.72040763733879,
+ "train_runtime": 6827.5776,
+ "train_samples_per_second": 2.966,
+ "train_steps_per_second": 0.046
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 315,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 100000,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }