just1nseo committed on
Commit 0fe756b
1 Parent(s): 260941d

Model save

README.md ADDED
@@ -0,0 +1,82 @@
+ ---
+ base_model: alignment-handbook/zephyr-7b-sft-full
+ library_name: peft
+ license: apache-2.0
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: zephyr-dpop-qlora-uf-ours-5e-6-epoch1
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-dpop-qlora-uf-ours-5e-6-epoch1
+
+ This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.6048
+ - Positive Losses: 8.7062
+ - Dpo Losses: 0.6448
+ - Rewards/chosen: 0.0014
+ - Rewards/rejected: -0.1274
+ - Rewards/accuracies: 0.6470
+ - Rewards/margins: 0.1288
+ - Rewards/margins Max: 0.6375
+ - Rewards/margins Min: -0.3461
+ - Rewards/margins Std: 0.3295
+ - Logps/rejected: -271.3224
+ - Logps/chosen: -284.4528
+ - Logits/rejected: -2.6733
+ - Logits/chosen: -2.7153
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 16
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6232 | 0.28 | 100 | 1.1413 | 4.2610 | 0.6656 | 0.0429 | -0.0252 | 0.6230 | 0.0680 | 0.4086 | -0.2281 | 0.2094 | -261.0972 | -280.3080 | -2.6361 | -2.6726 |
+ | 0.5625 | 0.56 | 200 | 1.7186 | 9.6677 | 0.6469 | -0.0183 | -0.1426 | 0.6420 | 0.1243 | 0.6362 | -0.3433 | 0.3277 | -272.8399 | -286.4236 | -2.6380 | -2.6780 |
+ | 0.4748 | 0.85 | 300 | 1.6048 | 8.7062 | 0.6448 | 0.0014 | -0.1274 | 0.6470 | 0.1288 | 0.6375 | -0.3461 | 0.3295 | -271.3224 | -284.4528 | -2.6733 | -2.7153 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.39.0.dev0
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
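
The card above describes a QLoRA adapter (PEFT) trained on top of alignment-handbook/zephyr-7b-sft-full. As a minimal sketch, assuming the adapter lives in a Hub repo named after the commit author and the model name, it could be loaded like this:

```python
# A minimal sketch, not taken from the model card: load the LoRA adapter on top
# of the base model with PEFT. The Hub repo id is assumed from the commit author
# and the model name; device_map="auto" additionally assumes accelerate is installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpop-qlora-uf-ours-5e-6-epoch1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attaches adapter_model.safetensors

prompt = "Explain DPO fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Calling model.merge_and_unload() afterwards would fold the adapter into the base weights if a standalone checkpoint is preferred.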
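The hyperparameter list in the card maps almost one-to-one onto transformers TrainingArguments. The sketch below mirrors those values, plus the logging/eval/save intervals visible in trainer_state.json further down; how these arguments were wired into the DPO/DPOP trainer is not shown in this commit, so treat the surrounding setup as an assumption.

```python
# Sketch only: TrainingArguments mirroring the hyperparameters listed above.
# Per-device batch 4 on 2 GPUs with gradient accumulation 2 gives the reported
# total train batch size of 16.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-ours-5e-6-epoch1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    logging_steps=10,              # matches trainer_state.json
    evaluation_strategy="steps",
    eval_steps=100,                # matches trainer_state.json
    save_steps=100,
)
```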
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e201d6124872467a926e70ec768d9c5e2217146d8b9f1fd041c52fc6d8a07614
+ oid sha256:157adab1be932a160809b22d2cd46ebc2232e78082ff3d33c796d6ccb4400be7
  size 671150064
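
adapter_model.safetensors is tracked with Git LFS, so the diff above only swaps the pointer (sha256 oid plus byte size). A small sketch for checking a downloaded copy against the new pointer, assuming the file sits in the working directory:

```python
# Sketch: verify a locally downloaded adapter_model.safetensors against the
# Git LFS pointer above (expected digest and size come from the "+" lines).
import hashlib
import os

expected_oid = "157adab1be932a160809b22d2cd46ebc2232e78082ff3d33c796d6ccb4400be7"
expected_size = 671150064
path = "adapter_model.safetensors"  # assumed local path

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert os.path.getsize(path) == expected_size, "size mismatch"
assert digest.hexdigest() == expected_oid, "sha256 mismatch"
print("adapter_model.safetensors matches the LFS pointer")
```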
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.5743894765074824,
+ "train_runtime": 4311.1014,
+ "train_samples": 5678,
+ "train_samples_per_second": 1.317,
+ "train_steps_per_second": 0.082
+ }
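
The throughput figures in all_results.json are derived quantities: samples per second is train_samples over train_runtime, and steps per second follows from the effective batch size of 16 reported in the card. A quick sanity check:

```python
# Consistency check of the derived numbers in all_results.json.
train_samples = 5678
train_runtime = 4311.1014                              # seconds
total_train_batch_size = 16
steps = -(-train_samples // total_train_batch_size)    # ceil division -> 355 steps

print(round(train_samples / train_runtime, 3))   # ~1.317 samples/sec
print(round(steps / train_runtime, 3))           # ~0.082 steps/sec
```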
runs/Jul29_11-02-20_notebook-deployment-48-7d9b6c99-khd85/events.out.tfevents.1722251035.notebook-deployment-48-7d9b6c99-khd85.3446409.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7672281ddc12cd2176c603733be924e87688b2b3ae9f2d482908ef2c4e6047e4
- size 39113
+ oid sha256:65c29e5ebef7728389473b36ffab0229575c62799e7ab454c0a958d4e15bf729
+ size 44442
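
The updated tfevents file carries the TensorBoard scalars for this run. Assuming the tensorboard package is available, the standard event accumulator can list them; the run directory below mirrors the path in this commit:

```python
# Sketch: inspect the TensorBoard event file with EventAccumulator.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

run_dir = "runs/Jul29_11-02-20_notebook-deployment-48-7d9b6c99-khd85"
acc = EventAccumulator(run_dir)
acc.Reload()

for tag in acc.Tags()["scalars"]:
    events = acc.Scalars(tag)
    print(f"{tag}: {len(events)} points, last={events[-1].value:.4f} at step {events[-1].step}")
```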
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.5743894765074824,
+ "train_runtime": 4311.1014,
+ "train_samples": 5678,
+ "train_samples_per_second": 1.317,
+ "train_steps_per_second": 0.082
+ }
trainer_state.json ADDED
@@ -0,0 +1,813 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 100,
6
+ "global_step": 355,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "dpo_losses": 0.6931471824645996,
13
+ "epoch": 0.0,
14
+ "grad_norm": 1.6018567815095135,
15
+ "learning_rate": 1.3888888888888888e-07,
16
+ "logits/chosen": -2.861618995666504,
17
+ "logits/rejected": -2.8205904960632324,
18
+ "logps/chosen": -271.06011962890625,
19
+ "logps/rejected": -211.1704559326172,
20
+ "loss": 0.6931,
21
+ "positive_losses": 0.0,
22
+ "rewards/accuracies": 0.0,
23
+ "rewards/chosen": 0.0,
24
+ "rewards/margins": 0.0,
25
+ "rewards/margins_max": 0.0,
26
+ "rewards/margins_min": 0.0,
27
+ "rewards/margins_std": 0.0,
28
+ "rewards/rejected": 0.0,
29
+ "step": 1
30
+ },
31
+ {
32
+ "dpo_losses": 0.6928361654281616,
33
+ "epoch": 0.03,
34
+ "grad_norm": 14.098492351037597,
35
+ "learning_rate": 1.3888888888888892e-06,
36
+ "logits/chosen": -2.8340628147125244,
37
+ "logits/rejected": -2.7916715145111084,
38
+ "logps/chosen": -324.87408447265625,
39
+ "logps/rejected": -274.8518371582031,
40
+ "loss": 0.6969,
41
+ "positive_losses": 0.03656284138560295,
42
+ "rewards/accuracies": 0.5138888955116272,
43
+ "rewards/chosen": 0.001762597355991602,
44
+ "rewards/margins": 0.0006246823468245566,
45
+ "rewards/margins_max": 0.0034460597671568394,
46
+ "rewards/margins_min": -0.002478615380823612,
47
+ "rewards/margins_std": 0.002669532783329487,
48
+ "rewards/rejected": 0.0011379148345440626,
49
+ "step": 10
50
+ },
51
+ {
52
+ "dpo_losses": 0.6901537775993347,
53
+ "epoch": 0.06,
54
+ "grad_norm": 1.829780676576113,
55
+ "learning_rate": 2.7777777777777783e-06,
56
+ "logits/chosen": -2.7248008251190186,
57
+ "logits/rejected": -2.7065372467041016,
58
+ "logps/chosen": -291.9751892089844,
59
+ "logps/rejected": -214.52914428710938,
60
+ "loss": 0.69,
61
+ "positive_losses": 0.00235748291015625,
62
+ "rewards/accuracies": 0.7875000238418579,
63
+ "rewards/chosen": 0.01850745640695095,
64
+ "rewards/margins": 0.006009287666529417,
65
+ "rewards/margins_max": 0.013369890861213207,
66
+ "rewards/margins_min": -0.0006899007130414248,
67
+ "rewards/margins_std": 0.006301888730376959,
68
+ "rewards/rejected": 0.01249817106872797,
69
+ "step": 20
70
+ },
71
+ {
72
+ "dpo_losses": 0.6790497303009033,
73
+ "epoch": 0.08,
74
+ "grad_norm": 2.096661038575657,
75
+ "learning_rate": 4.166666666666667e-06,
76
+ "logits/chosen": -2.8153939247131348,
77
+ "logits/rejected": -2.7460672855377197,
78
+ "logps/chosen": -298.10052490234375,
79
+ "logps/rejected": -229.7678680419922,
80
+ "loss": 0.677,
81
+ "positive_losses": 0.0,
82
+ "rewards/accuracies": 0.887499988079071,
83
+ "rewards/chosen": 0.05605363845825195,
84
+ "rewards/margins": 0.02858993411064148,
85
+ "rewards/margins_max": 0.058357615023851395,
86
+ "rewards/margins_min": 0.004640273749828339,
87
+ "rewards/margins_std": 0.02467900700867176,
88
+ "rewards/rejected": 0.027463700622320175,
89
+ "step": 30
90
+ },
91
+ {
92
+ "dpo_losses": 0.6675597429275513,
93
+ "epoch": 0.11,
94
+ "grad_norm": 1.7320035926217752,
95
+ "learning_rate": 4.998060489154965e-06,
96
+ "logits/chosen": -2.8310070037841797,
97
+ "logits/rejected": -2.751425266265869,
98
+ "logps/chosen": -268.48809814453125,
99
+ "logps/rejected": -222.01107788085938,
100
+ "loss": 0.6662,
101
+ "positive_losses": 0.054492950439453125,
102
+ "rewards/accuracies": 0.862500011920929,
103
+ "rewards/chosen": 0.08996561169624329,
104
+ "rewards/margins": 0.05272960662841797,
105
+ "rewards/margins_max": 0.1101265698671341,
106
+ "rewards/margins_min": 0.003616312053054571,
107
+ "rewards/margins_std": 0.048521898686885834,
108
+ "rewards/rejected": 0.03723599761724472,
109
+ "step": 40
110
+ },
111
+ {
112
+ "dpo_losses": 0.6397972106933594,
113
+ "epoch": 0.14,
114
+ "grad_norm": 9.583890638870626,
115
+ "learning_rate": 4.976275538042932e-06,
116
+ "logits/chosen": -2.7891061305999756,
117
+ "logits/rejected": -2.7175135612487793,
118
+ "logps/chosen": -262.20794677734375,
119
+ "logps/rejected": -231.79653930664062,
120
+ "loss": 0.6446,
121
+ "positive_losses": 0.0,
122
+ "rewards/accuracies": 0.9375,
123
+ "rewards/chosen": 0.13362163305282593,
124
+ "rewards/margins": 0.11281381547451019,
125
+ "rewards/margins_max": 0.23626498878002167,
126
+ "rewards/margins_min": 0.022470083087682724,
127
+ "rewards/margins_std": 0.0988926962018013,
128
+ "rewards/rejected": 0.02080780453979969,
129
+ "step": 50
130
+ },
131
+ {
132
+ "dpo_losses": 0.6110584139823914,
133
+ "epoch": 0.17,
134
+ "grad_norm": 2.0747443213986694,
135
+ "learning_rate": 4.93049306999712e-06,
136
+ "logits/chosen": -2.7118520736694336,
137
+ "logits/rejected": -2.6753315925598145,
138
+ "logps/chosen": -296.9767150878906,
139
+ "logps/rejected": -263.8233947753906,
140
+ "loss": 0.628,
141
+ "positive_losses": 0.011554336175322533,
142
+ "rewards/accuracies": 0.987500011920929,
143
+ "rewards/chosen": 0.16662926971912384,
144
+ "rewards/margins": 0.17714819312095642,
145
+ "rewards/margins_max": 0.30765318870544434,
146
+ "rewards/margins_min": 0.05318903177976608,
147
+ "rewards/margins_std": 0.11578011512756348,
148
+ "rewards/rejected": -0.010518952272832394,
149
+ "step": 60
150
+ },
151
+ {
152
+ "dpo_losses": 0.6022371053695679,
153
+ "epoch": 0.2,
154
+ "grad_norm": 1.5871888283763238,
155
+ "learning_rate": 4.861156761634014e-06,
156
+ "logits/chosen": -2.7271430492401123,
157
+ "logits/rejected": -2.6688759326934814,
158
+ "logps/chosen": -303.47613525390625,
159
+ "logps/rejected": -236.2406463623047,
160
+ "loss": 0.6175,
161
+ "positive_losses": 0.19450588524341583,
162
+ "rewards/accuracies": 0.949999988079071,
163
+ "rewards/chosen": 0.19385087490081787,
164
+ "rewards/margins": 0.19984133541584015,
165
+ "rewards/margins_max": 0.4134605824947357,
166
+ "rewards/margins_min": 0.04761160537600517,
167
+ "rewards/margins_std": 0.16880682110786438,
168
+ "rewards/rejected": -0.00599044980481267,
169
+ "step": 70
170
+ },
171
+ {
172
+ "dpo_losses": 0.5768495798110962,
173
+ "epoch": 0.23,
174
+ "grad_norm": 1.804849988880195,
175
+ "learning_rate": 4.7689385491773934e-06,
176
+ "logits/chosen": -2.738285779953003,
177
+ "logits/rejected": -2.684203863143921,
178
+ "logps/chosen": -300.8853454589844,
179
+ "logps/rejected": -292.05633544921875,
180
+ "loss": 0.6017,
181
+ "positive_losses": 0.328561395406723,
182
+ "rewards/accuracies": 0.987500011920929,
183
+ "rewards/chosen": 0.20385125279426575,
184
+ "rewards/margins": 0.26062771677970886,
185
+ "rewards/margins_max": 0.4970013201236725,
186
+ "rewards/margins_min": 0.05170217156410217,
187
+ "rewards/margins_std": 0.2058703452348709,
188
+ "rewards/rejected": -0.056776486337184906,
189
+ "step": 80
190
+ },
191
+ {
192
+ "dpo_losses": 0.5672236084938049,
193
+ "epoch": 0.25,
194
+ "grad_norm": 2.184742961229221,
195
+ "learning_rate": 4.654732116743193e-06,
196
+ "logits/chosen": -2.6370556354522705,
197
+ "logits/rejected": -2.601066827774048,
198
+ "logps/chosen": -252.70535278320312,
199
+ "logps/rejected": -203.89418029785156,
200
+ "loss": 0.5769,
201
+ "positive_losses": 0.07196970283985138,
202
+ "rewards/accuracies": 0.9750000238418579,
203
+ "rewards/chosen": 0.2328944206237793,
204
+ "rewards/margins": 0.2819642424583435,
205
+ "rewards/margins_max": 0.514846682548523,
206
+ "rewards/margins_min": 0.09985215216875076,
207
+ "rewards/margins_std": 0.19404996931552887,
208
+ "rewards/rejected": -0.049069829285144806,
209
+ "step": 90
210
+ },
211
+ {
212
+ "dpo_losses": 0.5702880620956421,
213
+ "epoch": 0.28,
214
+ "grad_norm": 2.550586173059517,
215
+ "learning_rate": 4.5196442356717526e-06,
216
+ "logits/chosen": -2.6703598499298096,
217
+ "logits/rejected": -2.6374478340148926,
218
+ "logps/chosen": -264.9583740234375,
219
+ "logps/rejected": -273.49615478515625,
220
+ "loss": 0.6232,
221
+ "positive_losses": 1.2302151918411255,
222
+ "rewards/accuracies": 0.949999988079071,
223
+ "rewards/chosen": 0.16453364491462708,
224
+ "rewards/margins": 0.27762115001678467,
225
+ "rewards/margins_max": 0.5491287708282471,
226
+ "rewards/margins_min": 0.05581303685903549,
227
+ "rewards/margins_std": 0.22483690083026886,
228
+ "rewards/rejected": -0.113087497651577,
229
+ "step": 100
230
+ },
231
+ {
232
+ "epoch": 0.28,
233
+ "eval_dpo_losses": 0.6656126976013184,
234
+ "eval_logits/chosen": -2.67258620262146,
235
+ "eval_logits/rejected": -2.6360833644866943,
236
+ "eval_logps/chosen": -280.30804443359375,
237
+ "eval_logps/rejected": -261.0971984863281,
238
+ "eval_loss": 1.1412982940673828,
239
+ "eval_positive_losses": 4.261031627655029,
240
+ "eval_rewards/accuracies": 0.6230000257492065,
241
+ "eval_rewards/chosen": 0.04285382851958275,
242
+ "eval_rewards/margins": 0.06803657114505768,
243
+ "eval_rewards/margins_max": 0.40864306688308716,
244
+ "eval_rewards/margins_min": -0.22808942198753357,
245
+ "eval_rewards/margins_std": 0.2094314992427826,
246
+ "eval_rewards/rejected": -0.02518274076282978,
247
+ "eval_runtime": 429.2755,
248
+ "eval_samples_per_second": 4.659,
249
+ "eval_steps_per_second": 0.291,
250
+ "step": 100
251
+ },
252
+ {
253
+ "dpo_losses": 0.5097740888595581,
254
+ "epoch": 0.31,
255
+ "grad_norm": 6.336382416368574,
256
+ "learning_rate": 4.364984038837727e-06,
257
+ "logits/chosen": -2.742903709411621,
258
+ "logits/rejected": -2.654869318008423,
259
+ "logps/chosen": -349.24517822265625,
260
+ "logps/rejected": -304.54730224609375,
261
+ "loss": 0.543,
262
+ "positive_losses": 0.44344156980514526,
263
+ "rewards/accuracies": 1.0,
264
+ "rewards/chosen": 0.25360527634620667,
265
+ "rewards/margins": 0.43474069237709045,
266
+ "rewards/margins_max": 0.7704421281814575,
267
+ "rewards/margins_min": 0.1366521120071411,
268
+ "rewards/margins_std": 0.2834155559539795,
269
+ "rewards/rejected": -0.1811354160308838,
270
+ "step": 110
271
+ },
272
+ {
273
+ "dpo_losses": 0.518837571144104,
274
+ "epoch": 0.34,
275
+ "grad_norm": 2.194144050007341,
276
+ "learning_rate": 4.192250333880045e-06,
277
+ "logits/chosen": -2.7281386852264404,
278
+ "logits/rejected": -2.670868396759033,
279
+ "logps/chosen": -321.75982666015625,
280
+ "logps/rejected": -280.87091064453125,
281
+ "loss": 0.5524,
282
+ "positive_losses": 0.46012669801712036,
283
+ "rewards/accuracies": 0.9750000238418579,
284
+ "rewards/chosen": 0.26626402139663696,
285
+ "rewards/margins": 0.4130307137966156,
286
+ "rewards/margins_max": 0.7945607900619507,
287
+ "rewards/margins_min": 0.14706461131572723,
288
+ "rewards/margins_std": 0.2963123917579651,
289
+ "rewards/rejected": -0.14676669239997864,
290
+ "step": 120
291
+ },
292
+ {
293
+ "dpo_losses": 0.4917011260986328,
294
+ "epoch": 0.37,
295
+ "grad_norm": 1.7534787479023215,
296
+ "learning_rate": 4.0031170782990214e-06,
297
+ "logits/chosen": -2.711912155151367,
298
+ "logits/rejected": -2.634033441543579,
299
+ "logps/chosen": -353.554443359375,
300
+ "logps/rejected": -320.6388244628906,
301
+ "loss": 0.5518,
302
+ "positive_losses": 0.8977662920951843,
303
+ "rewards/accuracies": 0.987500011920929,
304
+ "rewards/chosen": 0.2880980372428894,
305
+ "rewards/margins": 0.4901772439479828,
306
+ "rewards/margins_max": 0.8924927711486816,
307
+ "rewards/margins_min": 0.1499636471271515,
308
+ "rewards/margins_std": 0.3346417546272278,
309
+ "rewards/rejected": -0.20207922160625458,
310
+ "step": 130
311
+ },
312
+ {
313
+ "dpo_losses": 0.4866393208503723,
314
+ "epoch": 0.39,
315
+ "grad_norm": 21.27134583914694,
316
+ "learning_rate": 3.7994171571810756e-06,
317
+ "logits/chosen": -2.6895060539245605,
318
+ "logits/rejected": -2.6512811183929443,
319
+ "logps/chosen": -291.05548095703125,
320
+ "logps/rejected": -294.4687805175781,
321
+ "loss": 0.5718,
322
+ "positive_losses": 0.2207096517086029,
323
+ "rewards/accuracies": 0.987500011920929,
324
+ "rewards/chosen": 0.2735855281352997,
325
+ "rewards/margins": 0.5197780132293701,
326
+ "rewards/margins_max": 1.003483772277832,
327
+ "rewards/margins_min": 0.1269286870956421,
328
+ "rewards/margins_std": 0.3979441523551941,
329
+ "rewards/rejected": -0.24619252979755402,
330
+ "step": 140
331
+ },
332
+ {
333
+ "dpo_losses": 0.5046078562736511,
334
+ "epoch": 0.42,
335
+ "grad_norm": 3.3011186957688583,
336
+ "learning_rate": 3.5831246207606597e-06,
337
+ "logits/chosen": -2.6959190368652344,
338
+ "logits/rejected": -2.658679962158203,
339
+ "logps/chosen": -264.2646179199219,
340
+ "logps/rejected": -234.5491180419922,
341
+ "loss": 0.5366,
342
+ "positive_losses": 0.490040123462677,
343
+ "rewards/accuracies": 0.949999988079071,
344
+ "rewards/chosen": 0.24420371651649475,
345
+ "rewards/margins": 0.45886701345443726,
346
+ "rewards/margins_max": 0.8680801391601562,
347
+ "rewards/margins_min": 0.1154303103685379,
348
+ "rewards/margins_std": 0.34930768609046936,
349
+ "rewards/rejected": -0.2146632969379425,
350
+ "step": 150
351
+ },
352
+ {
353
+ "dpo_losses": 0.48088502883911133,
354
+ "epoch": 0.45,
355
+ "grad_norm": 2.135658014816511,
356
+ "learning_rate": 3.3563355539546795e-06,
357
+ "logits/chosen": -2.665548801422119,
358
+ "logits/rejected": -2.6138901710510254,
359
+ "logps/chosen": -274.263427734375,
360
+ "logps/rejected": -260.50518798828125,
361
+ "loss": 0.5724,
362
+ "positive_losses": 0.9731669425964355,
363
+ "rewards/accuracies": 0.987500011920929,
364
+ "rewards/chosen": 0.262834370136261,
365
+ "rewards/margins": 0.5239533185958862,
366
+ "rewards/margins_max": 0.9884392023086548,
367
+ "rewards/margins_min": 0.15575796365737915,
368
+ "rewards/margins_std": 0.3754872977733612,
369
+ "rewards/rejected": -0.26111894845962524,
370
+ "step": 160
371
+ },
372
+ {
373
+ "dpo_losses": 0.4504636824131012,
374
+ "epoch": 0.48,
375
+ "grad_norm": 3.940043763048366,
376
+ "learning_rate": 3.121247763262235e-06,
377
+ "logits/chosen": -2.708754777908325,
378
+ "logits/rejected": -2.657917022705078,
379
+ "logps/chosen": -297.7489013671875,
380
+ "logps/rejected": -327.0563049316406,
381
+ "loss": 0.4813,
382
+ "positive_losses": 0.03098602220416069,
383
+ "rewards/accuracies": 0.9624999761581421,
384
+ "rewards/chosen": 0.3187271058559418,
385
+ "rewards/margins": 0.6266334652900696,
386
+ "rewards/margins_max": 1.0517089366912842,
387
+ "rewards/margins_min": 0.17500966787338257,
388
+ "rewards/margins_std": 0.3909396231174469,
389
+ "rewards/rejected": -0.3079063296318054,
390
+ "step": 170
391
+ },
392
+ {
393
+ "dpo_losses": 0.4588772654533386,
394
+ "epoch": 0.51,
395
+ "grad_norm": 8.823245159881209,
396
+ "learning_rate": 2.8801394778833475e-06,
397
+ "logits/chosen": -2.6968963146209717,
398
+ "logits/rejected": -2.6140356063842773,
399
+ "logps/chosen": -305.4325866699219,
400
+ "logps/rejected": -326.99798583984375,
401
+ "loss": 0.5468,
402
+ "positive_losses": 0.8232825994491577,
403
+ "rewards/accuracies": 0.987500011920929,
404
+ "rewards/chosen": 0.2646820843219757,
405
+ "rewards/margins": 0.5928131937980652,
406
+ "rewards/margins_max": 1.0361554622650146,
407
+ "rewards/margins_min": 0.28750157356262207,
408
+ "rewards/margins_std": 0.33570224046707153,
409
+ "rewards/rejected": -0.32813113927841187,
410
+ "step": 180
411
+ },
412
+ {
413
+ "dpo_losses": 0.45539379119873047,
414
+ "epoch": 0.54,
415
+ "grad_norm": 3.517893013000186,
416
+ "learning_rate": 2.6353472714635443e-06,
417
+ "logits/chosen": -2.6537580490112305,
418
+ "logits/rejected": -2.5634191036224365,
419
+ "logps/chosen": -287.6109619140625,
420
+ "logps/rejected": -265.6959228515625,
421
+ "loss": 0.5435,
422
+ "positive_losses": 0.9886103868484497,
423
+ "rewards/accuracies": 0.949999988079071,
424
+ "rewards/chosen": 0.34349915385246277,
425
+ "rewards/margins": 0.6255816221237183,
426
+ "rewards/margins_max": 1.1905597448349,
427
+ "rewards/margins_min": 0.168921560049057,
428
+ "rewards/margins_std": 0.453277051448822,
429
+ "rewards/rejected": -0.2820824980735779,
430
+ "step": 190
431
+ },
432
+ {
433
+ "dpo_losses": 0.44315657019615173,
434
+ "epoch": 0.56,
435
+ "grad_norm": 27.976402148032502,
436
+ "learning_rate": 2.3892434184240536e-06,
437
+ "logits/chosen": -2.7400636672973633,
438
+ "logits/rejected": -2.662397623062134,
439
+ "logps/chosen": -309.39691162109375,
440
+ "logps/rejected": -299.7530212402344,
441
+ "loss": 0.5625,
442
+ "positive_losses": 0.9616166353225708,
443
+ "rewards/accuracies": 0.9624999761581421,
444
+ "rewards/chosen": 0.30135902762413025,
445
+ "rewards/margins": 0.6429153084754944,
446
+ "rewards/margins_max": 1.131412148475647,
447
+ "rewards/margins_min": 0.17879006266593933,
448
+ "rewards/margins_std": 0.4260264039039612,
449
+ "rewards/rejected": -0.34155628085136414,
450
+ "step": 200
451
+ },
452
+ {
453
+ "epoch": 0.56,
454
+ "eval_dpo_losses": 0.6469283699989319,
455
+ "eval_logits/chosen": -2.678022623062134,
456
+ "eval_logits/rejected": -2.6380200386047363,
457
+ "eval_logps/chosen": -286.4236145019531,
458
+ "eval_logps/rejected": -272.83990478515625,
459
+ "eval_loss": 1.7185667753219604,
460
+ "eval_positive_losses": 9.667731285095215,
461
+ "eval_rewards/accuracies": 0.6420000195503235,
462
+ "eval_rewards/chosen": -0.018302178010344505,
463
+ "eval_rewards/margins": 0.1243075579404831,
464
+ "eval_rewards/margins_max": 0.6361650228500366,
465
+ "eval_rewards/margins_min": -0.3433184325695038,
466
+ "eval_rewards/margins_std": 0.32774004340171814,
467
+ "eval_rewards/rejected": -0.14260973036289215,
468
+ "eval_runtime": 428.2243,
469
+ "eval_samples_per_second": 4.67,
470
+ "eval_steps_per_second": 0.292,
471
+ "step": 200
472
+ },
473
+ {
474
+ "dpo_losses": 0.4354400634765625,
475
+ "epoch": 0.59,
476
+ "grad_norm": 23.522369776083625,
477
+ "learning_rate": 2.1442129043167877e-06,
478
+ "logits/chosen": -2.6434009075164795,
479
+ "logits/rejected": -2.6138339042663574,
480
+ "logps/chosen": -286.7272033691406,
481
+ "logps/rejected": -291.8896789550781,
482
+ "loss": 0.513,
483
+ "positive_losses": 0.665066123008728,
484
+ "rewards/accuracies": 0.9624999761581421,
485
+ "rewards/chosen": 0.33457106351852417,
486
+ "rewards/margins": 0.6892200708389282,
487
+ "rewards/margins_max": 1.2269551753997803,
488
+ "rewards/margins_min": 0.18099449574947357,
489
+ "rewards/margins_std": 0.46556130051612854,
490
+ "rewards/rejected": -0.35464900732040405,
491
+ "step": 210
492
+ },
493
+ {
494
+ "dpo_losses": 0.4387238025665283,
495
+ "epoch": 0.62,
496
+ "grad_norm": 11.92404423048434,
497
+ "learning_rate": 1.9026303129961049e-06,
498
+ "logits/chosen": -2.7612462043762207,
499
+ "logits/rejected": -2.664234161376953,
500
+ "logps/chosen": -319.7461853027344,
501
+ "logps/rejected": -306.0053405761719,
502
+ "loss": 0.5894,
503
+ "positive_losses": 1.1452913284301758,
504
+ "rewards/accuracies": 0.9750000238418579,
505
+ "rewards/chosen": 0.33710065484046936,
506
+ "rewards/margins": 0.6538791656494141,
507
+ "rewards/margins_max": 1.1509373188018799,
508
+ "rewards/margins_min": 0.19225715100765228,
509
+ "rewards/margins_std": 0.4403897225856781,
510
+ "rewards/rejected": -0.3167785704135895,
511
+ "step": 220
512
+ },
513
+ {
514
+ "dpo_losses": 0.44511428475379944,
515
+ "epoch": 0.65,
516
+ "grad_norm": 2.419282473127918,
517
+ "learning_rate": 1.66683681459314e-06,
518
+ "logits/chosen": -2.773876428604126,
519
+ "logits/rejected": -2.67607045173645,
520
+ "logps/chosen": -339.04718017578125,
521
+ "logps/rejected": -293.1225891113281,
522
+ "loss": 0.4763,
523
+ "positive_losses": 0.6133368611335754,
524
+ "rewards/accuracies": 0.987500011920929,
525
+ "rewards/chosen": 0.32628515362739563,
526
+ "rewards/margins": 0.6365767121315002,
527
+ "rewards/margins_max": 1.125410795211792,
528
+ "rewards/margins_min": 0.21782192587852478,
529
+ "rewards/margins_std": 0.4051855504512787,
530
+ "rewards/rejected": -0.3102915287017822,
531
+ "step": 230
532
+ },
533
+ {
534
+ "dpo_losses": 0.4544529318809509,
535
+ "epoch": 0.68,
536
+ "grad_norm": 13.447116267552904,
537
+ "learning_rate": 1.4391174773015836e-06,
538
+ "logits/chosen": -2.7197587490081787,
539
+ "logits/rejected": -2.649749279022217,
540
+ "logps/chosen": -302.6105041503906,
541
+ "logps/rejected": -321.8402404785156,
542
+ "loss": 0.692,
543
+ "positive_losses": 2.48455810546875,
544
+ "rewards/accuracies": 0.9375,
545
+ "rewards/chosen": 0.22186538577079773,
546
+ "rewards/margins": 0.6085190773010254,
547
+ "rewards/margins_max": 1.1415433883666992,
548
+ "rewards/margins_min": 0.23370866477489471,
549
+ "rewards/margins_std": 0.41311854124069214,
550
+ "rewards/rejected": -0.38665369153022766,
551
+ "step": 240
552
+ },
553
+ {
554
+ "dpo_losses": 0.45861634612083435,
555
+ "epoch": 0.7,
556
+ "grad_norm": 5.111403689556549,
557
+ "learning_rate": 1.2216791228457778e-06,
558
+ "logits/chosen": -2.716823101043701,
559
+ "logits/rejected": -2.640800952911377,
560
+ "logps/chosen": -280.11114501953125,
561
+ "logps/rejected": -281.67138671875,
562
+ "loss": 0.4992,
563
+ "positive_losses": 0.6084854006767273,
564
+ "rewards/accuracies": 0.987500011920929,
565
+ "rewards/chosen": 0.31169968843460083,
566
+ "rewards/margins": 0.6179708242416382,
567
+ "rewards/margins_max": 1.2185614109039307,
568
+ "rewards/margins_min": 0.1615341305732727,
569
+ "rewards/margins_std": 0.4740964472293854,
570
+ "rewards/rejected": -0.30627113580703735,
571
+ "step": 250
572
+ },
573
+ {
574
+ "dpo_losses": 0.4628082811832428,
575
+ "epoch": 0.73,
576
+ "grad_norm": 2.699692592075128,
577
+ "learning_rate": 1.0166289402331391e-06,
578
+ "logits/chosen": -2.7728962898254395,
579
+ "logits/rejected": -2.684753894805908,
580
+ "logps/chosen": -263.36126708984375,
581
+ "logps/rejected": -289.21661376953125,
582
+ "loss": 0.5624,
583
+ "positive_losses": 0.9304378628730774,
584
+ "rewards/accuracies": 0.9624999761581421,
585
+ "rewards/chosen": 0.28732261061668396,
586
+ "rewards/margins": 0.5901791453361511,
587
+ "rewards/margins_max": 1.102694034576416,
588
+ "rewards/margins_min": 0.17682021856307983,
589
+ "rewards/margins_std": 0.4229150712490082,
590
+ "rewards/rejected": -0.30285659432411194,
591
+ "step": 260
592
+ },
593
+ {
594
+ "dpo_losses": 0.4588424265384674,
595
+ "epoch": 0.76,
596
+ "grad_norm": 2.4735784513371377,
597
+ "learning_rate": 8.259540650444736e-07,
598
+ "logits/chosen": -2.717153787612915,
599
+ "logits/rejected": -2.662932872772217,
600
+ "logps/chosen": -278.75482177734375,
601
+ "logps/rejected": -291.56866455078125,
602
+ "loss": 0.5853,
603
+ "positive_losses": 0.9098857641220093,
604
+ "rewards/accuracies": 0.949999988079071,
605
+ "rewards/chosen": 0.30360764265060425,
606
+ "rewards/margins": 0.5942984223365784,
607
+ "rewards/margins_max": 1.0322821140289307,
608
+ "rewards/margins_min": 0.21275146305561066,
609
+ "rewards/margins_std": 0.36198341846466064,
610
+ "rewards/rejected": -0.2906908392906189,
611
+ "step": 270
612
+ },
613
+ {
614
+ "dpo_losses": 0.4629085958003998,
615
+ "epoch": 0.79,
616
+ "grad_norm": 13.451546074592132,
617
+ "learning_rate": 6.515023221586722e-07,
618
+ "logits/chosen": -2.6962451934814453,
619
+ "logits/rejected": -2.6575076580047607,
620
+ "logps/chosen": -274.9664001464844,
621
+ "logps/rejected": -304.9722595214844,
622
+ "loss": 0.5625,
623
+ "positive_losses": 1.4465850591659546,
624
+ "rewards/accuracies": 0.9750000238418579,
625
+ "rewards/chosen": 0.2849060893058777,
626
+ "rewards/margins": 0.60865718126297,
627
+ "rewards/margins_max": 1.1329301595687866,
628
+ "rewards/margins_min": 0.1755952537059784,
629
+ "rewards/margins_std": 0.4414794445037842,
630
+ "rewards/rejected": -0.3237510919570923,
631
+ "step": 280
632
+ },
633
+ {
634
+ "dpo_losses": 0.47258663177490234,
635
+ "epoch": 0.82,
636
+ "grad_norm": 2.654477953260434,
637
+ "learning_rate": 4.949643185335288e-07,
638
+ "logits/chosen": -2.707307815551758,
639
+ "logits/rejected": -2.652792453765869,
640
+ "logps/chosen": -259.1030578613281,
641
+ "logps/rejected": -292.6324462890625,
642
+ "loss": 0.6149,
643
+ "positive_losses": 1.7202523946762085,
644
+ "rewards/accuracies": 0.949999988079071,
645
+ "rewards/chosen": 0.27813172340393066,
646
+ "rewards/margins": 0.5642735958099365,
647
+ "rewards/margins_max": 1.0385398864746094,
648
+ "rewards/margins_min": 0.12702254951000214,
649
+ "rewards/margins_std": 0.4158683717250824,
650
+ "rewards/rejected": -0.28614187240600586,
651
+ "step": 290
652
+ },
653
+ {
654
+ "dpo_losses": 0.4324049949645996,
655
+ "epoch": 0.85,
656
+ "grad_norm": 11.591501845708454,
657
+ "learning_rate": 3.578570595810274e-07,
658
+ "logits/chosen": -2.7821717262268066,
659
+ "logits/rejected": -2.6995315551757812,
660
+ "logps/chosen": -309.7518310546875,
661
+ "logps/rejected": -320.70916748046875,
662
+ "loss": 0.4748,
663
+ "positive_losses": 0.8444260358810425,
664
+ "rewards/accuracies": 1.0,
665
+ "rewards/chosen": 0.3676120638847351,
666
+ "rewards/margins": 0.6803697943687439,
667
+ "rewards/margins_max": 1.199285864830017,
668
+ "rewards/margins_min": 0.21351738274097443,
669
+ "rewards/margins_std": 0.4415613114833832,
670
+ "rewards/rejected": -0.312757670879364,
671
+ "step": 300
672
+ },
673
+ {
674
+ "epoch": 0.85,
675
+ "eval_dpo_losses": 0.6448404788970947,
676
+ "eval_logits/chosen": -2.715327739715576,
677
+ "eval_logits/rejected": -2.6732916831970215,
678
+ "eval_logps/chosen": -284.4527587890625,
679
+ "eval_logps/rejected": -271.32244873046875,
680
+ "eval_loss": 1.6048117876052856,
681
+ "eval_positive_losses": 8.706162452697754,
682
+ "eval_rewards/accuracies": 0.6470000147819519,
683
+ "eval_rewards/chosen": 0.0014067561132833362,
684
+ "eval_rewards/margins": 0.12884218990802765,
685
+ "eval_rewards/margins_max": 0.6374967098236084,
686
+ "eval_rewards/margins_min": -0.34605804085731506,
687
+ "eval_rewards/margins_std": 0.3295030891895294,
688
+ "eval_rewards/rejected": -0.12743544578552246,
689
+ "eval_runtime": 428.2498,
690
+ "eval_samples_per_second": 4.67,
691
+ "eval_steps_per_second": 0.292,
692
+ "step": 300
693
+ },
694
+ {
695
+ "dpo_losses": 0.45941466093063354,
696
+ "epoch": 0.87,
697
+ "grad_norm": 2.6085680781835205,
698
+ "learning_rate": 2.4150924791035037e-07,
699
+ "logits/chosen": -2.774445056915283,
700
+ "logits/rejected": -2.673360824584961,
701
+ "logps/chosen": -267.74237060546875,
702
+ "logps/rejected": -243.88473510742188,
703
+ "loss": 0.5697,
704
+ "positive_losses": 1.3653801679611206,
705
+ "rewards/accuracies": 0.987500011920929,
706
+ "rewards/chosen": 0.30073457956314087,
707
+ "rewards/margins": 0.5973426699638367,
708
+ "rewards/margins_max": 1.1060882806777954,
709
+ "rewards/margins_min": 0.18351522088050842,
710
+ "rewards/margins_std": 0.4086340069770813,
711
+ "rewards/rejected": -0.2966081500053406,
712
+ "step": 310
713
+ },
714
+ {
715
+ "dpo_losses": 0.45310109853744507,
716
+ "epoch": 0.9,
717
+ "grad_norm": 10.060071948421735,
718
+ "learning_rate": 1.4704840690808658e-07,
719
+ "logits/chosen": -2.738978385925293,
720
+ "logits/rejected": -2.680860757827759,
721
+ "logps/chosen": -279.5138854980469,
722
+ "logps/rejected": -293.9893493652344,
723
+ "loss": 0.5692,
724
+ "positive_losses": 1.6892318725585938,
725
+ "rewards/accuracies": 0.9624999761581421,
726
+ "rewards/chosen": 0.2875928282737732,
727
+ "rewards/margins": 0.6207860708236694,
728
+ "rewards/margins_max": 1.124011754989624,
729
+ "rewards/margins_min": 0.14557920396327972,
730
+ "rewards/margins_std": 0.44626301527023315,
731
+ "rewards/rejected": -0.33319321274757385,
732
+ "step": 320
733
+ },
734
+ {
735
+ "dpo_losses": 0.42673492431640625,
736
+ "epoch": 0.93,
737
+ "grad_norm": 9.476085880429812,
738
+ "learning_rate": 7.538995394063996e-08,
739
+ "logits/chosen": -2.8187005519866943,
740
+ "logits/rejected": -2.7311813831329346,
741
+ "logps/chosen": -318.88360595703125,
742
+ "logps/rejected": -302.66058349609375,
743
+ "loss": 0.5314,
744
+ "positive_losses": 0.5069873929023743,
745
+ "rewards/accuracies": 0.987500011920929,
746
+ "rewards/chosen": 0.35436224937438965,
747
+ "rewards/margins": 0.7115713953971863,
748
+ "rewards/margins_max": 1.23550546169281,
749
+ "rewards/margins_min": 0.2139424830675125,
750
+ "rewards/margins_std": 0.4558965563774109,
751
+ "rewards/rejected": -0.35720914602279663,
752
+ "step": 330
753
+ },
754
+ {
755
+ "dpo_losses": 0.4437997341156006,
756
+ "epoch": 0.96,
757
+ "grad_norm": 2.682118994824555,
758
+ "learning_rate": 2.722832907015971e-08,
759
+ "logits/chosen": -2.6981847286224365,
760
+ "logits/rejected": -2.6440398693084717,
761
+ "logps/chosen": -266.6497802734375,
762
+ "logps/rejected": -282.98199462890625,
763
+ "loss": 0.5024,
764
+ "positive_losses": 0.9627658724784851,
765
+ "rewards/accuracies": 0.9750000238418579,
766
+ "rewards/chosen": 0.3319571018218994,
767
+ "rewards/margins": 0.6500804424285889,
768
+ "rewards/margins_max": 1.2494922876358032,
769
+ "rewards/margins_min": 0.25120097398757935,
770
+ "rewards/margins_std": 0.4507668614387512,
771
+ "rewards/rejected": -0.31812337040901184,
772
+ "step": 340
773
+ },
774
+ {
775
+ "dpo_losses": 0.4518283009529114,
776
+ "epoch": 0.99,
777
+ "grad_norm": 5.762126574549782,
778
+ "learning_rate": 3.030265255329623e-09,
779
+ "logits/chosen": -2.6820361614227295,
780
+ "logits/rejected": -2.6376953125,
781
+ "logps/chosen": -285.1527404785156,
782
+ "logps/rejected": -317.6675720214844,
783
+ "loss": 0.5059,
784
+ "positive_losses": 0.9290813207626343,
785
+ "rewards/accuracies": 0.987500011920929,
786
+ "rewards/chosen": 0.2980636656284332,
787
+ "rewards/margins": 0.6102195978164673,
788
+ "rewards/margins_max": 1.0686355829238892,
789
+ "rewards/margins_min": 0.20541608333587646,
790
+ "rewards/margins_std": 0.38572338223457336,
791
+ "rewards/rejected": -0.31215590238571167,
792
+ "step": 350
793
+ },
794
+ {
795
+ "epoch": 1.0,
796
+ "step": 355,
797
+ "total_flos": 0.0,
798
+ "train_loss": 0.5743894765074824,
799
+ "train_runtime": 4311.1014,
800
+ "train_samples_per_second": 1.317,
801
+ "train_steps_per_second": 0.082
802
+ }
803
+ ],
804
+ "logging_steps": 10,
805
+ "max_steps": 355,
806
+ "num_input_tokens_seen": 0,
807
+ "num_train_epochs": 1,
808
+ "save_steps": 100,
809
+ "total_flos": 0.0,
810
+ "train_batch_size": 4,
811
+ "trial_name": null,
812
+ "trial_params": null
813
+ }
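
trainer_state.json keeps the full log_history, with a training entry every 10 steps and an evaluation entry every 100 steps. A short sketch for pulling out just the evaluation rows (the same numbers that appear in the README's Training results table):

```python
# Sketch: extract the evaluation entries from trainer_state.json's log_history.
# Rows carrying an "eval_loss" key are the step-100/200/300 evaluations above.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

for row in state["log_history"]:
    if "eval_loss" in row:
        print(row["step"], row["eval_loss"], row["eval_dpo_losses"], row["eval_rewards/accuracies"])
```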