just1nseo committed
Commit
19f2bf7
1 Parent(s): a1e6ecd

Model save

README.md ADDED
@@ -0,0 +1,82 @@
---
base_model: alignment-handbook/zephyr-7b-sft-full
library_name: peft
license: apache-2.0
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-5e-7-epoch1
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-dpop-qlora-uf-ours-5e-7-epoch1

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7024
- Positive Losses: 0.1592
- Dpo Losses: 0.6845
- Rewards/chosen: 0.0528
- Rewards/rejected: 0.0346
- Rewards/accuracies: 0.5940
- Rewards/margins: 0.0182
- Rewards/margins Max: 0.1072
- Rewards/margins Min: -0.0591
- Rewards/margins Std: 0.0555
- Logps/rejected: -255.1217
- Logps/chosen: -279.3168
- Logits/rejected: -2.7478
- Logits/chosen: -2.7859

+ ## Model description
37
+
38
+ More information needed
39
+
40
+ ## Intended uses & limitations
41
+
42
+ More information needed
43
+
44
+ ## Training and evaluation data
45
+
46
+ More information needed
47
+
48
+ ## Training procedure
49
+
50
+ ### Training hyperparameters
51
+
52
+ The following hyperparameters were used during training:
53
+ - learning_rate: 5e-07
54
+ - train_batch_size: 4
55
+ - eval_batch_size: 8
56
+ - seed: 42
57
+ - distributed_type: multi-GPU
58
+ - num_devices: 2
59
+ - gradient_accumulation_steps: 2
60
+ - total_train_batch_size: 16
61
+ - total_eval_batch_size: 16
62
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
63
+ - lr_scheduler_type: cosine
64
+ - lr_scheduler_warmup_ratio: 0.1
65
+ - num_epochs: 1
66
+
67
+ ### Training results
68
+
69
+ | Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
70
+ |:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
71
+ | 0.6834 | 0.28 | 100 | 0.6932 | 0.0200 | 0.6905 | 0.0248 | 0.0194 | 0.5930 | 0.0054 | 0.0341 | -0.0197 | 0.0178 | -256.6423 | -282.1181 | -2.7657 | -2.8044 |
72
+ | 0.6629 | 0.56 | 200 | 0.6977 | 0.1042 | 0.6860 | 0.0485 | 0.0335 | 0.5980 | 0.0149 | 0.0881 | -0.0489 | 0.0456 | -255.2263 | -279.7464 | -2.7492 | -2.7879 |
73
+ | 0.6479 | 0.85 | 300 | 0.7024 | 0.1592 | 0.6845 | 0.0528 | 0.0346 | 0.5940 | 0.0182 | 0.1072 | -0.0591 | 0.0555 | -255.1217 | -279.3168 | -2.7478 | -2.7859 |
74
+
75
+
76
+ ### Framework versions
77
+
78
+ - PEFT 0.7.1
79
+ - Transformers 4.39.0.dev0
80
+ - Pytorch 2.1.2+cu121
81
+ - Datasets 2.14.6
82
+ - Tokenizers 0.15.2
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1be6399a4864eefd60fa4b0521748573b89f4088c483a241ed6f439cc71c3481
+ oid sha256:34e4ce3dec547099b6f9e5a3335ef132d4adf3cd5ebf4394c1549ebfa9198205
  size 671150064
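The pointer above records the new adapter weights by their SHA-256 `oid`; a downloaded copy of the real file can be checked against that digest, e.g. with the sketch below (file name assumed to be in the working directory).

```python
# Sketch: verify a downloaded adapter_model.safetensors against the LFS pointer's SHA-256 oid.
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            digest.update(block)
    return digest.hexdigest()

expected = "34e4ce3dec547099b6f9e5a3335ef132d4adf3cd5ebf4394c1549ebfa9198205"  # new oid above
print(sha256_of("adapter_model.safetensors") == expected)
```
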
all_results.json ADDED
@@ -0,0 +1,8 @@
{
    "epoch": 1.0,
    "train_loss": 0.6688590748209349,
    "train_runtime": 4310.2257,
    "train_samples": 5678,
    "train_samples_per_second": 1.317,
    "train_steps_per_second": 0.082
}
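The throughput fields follow directly from the sample count, runtime, and total optimizer-step count (355, from trainer_state.json below); a quick arithmetic check:

```python
# Pure arithmetic: the reported throughput values follow from samples, runtime and steps.
train_samples, train_runtime, total_steps = 5678, 4310.2257, 355  # step count from trainer_state.json

print(round(train_samples / train_runtime, 3))  # 1.317 -> train_samples_per_second
print(round(total_steps / train_runtime, 3))    # 0.082 -> train_steps_per_second
print(train_samples / 16)                       # 354.875 ~ 355 steps at effective batch size 16
```
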
runs/Jul29_11-02-19_notebook-deployment-48-7d9b6c99-khd85/events.out.tfevents.1722251039.notebook-deployment-48-7d9b6c99-khd85.3446418.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8755f2bfdc6b4c4669c50eab378bb9930027cd41e968c399cadb72bd9ec12258
- size 39113
+ oid sha256:07c6ac1ab8e249cd2d511cfabed056ef1e54156e2a8475a8221445210ee799d9
+ size 44442
train_results.json ADDED
@@ -0,0 +1,8 @@
{
    "epoch": 1.0,
    "train_loss": 0.6688590748209349,
    "train_runtime": 4310.2257,
    "train_samples": 5678,
    "train_samples_per_second": 1.317,
    "train_steps_per_second": 0.082
}
trainer_state.json ADDED
@@ -0,0 +1,813 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 100,
6
+ "global_step": 355,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "dpo_losses": 0.6931471824645996,
13
+ "epoch": 0.0,
14
+ "grad_norm": 1.6020024881522184,
15
+ "learning_rate": 1.3888888888888887e-08,
16
+ "logits/chosen": -2.861618995666504,
17
+ "logits/rejected": -2.8205904960632324,
18
+ "logps/chosen": -271.06011962890625,
19
+ "logps/rejected": -211.1704559326172,
20
+ "loss": 0.6931,
21
+ "positive_losses": 0.0,
22
+ "rewards/accuracies": 0.0,
23
+ "rewards/chosen": 0.0,
24
+ "rewards/margins": 0.0,
25
+ "rewards/margins_max": 0.0,
26
+ "rewards/margins_min": 0.0,
27
+ "rewards/margins_std": 0.0,
28
+ "rewards/rejected": 0.0,
29
+ "step": 1
30
+ },
31
+ {
32
+ "dpo_losses": 0.6933339834213257,
33
+ "epoch": 0.03,
34
+ "grad_norm": 17.749396515379228,
35
+ "learning_rate": 1.3888888888888888e-07,
36
+ "logits/chosen": -2.8337996006011963,
37
+ "logits/rejected": -2.7912583351135254,
38
+ "logps/chosen": -325.06781005859375,
39
+ "logps/rejected": -274.9460754394531,
40
+ "loss": 0.6995,
41
+ "positive_losses": 0.07389768213033676,
42
+ "rewards/accuracies": 0.3888888955116272,
43
+ "rewards/chosen": -0.00017488877347204834,
44
+ "rewards/margins": -0.0003703224065247923,
45
+ "rewards/margins_max": 0.002196074463427067,
46
+ "rewards/margins_min": -0.003493131836876273,
47
+ "rewards/margins_std": 0.0025361739099025726,
48
+ "rewards/rejected": 0.00019543366215657443,
49
+ "step": 10
50
+ },
51
+ {
52
+ "dpo_losses": 0.6930112838745117,
53
+ "epoch": 0.06,
54
+ "grad_norm": 16.42860549070251,
55
+ "learning_rate": 2.7777777777777776e-07,
56
+ "logits/chosen": -2.7250771522521973,
57
+ "logits/rejected": -2.7066714763641357,
58
+ "logps/chosen": -293.7167663574219,
59
+ "logps/rejected": -215.6971893310547,
60
+ "loss": 0.6969,
61
+ "positive_losses": 0.0511661060154438,
62
+ "rewards/accuracies": 0.512499988079071,
63
+ "rewards/chosen": 0.0010916258906945586,
64
+ "rewards/margins": 0.00027377792866900563,
65
+ "rewards/margins_max": 0.0035606161691248417,
66
+ "rewards/margins_min": -0.0026067704893648624,
67
+ "rewards/margins_std": 0.002710341941565275,
68
+ "rewards/rejected": 0.0008178479038178921,
69
+ "step": 20
70
+ },
71
+ {
72
+ "dpo_losses": 0.692296028137207,
73
+ "epoch": 0.08,
74
+ "grad_norm": 8.629940663439823,
75
+ "learning_rate": 4.1666666666666667e-07,
76
+ "logits/chosen": -2.8196403980255127,
77
+ "logits/rejected": -2.750633716583252,
78
+ "logps/chosen": -303.18206787109375,
79
+ "logps/rejected": -232.16091918945312,
80
+ "loss": 0.6932,
81
+ "positive_losses": 0.0034084320068359375,
82
+ "rewards/accuracies": 0.7749999761581421,
83
+ "rewards/chosen": 0.005238121375441551,
84
+ "rewards/margins": 0.0017049547750502825,
85
+ "rewards/margins_max": 0.0041273366659879684,
86
+ "rewards/margins_min": -0.0010792845860123634,
87
+ "rewards/margins_std": 0.0023812309373170137,
88
+ "rewards/rejected": 0.003533167066052556,
89
+ "step": 30
90
+ },
91
+ {
92
+ "dpo_losses": 0.6920384764671326,
93
+ "epoch": 0.11,
94
+ "grad_norm": 1.7378063453273105,
95
+ "learning_rate": 4.998060489154965e-07,
96
+ "logits/chosen": -2.8426527976989746,
97
+ "logits/rejected": -2.7624073028564453,
98
+ "logps/chosen": -276.6007385253906,
99
+ "logps/rejected": -225.0730438232422,
100
+ "loss": 0.6922,
101
+ "positive_losses": 0.0007575989002361894,
102
+ "rewards/accuracies": 0.637499988079071,
103
+ "rewards/chosen": 0.008838965557515621,
104
+ "rewards/margins": 0.0022226297296583652,
105
+ "rewards/margins_max": 0.006442135665565729,
106
+ "rewards/margins_min": -0.0014907930744811893,
107
+ "rewards/margins_std": 0.0035486381966620684,
108
+ "rewards/rejected": 0.006616336293518543,
109
+ "step": 40
110
+ },
111
+ {
112
+ "dpo_losses": 0.6905555725097656,
113
+ "epoch": 0.14,
114
+ "grad_norm": 9.354247049667382,
115
+ "learning_rate": 4.976275538042932e-07,
116
+ "logits/chosen": -2.810286283493042,
117
+ "logits/rejected": -2.7382943630218506,
118
+ "logps/chosen": -274.2152404785156,
119
+ "logps/rejected": -233.04257202148438,
120
+ "loss": 0.6905,
121
+ "positive_losses": 0.0,
122
+ "rewards/accuracies": 0.8374999761581421,
123
+ "rewards/chosen": 0.013549095019698143,
124
+ "rewards/margins": 0.005201784893870354,
125
+ "rewards/margins_max": 0.0127005185931921,
126
+ "rewards/margins_min": -0.0008777756011113524,
127
+ "rewards/margins_std": 0.006243027281016111,
128
+ "rewards/rejected": 0.008347309194505215,
129
+ "step": 50
130
+ },
131
+ {
132
+ "dpo_losses": 0.6887227892875671,
133
+ "epoch": 0.17,
134
+ "grad_norm": 2.4332730738698984,
135
+ "learning_rate": 4.930493069997119e-07,
136
+ "logits/chosen": -2.749894142150879,
137
+ "logits/rejected": -2.7108871936798096,
138
+ "logps/chosen": -311.87884521484375,
139
+ "logps/rejected": -261.8995361328125,
140
+ "loss": 0.6896,
141
+ "positive_losses": 0.0,
142
+ "rewards/accuracies": 0.875,
143
+ "rewards/chosen": 0.017607735469937325,
144
+ "rewards/margins": 0.008888162672519684,
145
+ "rewards/margins_max": 0.017459740862250328,
146
+ "rewards/margins_min": 0.0007526646950282156,
147
+ "rewards/margins_std": 0.0074041737243533134,
148
+ "rewards/rejected": 0.008719570934772491,
149
+ "step": 60
150
+ },
151
+ {
152
+ "dpo_losses": 0.6875914931297302,
153
+ "epoch": 0.2,
154
+ "grad_norm": 1.7936785587345936,
155
+ "learning_rate": 4.861156761634013e-07,
156
+ "logits/chosen": -2.7838871479034424,
157
+ "logits/rejected": -2.7218618392944336,
158
+ "logps/chosen": -320.61077880859375,
159
+ "logps/rejected": -234.509033203125,
160
+ "loss": 0.6877,
161
+ "positive_losses": 0.0053993226028978825,
162
+ "rewards/accuracies": 0.862500011920929,
163
+ "rewards/chosen": 0.022504538297653198,
164
+ "rewards/margins": 0.01117872167378664,
165
+ "rewards/margins_max": 0.024569755420088768,
166
+ "rewards/margins_min": 0.00035735839628614485,
167
+ "rewards/margins_std": 0.0108140017837286,
168
+ "rewards/rejected": 0.011325814761221409,
169
+ "step": 70
170
+ },
171
+ {
172
+ "dpo_losses": 0.6856142282485962,
173
+ "epoch": 0.23,
174
+ "grad_norm": 1.8616136810198376,
175
+ "learning_rate": 4.768938549177392e-07,
176
+ "logits/chosen": -2.8332719802856445,
177
+ "logits/rejected": -2.7764153480529785,
178
+ "logps/chosen": -318.5867614746094,
179
+ "logps/rejected": -285.2149963378906,
180
+ "loss": 0.6869,
181
+ "positive_losses": 0.006173324771225452,
182
+ "rewards/accuracies": 0.800000011920929,
183
+ "rewards/chosen": 0.026837289333343506,
184
+ "rewards/margins": 0.015200009569525719,
185
+ "rewards/margins_max": 0.03601212427020073,
186
+ "rewards/margins_min": -0.00037241075187921524,
187
+ "rewards/margins_std": 0.01664547622203827,
188
+ "rewards/rejected": 0.011637277901172638,
189
+ "step": 80
190
+ },
191
+ {
192
+ "dpo_losses": 0.6834368705749512,
193
+ "epoch": 0.25,
194
+ "grad_norm": 2.576979966636063,
195
+ "learning_rate": 4.654732116743193e-07,
196
+ "logits/chosen": -2.7666513919830322,
197
+ "logits/rejected": -2.724118709564209,
198
+ "logps/chosen": -272.710693359375,
199
+ "logps/rejected": -197.66629028320312,
200
+ "loss": 0.6833,
201
+ "positive_losses": 0.0,
202
+ "rewards/accuracies": 0.862500011920929,
203
+ "rewards/chosen": 0.03284075856208801,
204
+ "rewards/margins": 0.01963154971599579,
205
+ "rewards/margins_max": 0.04466867819428444,
206
+ "rewards/margins_min": 0.0018520345911383629,
207
+ "rewards/margins_std": 0.020031433552503586,
208
+ "rewards/rejected": 0.013209208846092224,
209
+ "step": 90
210
+ },
211
+ {
212
+ "dpo_losses": 0.6846113801002502,
213
+ "epoch": 0.28,
214
+ "grad_norm": 1.9012962136053702,
215
+ "learning_rate": 4.519644235671752e-07,
216
+ "logits/chosen": -2.8058769702911377,
217
+ "logits/rejected": -2.7677180767059326,
218
+ "logps/chosen": -278.3069763183594,
219
+ "logps/rejected": -260.80572509765625,
220
+ "loss": 0.6834,
221
+ "positive_losses": 0.00875930767506361,
222
+ "rewards/accuracies": 0.8125,
223
+ "rewards/chosen": 0.031047578901052475,
224
+ "rewards/margins": 0.01723078265786171,
225
+ "rewards/margins_max": 0.03883099928498268,
226
+ "rewards/margins_min": 0.00033125150366686285,
227
+ "rewards/margins_std": 0.017453256994485855,
228
+ "rewards/rejected": 0.013816798105835915,
229
+ "step": 100
230
+ },
231
+ {
232
+ "epoch": 0.28,
233
+ "eval_dpo_losses": 0.6905006766319275,
234
+ "eval_logits/chosen": -2.804429292678833,
235
+ "eval_logits/rejected": -2.7657294273376465,
236
+ "eval_logps/chosen": -282.1181335449219,
237
+ "eval_logps/rejected": -256.64227294921875,
238
+ "eval_loss": 0.6931570768356323,
239
+ "eval_positive_losses": 0.02003120444715023,
240
+ "eval_rewards/accuracies": 0.5929999947547913,
241
+ "eval_rewards/chosen": 0.024752924218773842,
242
+ "eval_rewards/margins": 0.005386360455304384,
243
+ "eval_rewards/margins_max": 0.03411807864904404,
244
+ "eval_rewards/margins_min": -0.019733905792236328,
245
+ "eval_rewards/margins_std": 0.017771249637007713,
246
+ "eval_rewards/rejected": 0.019366566091775894,
247
+ "eval_runtime": 428.432,
248
+ "eval_samples_per_second": 4.668,
249
+ "eval_steps_per_second": 0.292,
250
+ "step": 100
251
+ },
252
+ {
253
+ "dpo_losses": 0.676801860332489,
254
+ "epoch": 0.31,
255
+ "grad_norm": 2.3464200895802714,
256
+ "learning_rate": 4.364984038837727e-07,
257
+ "logits/chosen": -2.87202787399292,
258
+ "logits/rejected": -2.7757174968719482,
259
+ "logps/chosen": -369.87457275390625,
260
+ "logps/rejected": -285.02069091796875,
261
+ "loss": 0.6771,
262
+ "positive_losses": 0.0,
263
+ "rewards/accuracies": 0.9375,
264
+ "rewards/chosen": 0.0473114475607872,
265
+ "rewards/margins": 0.03318057209253311,
266
+ "rewards/margins_max": 0.06624362617731094,
267
+ "rewards/margins_min": 0.005891180131584406,
268
+ "rewards/margins_std": 0.026871800422668457,
269
+ "rewards/rejected": 0.014130881056189537,
270
+ "step": 110
271
+ },
272
+ {
273
+ "dpo_losses": 0.6757510900497437,
274
+ "epoch": 0.34,
275
+ "grad_norm": 2.0654592381160293,
276
+ "learning_rate": 4.1922503338800447e-07,
277
+ "logits/chosen": -2.8409907817840576,
278
+ "logits/rejected": -2.7864270210266113,
279
+ "logps/chosen": -343.3132629394531,
280
+ "logps/rejected": -264.656982421875,
281
+ "loss": 0.6761,
282
+ "positive_losses": 0.00038242340087890625,
283
+ "rewards/accuracies": 0.8999999761581421,
284
+ "rewards/chosen": 0.05072961002588272,
285
+ "rewards/margins": 0.035357214510440826,
286
+ "rewards/margins_max": 0.0718097984790802,
287
+ "rewards/margins_min": 0.007976613938808441,
288
+ "rewards/margins_std": 0.029289904981851578,
289
+ "rewards/rejected": 0.015372401103377342,
290
+ "step": 120
291
+ },
292
+ {
293
+ "dpo_losses": 0.672697901725769,
294
+ "epoch": 0.37,
295
+ "grad_norm": 1.805644520581111,
296
+ "learning_rate": 4.003117078299021e-07,
297
+ "logits/chosen": -2.833576202392578,
298
+ "logits/rejected": -2.7511653900146484,
299
+ "logps/chosen": -376.5670471191406,
300
+ "logps/rejected": -298.7974548339844,
301
+ "loss": 0.6728,
302
+ "positive_losses": 0.007812881842255592,
303
+ "rewards/accuracies": 0.9125000238418579,
304
+ "rewards/chosen": 0.05797192454338074,
305
+ "rewards/margins": 0.041637517511844635,
306
+ "rewards/margins_max": 0.07995648682117462,
307
+ "rewards/margins_min": 0.005986917298287153,
308
+ "rewards/margins_std": 0.03285466879606247,
309
+ "rewards/rejected": 0.016334403306245804,
310
+ "step": 130
311
+ },
312
+ {
313
+ "dpo_losses": 0.6723438501358032,
314
+ "epoch": 0.39,
315
+ "grad_norm": 1.4837668528318801,
316
+ "learning_rate": 3.799417157181075e-07,
317
+ "logits/chosen": -2.7876131534576416,
318
+ "logits/rejected": -2.748243808746338,
319
+ "logps/chosen": -312.4587707519531,
320
+ "logps/rejected": -268.16253662109375,
321
+ "loss": 0.6749,
322
+ "positive_losses": 0.0,
323
+ "rewards/accuracies": 0.8999999761581421,
324
+ "rewards/chosen": 0.059552956372499466,
325
+ "rewards/margins": 0.04268346354365349,
326
+ "rewards/margins_max": 0.09346815198659897,
327
+ "rewards/margins_min": 0.005906702019274235,
328
+ "rewards/margins_std": 0.04082999378442764,
329
+ "rewards/rejected": 0.01686949096620083,
330
+ "step": 140
331
+ },
332
+ {
333
+ "dpo_losses": 0.6723035573959351,
334
+ "epoch": 0.42,
335
+ "grad_norm": 1.9734696023947083,
336
+ "learning_rate": 3.583124620760659e-07,
337
+ "logits/chosen": -2.7900147438049316,
338
+ "logits/rejected": -2.7500596046447754,
339
+ "logps/chosen": -282.7308654785156,
340
+ "logps/rejected": -211.3845672607422,
341
+ "loss": 0.6711,
342
+ "positive_losses": 0.0,
343
+ "rewards/accuracies": 0.925000011920929,
344
+ "rewards/chosen": 0.05954126641154289,
345
+ "rewards/margins": 0.04255921393632889,
346
+ "rewards/margins_max": 0.08835546672344208,
347
+ "rewards/margins_min": 0.005340501666069031,
348
+ "rewards/margins_std": 0.037935130298137665,
349
+ "rewards/rejected": 0.016982052475214005,
350
+ "step": 150
351
+ },
352
+ {
353
+ "dpo_losses": 0.669468343257904,
354
+ "epoch": 0.45,
355
+ "grad_norm": 1.6641482478870189,
356
+ "learning_rate": 3.356335553954679e-07,
357
+ "logits/chosen": -2.768503189086914,
358
+ "logits/rejected": -2.7139506340026855,
359
+ "logps/chosen": -293.73223876953125,
360
+ "logps/rejected": -232.4127655029297,
361
+ "loss": 0.6692,
362
+ "positive_losses": 0.010938262566924095,
363
+ "rewards/accuracies": 0.925000011920929,
364
+ "rewards/chosen": 0.06814642250537872,
365
+ "rewards/margins": 0.04834110662341118,
366
+ "rewards/margins_max": 0.09320978820323944,
367
+ "rewards/margins_min": 0.00712673831731081,
368
+ "rewards/margins_std": 0.038901425898075104,
369
+ "rewards/rejected": 0.019805317744612694,
370
+ "step": 160
371
+ },
372
+ {
373
+ "dpo_losses": 0.6628905534744263,
374
+ "epoch": 0.48,
375
+ "grad_norm": 1.8761085412198597,
376
+ "learning_rate": 3.121247763262235e-07,
377
+ "logits/chosen": -2.8206450939178467,
378
+ "logits/rejected": -2.762035846710205,
379
+ "logps/chosen": -321.90386962890625,
380
+ "logps/rejected": -294.7594299316406,
381
+ "loss": 0.6649,
382
+ "positive_losses": 0.0,
383
+ "rewards/accuracies": 0.887499988079071,
384
+ "rewards/chosen": 0.07717735320329666,
385
+ "rewards/margins": 0.062114737927913666,
386
+ "rewards/margins_max": 0.11694135516881943,
387
+ "rewards/margins_min": 0.011941976845264435,
388
+ "rewards/margins_std": 0.0472017265856266,
389
+ "rewards/rejected": 0.01506261806935072,
390
+ "step": 170
391
+ },
392
+ {
393
+ "dpo_losses": 0.6624481678009033,
394
+ "epoch": 0.51,
395
+ "grad_norm": 1.9655992636640798,
396
+ "learning_rate": 2.880139477883347e-07,
397
+ "logits/chosen": -2.8146770000457764,
398
+ "logits/rejected": -2.726132869720459,
399
+ "logps/chosen": -323.855712890625,
400
+ "logps/rejected": -292.4509582519531,
401
+ "loss": 0.6619,
402
+ "positive_losses": 0.0,
403
+ "rewards/accuracies": 0.9375,
404
+ "rewards/chosen": 0.08045096695423126,
405
+ "rewards/margins": 0.06311116367578506,
406
+ "rewards/margins_max": 0.1282820701599121,
407
+ "rewards/margins_min": 0.017489684745669365,
408
+ "rewards/margins_std": 0.049531489610672,
409
+ "rewards/rejected": 0.0173397995531559,
410
+ "step": 180
411
+ },
412
+ {
413
+ "dpo_losses": 0.6590775847434998,
414
+ "epoch": 0.54,
415
+ "grad_norm": 2.3566449266948037,
416
+ "learning_rate": 2.635347271463544e-07,
417
+ "logits/chosen": -2.776195526123047,
418
+ "logits/rejected": -2.683864116668701,
419
+ "logps/chosen": -313.2172546386719,
420
+ "logps/rejected": -235.8129119873047,
421
+ "loss": 0.6608,
422
+ "positive_losses": 0.006581115536391735,
423
+ "rewards/accuracies": 0.9125000238418579,
424
+ "rewards/chosen": 0.0874362364411354,
425
+ "rewards/margins": 0.07068866491317749,
426
+ "rewards/margins_max": 0.1461794078350067,
427
+ "rewards/margins_min": 0.012810202315449715,
428
+ "rewards/margins_std": 0.06181079149246216,
429
+ "rewards/rejected": 0.01674756594002247,
430
+ "step": 190
431
+ },
432
+ {
433
+ "dpo_losses": 0.659989058971405,
434
+ "epoch": 0.56,
435
+ "grad_norm": 8.082919972131291,
436
+ "learning_rate": 2.3892434184240534e-07,
437
+ "logits/chosen": -2.851855516433716,
438
+ "logits/rejected": -2.7749085426330566,
439
+ "logps/chosen": -330.7215881347656,
440
+ "logps/rejected": -263.6206359863281,
441
+ "loss": 0.6629,
442
+ "positive_losses": 0.010635947808623314,
443
+ "rewards/accuracies": 0.9125000238418579,
444
+ "rewards/chosen": 0.08811243623495102,
445
+ "rewards/margins": 0.06834487617015839,
446
+ "rewards/margins_max": 0.13839691877365112,
447
+ "rewards/margins_min": 0.01415681280195713,
448
+ "rewards/margins_std": 0.05431779474020004,
449
+ "rewards/rejected": 0.01976756379008293,
450
+ "step": 200
451
+ },
452
+ {
453
+ "epoch": 0.56,
454
+ "eval_dpo_losses": 0.6859865784645081,
455
+ "eval_logits/chosen": -2.787888765335083,
456
+ "eval_logits/rejected": -2.7491748332977295,
457
+ "eval_logps/chosen": -279.74639892578125,
458
+ "eval_logps/rejected": -255.2262725830078,
459
+ "eval_loss": 0.6976820230484009,
460
+ "eval_positive_losses": 0.10420288890600204,
461
+ "eval_rewards/accuracies": 0.5979999899864197,
462
+ "eval_rewards/chosen": 0.04846998676657677,
463
+ "eval_rewards/margins": 0.01494329422712326,
464
+ "eval_rewards/margins_max": 0.08812851458787918,
465
+ "eval_rewards/margins_min": -0.04890156909823418,
466
+ "eval_rewards/margins_std": 0.04564943537116051,
467
+ "eval_rewards/rejected": 0.033526696264743805,
468
+ "eval_runtime": 427.8302,
469
+ "eval_samples_per_second": 4.675,
470
+ "eval_steps_per_second": 0.292,
471
+ "step": 200
472
+ },
473
+ {
474
+ "dpo_losses": 0.6545363068580627,
475
+ "epoch": 0.59,
476
+ "grad_norm": 1.9636450688854714,
477
+ "learning_rate": 2.1442129043167873e-07,
478
+ "logits/chosen": -2.754328966140747,
479
+ "logits/rejected": -2.7271111011505127,
480
+ "logps/chosen": -310.51593017578125,
481
+ "logps/rejected": -254.80386352539062,
482
+ "loss": 0.6579,
483
+ "positive_losses": 0.019413376227021217,
484
+ "rewards/accuracies": 0.9125000238418579,
485
+ "rewards/chosen": 0.096683569252491,
486
+ "rewards/margins": 0.08047457039356232,
487
+ "rewards/margins_max": 0.16606834530830383,
488
+ "rewards/margins_min": 0.009150387719273567,
489
+ "rewards/margins_std": 0.07150334119796753,
490
+ "rewards/rejected": 0.01620900072157383,
491
+ "step": 210
492
+ },
493
+ {
494
+ "dpo_losses": 0.6517983675003052,
495
+ "epoch": 0.62,
496
+ "grad_norm": 1.8542376801111211,
497
+ "learning_rate": 1.9026303129961048e-07,
498
+ "logits/chosen": -2.85148024559021,
499
+ "logits/rejected": -2.75597882270813,
500
+ "logps/chosen": -343.08294677734375,
501
+ "logps/rejected": -272.4956359863281,
502
+ "loss": 0.6555,
503
+ "positive_losses": 0.0,
504
+ "rewards/accuracies": 0.949999988079071,
505
+ "rewards/chosen": 0.1037331372499466,
506
+ "rewards/margins": 0.08541466295719147,
507
+ "rewards/margins_max": 0.1512610912322998,
508
+ "rewards/margins_min": 0.02069752663373947,
509
+ "rewards/margins_std": 0.05890106409788132,
510
+ "rewards/rejected": 0.018318474292755127,
511
+ "step": 220
512
+ },
513
+ {
514
+ "dpo_losses": 0.6518235206604004,
515
+ "epoch": 0.65,
516
+ "grad_norm": 1.9608270986414384,
517
+ "learning_rate": 1.6668368145931396e-07,
518
+ "logits/chosen": -2.8535895347595215,
519
+ "logits/rejected": -2.7599148750305176,
520
+ "logps/chosen": -361.3349609375,
521
+ "logps/rejected": -260.3254089355469,
522
+ "loss": 0.6498,
523
+ "positive_losses": 0.010654067620635033,
524
+ "rewards/accuracies": 0.925000011920929,
525
+ "rewards/chosen": 0.10340724140405655,
526
+ "rewards/margins": 0.08572699129581451,
527
+ "rewards/margins_max": 0.16569408774375916,
528
+ "rewards/margins_min": 0.01854148879647255,
529
+ "rewards/margins_std": 0.06613387167453766,
530
+ "rewards/rejected": 0.017680250108242035,
531
+ "step": 230
532
+ },
533
+ {
534
+ "dpo_losses": 0.6590827703475952,
535
+ "epoch": 0.68,
536
+ "grad_norm": 1.5969745701990838,
537
+ "learning_rate": 1.4391174773015834e-07,
538
+ "logits/chosen": -2.8000075817108154,
539
+ "logits/rejected": -2.729769229888916,
540
+ "logps/chosen": -316.0329284667969,
541
+ "logps/rejected": -281.4412841796875,
542
+ "loss": 0.6592,
543
+ "positive_losses": 0.009823227301239967,
544
+ "rewards/accuracies": 0.8999999761581421,
545
+ "rewards/chosen": 0.0876411646604538,
546
+ "rewards/margins": 0.07030560076236725,
547
+ "rewards/margins_max": 0.14278724789619446,
548
+ "rewards/margins_min": 0.013438734225928783,
549
+ "rewards/margins_std": 0.05923761799931526,
550
+ "rewards/rejected": 0.017335567623376846,
551
+ "step": 240
552
+ },
553
+ {
554
+ "dpo_losses": 0.6565552949905396,
555
+ "epoch": 0.7,
556
+ "grad_norm": 1.9164626662561326,
557
+ "learning_rate": 1.2216791228457775e-07,
558
+ "logits/chosen": -2.8003289699554443,
559
+ "logits/rejected": -2.7232441902160645,
560
+ "logps/chosen": -302.2171325683594,
561
+ "logps/rejected": -249.5670928955078,
562
+ "loss": 0.661,
563
+ "positive_losses": 0.009924506768584251,
564
+ "rewards/accuracies": 0.949999988079071,
565
+ "rewards/chosen": 0.090640127658844,
566
+ "rewards/margins": 0.07586830854415894,
567
+ "rewards/margins_max": 0.15460054576396942,
568
+ "rewards/margins_min": 0.012039058841764927,
569
+ "rewards/margins_std": 0.06554323434829712,
570
+ "rewards/rejected": 0.014771823771297932,
571
+ "step": 250
572
+ },
573
+ {
574
+ "dpo_losses": 0.6541475057601929,
575
+ "epoch": 0.73,
576
+ "grad_norm": 1.7167350749240156,
577
+ "learning_rate": 1.0166289402331391e-07,
578
+ "logits/chosen": -2.8486251831054688,
579
+ "logits/rejected": -2.7656264305114746,
580
+ "logps/chosen": -282.35260009765625,
581
+ "logps/rejected": -257.30596923828125,
582
+ "loss": 0.6505,
583
+ "positive_losses": 0.0,
584
+ "rewards/accuracies": 0.949999988079071,
585
+ "rewards/chosen": 0.09740927070379257,
586
+ "rewards/margins": 0.08115915954113007,
587
+ "rewards/margins_max": 0.16602402925491333,
588
+ "rewards/margins_min": 0.016714217141270638,
589
+ "rewards/margins_std": 0.06957190483808517,
590
+ "rewards/rejected": 0.016250116750597954,
591
+ "step": 260
592
+ },
593
+ {
594
+ "dpo_losses": 0.6572530269622803,
595
+ "epoch": 0.76,
596
+ "grad_norm": 2.017681285366355,
597
+ "learning_rate": 8.259540650444734e-08,
598
+ "logits/chosen": -2.7961807250976562,
599
+ "logits/rejected": -2.743107318878174,
600
+ "logps/chosen": -299.6520690917969,
601
+ "logps/rejected": -260.46929931640625,
602
+ "loss": 0.6604,
603
+ "positive_losses": 0.046803951263427734,
604
+ "rewards/accuracies": 0.9125000238418579,
605
+ "rewards/chosen": 0.09463498741388321,
606
+ "rewards/margins": 0.07433197647333145,
607
+ "rewards/margins_max": 0.14396773278713226,
608
+ "rewards/margins_min": 0.020581519231200218,
609
+ "rewards/margins_std": 0.05590103194117546,
610
+ "rewards/rejected": 0.020303016528487206,
611
+ "step": 270
612
+ },
613
+ {
614
+ "dpo_losses": 0.655379593372345,
615
+ "epoch": 0.79,
616
+ "grad_norm": 1.734695130092448,
617
+ "learning_rate": 6.515023221586721e-08,
618
+ "logits/chosen": -2.7713265419006348,
619
+ "logits/rejected": -2.7317543029785156,
620
+ "logps/chosen": -293.76214599609375,
621
+ "logps/rejected": -270.7189025878906,
622
+ "loss": 0.6575,
623
+ "positive_losses": 0.04788818210363388,
624
+ "rewards/accuracies": 0.949999988079071,
625
+ "rewards/chosen": 0.09694816917181015,
626
+ "rewards/margins": 0.07816567271947861,
627
+ "rewards/margins_max": 0.14518895745277405,
628
+ "rewards/margins_min": 0.019240325316786766,
629
+ "rewards/margins_std": 0.057953737676143646,
630
+ "rewards/rejected": 0.018782509490847588,
631
+ "step": 280
632
+ },
633
+ {
634
+ "dpo_losses": 0.6595104336738586,
635
+ "epoch": 0.82,
636
+ "grad_norm": 1.7562807996771055,
637
+ "learning_rate": 4.949643185335287e-08,
638
+ "logits/chosen": -2.7762978076934814,
639
+ "logits/rejected": -2.7266323566436768,
640
+ "logps/chosen": -278.39996337890625,
641
+ "logps/rejected": -262.4858093261719,
642
+ "loss": 0.6646,
643
+ "positive_losses": 0.06580867618322372,
644
+ "rewards/accuracies": 0.875,
645
+ "rewards/chosen": 0.08516252040863037,
646
+ "rewards/margins": 0.06983821839094162,
647
+ "rewards/margins_max": 0.15399742126464844,
648
+ "rewards/margins_min": 0.005857336334884167,
649
+ "rewards/margins_std": 0.06812156736850739,
650
+ "rewards/rejected": 0.015324289910495281,
651
+ "step": 290
652
+ },
653
+ {
654
+ "dpo_losses": 0.6477023363113403,
655
+ "epoch": 0.85,
656
+ "grad_norm": 1.9324048018602529,
657
+ "learning_rate": 3.578570595810274e-08,
658
+ "logits/chosen": -2.8429884910583496,
659
+ "logits/rejected": -2.765357732772827,
660
+ "logps/chosen": -335.2912292480469,
661
+ "logps/rejected": -287.69110107421875,
662
+ "loss": 0.6479,
663
+ "positive_losses": 0.0,
664
+ "rewards/accuracies": 0.9125000238418579,
665
+ "rewards/chosen": 0.1122179627418518,
666
+ "rewards/margins": 0.09479531645774841,
667
+ "rewards/margins_max": 0.18769413232803345,
668
+ "rewards/margins_min": 0.015501707792282104,
669
+ "rewards/margins_std": 0.07714836299419403,
670
+ "rewards/rejected": 0.017422644421458244,
671
+ "step": 300
672
+ },
673
+ {
674
+ "epoch": 0.85,
675
+ "eval_dpo_losses": 0.6845089793205261,
676
+ "eval_logits/chosen": -2.785942554473877,
677
+ "eval_logits/rejected": -2.7478015422821045,
678
+ "eval_logps/chosen": -279.3167724609375,
679
+ "eval_logps/rejected": -255.1217498779297,
680
+ "eval_loss": 0.702374279499054,
681
+ "eval_positive_losses": 0.15923255681991577,
682
+ "eval_rewards/accuracies": 0.593999981880188,
683
+ "eval_rewards/chosen": 0.05276622623205185,
684
+ "eval_rewards/margins": 0.01819423958659172,
685
+ "eval_rewards/margins_max": 0.10716991871595383,
686
+ "eval_rewards/margins_min": -0.059129390865564346,
687
+ "eval_rewards/margins_std": 0.055548474192619324,
688
+ "eval_rewards/rejected": 0.03457198664546013,
689
+ "eval_runtime": 428.1446,
690
+ "eval_samples_per_second": 4.671,
691
+ "eval_steps_per_second": 0.292,
692
+ "step": 300
693
+ },
694
+ {
695
+ "dpo_losses": 0.6494620442390442,
696
+ "epoch": 0.87,
697
+ "grad_norm": 1.7775417343369382,
698
+ "learning_rate": 2.415092479103503e-08,
699
+ "logits/chosen": -2.8399946689605713,
700
+ "logits/rejected": -2.7444241046905518,
701
+ "logps/chosen": -287.1294250488281,
702
+ "logps/rejected": -212.641845703125,
703
+ "loss": 0.6483,
704
+ "positive_losses": 0.0,
705
+ "rewards/accuracies": 0.987500011920929,
706
+ "rewards/chosen": 0.10686449706554413,
707
+ "rewards/margins": 0.0910436287522316,
708
+ "rewards/margins_max": 0.18475615978240967,
709
+ "rewards/margins_min": 0.017825862392783165,
710
+ "rewards/margins_std": 0.07425465434789658,
711
+ "rewards/rejected": 0.015820873901247978,
712
+ "step": 310
713
+ },
714
+ {
715
+ "dpo_losses": 0.654121994972229,
716
+ "epoch": 0.9,
717
+ "grad_norm": 1.8615740426477174,
718
+ "learning_rate": 1.4704840690808656e-08,
719
+ "logits/chosen": -2.8066587448120117,
720
+ "logits/rejected": -2.7482457160949707,
721
+ "logps/chosen": -298.34295654296875,
722
+ "logps/rejected": -258.84478759765625,
723
+ "loss": 0.6527,
724
+ "positive_losses": 0.004542541690170765,
725
+ "rewards/accuracies": 0.9125000238418579,
726
+ "rewards/chosen": 0.09930218756198883,
727
+ "rewards/margins": 0.0810491144657135,
728
+ "rewards/margins_max": 0.17170652747154236,
729
+ "rewards/margins_min": 0.012719864025712013,
730
+ "rewards/margins_std": 0.07160074263811111,
731
+ "rewards/rejected": 0.018253061920404434,
732
+ "step": 320
733
+ },
734
+ {
735
+ "dpo_losses": 0.6466237306594849,
736
+ "epoch": 0.93,
737
+ "grad_norm": 1.9427490502829974,
738
+ "learning_rate": 7.538995394063995e-09,
739
+ "logits/chosen": -2.8800158500671387,
740
+ "logits/rejected": -2.7991366386413574,
741
+ "logps/chosen": -343.4425964355469,
742
+ "logps/rejected": -265.7518005371094,
743
+ "loss": 0.6494,
744
+ "positive_losses": 0.0037406920455396175,
745
+ "rewards/accuracies": 0.9750000238418579,
746
+ "rewards/chosen": 0.10877249389886856,
747
+ "rewards/margins": 0.09689434617757797,
748
+ "rewards/margins_max": 0.17881345748901367,
749
+ "rewards/margins_min": 0.022731659933924675,
750
+ "rewards/margins_std": 0.07121957838535309,
751
+ "rewards/rejected": 0.011878135614097118,
752
+ "step": 330
753
+ },
754
+ {
755
+ "dpo_losses": 0.6502379179000854,
756
+ "epoch": 0.96,
757
+ "grad_norm": 1.916115854424015,
758
+ "learning_rate": 2.7228329070159705e-09,
759
+ "logits/chosen": -2.767482280731201,
760
+ "logits/rejected": -2.7154014110565186,
761
+ "logps/chosen": -289.31561279296875,
762
+ "logps/rejected": -249.5873260498047,
763
+ "loss": 0.6484,
764
+ "positive_losses": 0.02227325364947319,
765
+ "rewards/accuracies": 0.925000011920929,
766
+ "rewards/chosen": 0.10529886186122894,
767
+ "rewards/margins": 0.08947577327489853,
768
+ "rewards/margins_max": 0.1924385130405426,
769
+ "rewards/margins_min": 0.021350277587771416,
770
+ "rewards/margins_std": 0.07764019072055817,
771
+ "rewards/rejected": 0.01582309789955616,
772
+ "step": 340
773
+ },
774
+ {
775
+ "dpo_losses": 0.6559463143348694,
776
+ "epoch": 0.99,
777
+ "grad_norm": 5.156600662792089,
778
+ "learning_rate": 3.0302652553296226e-10,
779
+ "logits/chosen": -2.750614881515503,
780
+ "logits/rejected": -2.7069051265716553,
781
+ "logps/chosen": -305.8713073730469,
782
+ "logps/rejected": -285.0591735839844,
783
+ "loss": 0.6565,
784
+ "positive_losses": 0.04594574123620987,
785
+ "rewards/accuracies": 0.9375,
786
+ "rewards/chosen": 0.09087814390659332,
787
+ "rewards/margins": 0.07695063203573227,
788
+ "rewards/margins_max": 0.1514495462179184,
789
+ "rewards/margins_min": 0.016952747479081154,
790
+ "rewards/margins_std": 0.06040460616350174,
791
+ "rewards/rejected": 0.01392750721424818,
792
+ "step": 350
793
+ },
794
+ {
795
+ "epoch": 1.0,
796
+ "step": 355,
797
+ "total_flos": 0.0,
798
+ "train_loss": 0.6688590748209349,
799
+ "train_runtime": 4310.2257,
800
+ "train_samples_per_second": 1.317,
801
+ "train_steps_per_second": 0.082
802
+ }
803
+ ],
804
+ "logging_steps": 10,
805
+ "max_steps": 355,
806
+ "num_input_tokens_seen": 0,
807
+ "num_train_epochs": 1,
808
+ "save_steps": 100,
809
+ "total_flos": 0.0,
810
+ "train_batch_size": 4,
811
+ "trial_name": null,
812
+ "trial_params": null
813
+ }
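For reference, the evaluation checkpoints in this log can be pulled out programmatically; a minimal sketch, assuming trainer_state.json has been downloaded to the working directory:

```python
# Sketch: extract the evaluation rows from log_history in trainer_state.json.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "eval_loss" in entry:
        print(entry["step"],
              round(entry["eval_loss"], 4),
              round(entry["eval_dpo_losses"], 4),
              round(entry["eval_rewards/margins"], 4))
# Expected: steps 100/200/300 with eval losses 0.6932 / 0.6977 / 0.7024, matching the README table.
```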